Commit Graph

40594 Commits

Author SHA1 Message Date
Johannes Berg 7d37fcd409 mac80211: reject zero cookie in mgmt-tx/roc cancel
When cancelling, you can cancel "any" (first in list) mgmt-tx
or remain-on-channel operation by using the value 0 for the
cookie along with the *opposite* operation, i.e.
 * cancel the first mgmt-tx by cancelling roc with 0 cookie
 * cancel the first roc by cancelling mgmt-tx with 0 cookie

This isn't really that bad since userspace should only pass
cookies that we gave it, but could lead to hard-to-debug
issues so better prevent it and reject zero values since we
never hand those out.
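
The check described above can be sketched as follows. This is a minimal user-space model, not the mac80211 code: `struct pending_op`, `cancel_op()` and the `-ENOENT` return value are hypothetical names and choices made for illustration. It assumes each pending item stores both cookie fields and leaves the one that does not apply at 0, which is how a 0 cookie could match an item of the opposite kind.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: each pending item carries both cookie fields,
 * and the field that does not apply to its kind stays 0. */
struct pending_op {
    uint64_t roc_cookie;     /* nonzero only for remain-on-channel items */
    uint64_t mgmt_tx_cookie; /* nonzero only for mgmt-tx items */
    int active;
};

/* Cancel the item whose cookie matches; mgmt_tx selects the field compared. */
int cancel_op(struct pending_op *ops, size_t n, uint64_t cookie, int mgmt_tx)
{
    size_t i;

    /* Cookies handed to userspace are never 0, so 0 could only match the
     * unused field of an item of the opposite kind. Reject it up front. */
    if (cookie == 0)
        return -ENOENT;

    for (i = 0; i < n; i++) {
        uint64_t c = mgmt_tx ? ops[i].mgmt_tx_cookie : ops[i].roc_cookie;

        if (!ops[i].active || c != cookie)
            continue;
        ops[i].active = 0;
        return 0;
    }
    return -ENOENT;
}
```

Without the early return, cancelling mgmt-tx with cookie 0 would match the first remain-on-channel item in this model, because its `mgmt_tx_cookie` field is 0.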

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Jouni Malinen c39b336deb mac80211: Allow a STA to join an IBSS with 80+80 MHz channel
While it was possible to create an IBSS with 80+80 MHz channel, joining
such an IBSS resulted in falling back to 20 MHz channel with VHT
disabled due to a missing switch case for 80+80.

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Michal Sojka 1aeb135f84 cfg80211: reg: Refactor calculation of bandwidth flags
The same piece of code appears at two places. Make a function from it.

Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg aaa016ccd5 mac80211: rewrite remain-on-channel logic
Jouni found a bug in the remain-on-channel logic: when a short item
is queued, a long item is combined with it extending the original
one, and then the long item is deleted, the timeout doesn't go back
to the short one, and the short item ends up taking a long time. In
this case, this showed as blocking scan when running two test cases
back to back - the scan from the second was delayed even though all
the remain-on-channel items should long have been gone.

Fixing this with the current data structures turns out to be a bit
complicated, we just remove the long item from the dependents list
right now and don't recalculate the timeouts.

There's a somewhat similar bug where we delete the short item and
all the dependents go with it; to fix this we'd have to move them
from the dependents to the real list.

Instead of trying to do that, rewrite the code to not have all this
complexity in the data structures: use a single list and allow more
than one entry in it being marked as started. This makes the code a
bit more complex, the worker needs to understand that it might need
to just remove one of the started items, while keeping the device
off-channel, but that's not more complicated than the nested data
structures.

This then fixes both issues described, and makes it easier to also
limit the overall off-channel time when combining.
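
The single-list idea can be illustrated with a small sketch. It is a simplified user-space model under stated assumptions, not the rewritten mac80211 code: `struct roc_item`, `offchannel_deadline()` and `roc_remove()` are hypothetical names, and time is a plain counter. The point it shows is that with one flat list the off-channel deadline can be recomputed from whatever started items remain, so deleting the long item lets the deadline fall back to the short one.

```c
#include <stddef.h>

/* Hypothetical flat list entry; more than one entry may be started. */
struct roc_item {
    int in_use;             /* slot holds a queued item */
    int started;            /* item is currently running off-channel */
    unsigned long end_time; /* when this item is allowed to finish */
};

/* Latest end time among the started items, or 0 if none is started. */
unsigned long offchannel_deadline(const struct roc_item *items, size_t n)
{
    unsigned long deadline = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (items[i].in_use && items[i].started &&
            items[i].end_time > deadline)
            deadline = items[i].end_time;
    }
    return deadline;
}

/* Drop one item; callers recompute the deadline afterwards. */
void roc_remove(struct roc_item *items, size_t idx)
{
    items[idx].in_use = 0;
    items[idx].started = 0;
}
```

In this model, a short item ending at 100 combined with a long item ending at 500 gives a deadline of 500, and removing the long item brings it back to 100.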

TODO: as before, with hardware remain-on-channel, deleting an item
after combining results in cancelling them all - we can keep track
of the time elapsed and only cancel after that to fix this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 5ee00dbd52 mac80211: simplify ack_skb handling
Since the cookie is assigned inside ieee80211_make_ack_skb()
now, we no longer need to return the ack_skb as the cookie
and can simplify the function's return and the callers. Also
rename it to ieee80211_attach_ack_skb() to more accurately
reflect its purpose.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg a2fcfccbad mac80211: move off-channel/mgmt-tx code to offchannel.c
This is quite a bit of code that logically belongs here since
it has to deal with all the remain-on-channel logic.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg e673a65952 mac80211: fix mgmt-tx abort cookie and leak
If a mgmt-tx operation is aborted before it runs, the wrong
cookie is reported back to userspace, and the ack_skb gets
leaked since the frame is freed directly instead of freeing
it using ieee80211_free_txskb(). Fix that.

Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 856142cdaa mac80211: catch queue stop underflow
If some code stops the queues more times than having started
(for when refcounting is used), warn on and reset the counter
to 0 to avoid blocking forever.
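
A minimal sketch of such a guard, assuming the underflow shows up when a wake is not matched by an earlier stop. `struct queue_state`, `queue_stop()` and `queue_wake()` are hypothetical names, and `fprintf()` stands in for the kernel warning.

```c
#include <stdio.h>

/* Hypothetical refcounted stop state for one queue. */
struct queue_state {
    int stop_refcount; /* number of outstanding stop requests */
};

void queue_stop(struct queue_state *q)
{
    q->stop_refcount++;
}

/* Returns 1 if the queue may run again after this wake, 0 otherwise. */
int queue_wake(struct queue_state *q)
{
    q->stop_refcount--;
    if (q->stop_refcount < 0) {
        /* Unbalanced wake: warn and reset the counter so that a later
         * stop/wake pair still brings it back to 0. */
        fprintf(stderr, "warning: queue stop refcount underflow\n");
        q->stop_refcount = 0;
    }
    return q->stop_refcount == 0;
}
```

Without the reset, the counter in this model would stay negative, and the next balanced stop/wake pair would never see it reach exactly 0.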

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 63b4d8b373 mac80211: properly free TX skbs when monitor TX fails
We need to free all skbs here, not just the one we peeked
from the list.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 86c7ec9eb1 mac80211: properly free skb when r-o-c for TX fails
When freeing the TX skb for an off-channel TX, use the correct
API to also free the ACK skb that might have been allocated.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 90f9ba9b89 Revert "mac80211: don't advertise NL80211_FEATURE_FULL_AP_CLIENT_STATE"
This reverts commit 45bb780a21,
the previous two patches fixed the functionality.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg bda95eb1d1 cfg80211: handle add_station auth/assoc flag quirks
When a new station is added to AP/GO interfaces the default behaviour
is for it to be added authenticated and associated, due to backwards
compatibility. To prevent that, the driver must be able to do that
(setting the NL80211_FEATURE_FULL_AP_CLIENT_STATE feature flag) and
userspace must set the flag mask to auth|assoc and clear the set.

Handle this quirk in the API entirely in nl80211, and always push the
full flags to the drivers. NL80211_FEATURE_FULL_AP_CLIENT_STATE is
still required for userspace to be allowed to set the mask including
those bits, but after checking that add both flags to the mask and
set in case userspace didn't set them otherwise.
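
The mask/set handling described here can be sketched roughly as below. This is an illustration under assumptions, not the nl80211 code: the flag macros, `struct sta_flag_update`, `normalize_add_station_flags()` and the `-EINVAL` return value are hypothetical.

```c
#include <errno.h>

/* Hypothetical flag bits standing in for the auth/assoc station flags. */
#define STA_FLAG_AUTHENTICATED (1u << 0)
#define STA_FLAG_ASSOCIATED    (1u << 1)
#define STA_AUTH_ASSOC (STA_FLAG_AUTHENTICATED | STA_FLAG_ASSOCIATED)

struct sta_flag_update {
    unsigned int mask; /* which flags userspace wants to change */
    unsigned int set;  /* new values for the flags in mask */
};

/* Normalize the flags for a new station before passing them to a driver.
 * full_ap_client_state models the driver advertising the feature flag. */
int normalize_add_station_flags(struct sta_flag_update *u,
                                int full_ap_client_state)
{
    /* Touching auth/assoc explicitly requires driver support. */
    if (!full_ap_client_state && (u->mask & STA_AUTH_ASSOC))
        return -EINVAL;

    /* Backwards compatibility: if userspace said nothing about these
     * bits, the station is added authenticated and associated. */
    if (!(u->mask & STA_AUTH_ASSOC)) {
        u->mask |= STA_AUTH_ASSOC;
        u->set |= STA_AUTH_ASSOC;
    }
    return 0;
}
```

With this shape the driver always sees the full flags, and the legacy default is applied in one place.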

This obsoletes the mac80211 code handling this difference, no other
driver is currently using these flags.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Ayala Beker a9bc31e418 cfg80211: use NL80211_ATTR_STA_AID in nl80211_set_station
Fix nl80211_set_station() to use the value of NL80211_ATTR_STA_AID
attribute instead of NL80211_ATTR_PEER_AID attribute.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Vidyullatha Kanchanapally 91f123f20d mac80211: Add support for aborting an ongoing scan
This commit adds implementation for abort scan in mac80211.

Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[adjust to wdev change in previous patch and clean up code a bit]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Vidyullatha Kanchanapally 91d3ab4673 cfg80211: Add support for aborting an ongoing scan
Implement new functionality for aborting an ongoing scan.

Add NL80211_CMD_ABORT_SCAN to the nl80211 interface. After
aborting the scan, driver shall provide the scan status by
calling cfg80211_scan_done().

Reviewed-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Vidyullatha Kanchanapally <vkanchan@qti.qualcomm.com>
Signed-off-by: Sunil Dutt <usdutt@qti.qualcomm.com>
[change command to take wdev instead of netdev so that it
 can be used on p2p-device scans]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Janusz.Dziedzic@tieto.com b115b97299 mac80211: add new IEEE80211_VIF_GET_NOA_UPDATE flag
Add a new VIF flag that allows getting NOA update
notifications when the driver requests them, even if
this is not a pure P2P vif (e.g. a STA vif).

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Michal Sojka c781944b71 cfg80211: Remove unused cfg80211_can_use_iftype_chan()
Last caller of this function was removed in 3.17 in commit
97dc94f1d9.

Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Michal Sojka 491728746b cfg80211: reg: Remove unused function parameter
Signed-off-by: Michal Sojka <sojkam1@fel.cvut.cz>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Eliad Peller ef044763a3 mac80211: add atomic uploaded keys iterator
add ieee80211_iter_keys_rcu() to iterate over uploaded
keys in atomic context (when rcu is locked)

The station removal code removes the keys only after
calling synchronize_net(), so it's not safe to iterate
the keys at this point (and postponing the actual key
deletion with call_rcu() might result in some
badly-ordered ops calls).

Add a flag to indicate a station is being removed,
and skip the configured keys if it's set.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Emmanuel Grumbach 0ead2510f8 mac80211: allow the driver to send EOSP when needed
This can happen when the driver needs to send fewer frames
than expected and then needs to close the SP.
Mac80211 still needs to set the more_data properly based
on its buffer state (ps_tx_buffer and buffered frames on
other TIDs).
To that end, refactor the code that delivers frames upon
uAPSD trigger frames to be able to get only the more_data
bit without actually delivering those frames in case the
driver is just asking to set a NDP with EOSP and MORE_DATA
bit properly set.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Ola Olsson 1b9df2d20e cfg80211: ocb: Fix null pointer deref if join_ocb is unimplemented
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 441275e103 mac80211: remove string from unaligned packet warning
This really should never happen except very early in the process
of bringing up a new driver, at which point you'll have to add
more debugging in the driver and this string isn't useful. Remove
it and save some size (when it's even compiled in.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 996bf99c71 lib80211: ratelimit key index mismatch
This indicates a driver key selection issue, but even then there's
no point in printing it all the time, so ratelimit it. Also remove
the priv pointer from it -- people debugging will only have a single
device anyway and it's useless as anything but a cookie.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg d671b2a077 mac80211: mesh: print MAC address instead of pointer
There's no point in printing the mpath pointer since it can't
be used for anything - print the MAC address instead (like in
the forwarding case.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Bob Copeland <me@bobcopeland.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 0483eeac59 cfg80211: replace ieee80211_ie_split() with an inline
The function is a very simple wrapper around another one,
just adds a few default parameters, so replace it with a
static inline instead of using EXPORT_SYMBOL, reducing
the module size slightly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 6e045905d1 cfg80211: add complete data to station add/change tracing
Complete the tracepoint with the missing data - it's not printed
by default (a lot of it is dynamic arrays) but will be recorded
and be available during post-processing.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Ilan Peer a1056b1baa cfg80211: Add missing tracing to cfg80211
Add missing tracing for:

1. start_radar_detection()
2. set_mcast_rates()
3. set_coalesce()

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg 3110489117 mac80211: allow driver to prevent two stations w/ same address
Some devices or drivers cannot deal with having the same station
address for different virtual interfaces, say as a client to two
virtual AP interfaces. Rather than requiring each driver with a
limitation like that to enforce it, add a hardware flag for it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:32 +01:00
Johannes Berg d9d3ac7afd Merge remote-tracking branch 'mac80211/master' into HEAD
I want to get the full off-channel bugfix since later code depends on
it, as well as the AP client state change so I can revert it correctly.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-04 14:43:05 +01:00
David S. Miller f188b951f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/renesas/ravb_main.c
	kernel/bpf/syscall.c
	net/ipv4/ipmr.c

All three conflicts were cases of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 21:09:12 -05:00
Linus Torvalds 071f5d105a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A lot of Thanksgiving turkey leftovers accumulated, here goes:

   1) Fix bluetooth l2cap_chan object leak, from Johan Hedberg.

   2) IDs for some new iwlwifi chips, from Oren Givon.

   3) Fix rtlwifi lockups on boot, from Larry Finger.

   4) Fix memory leak in fm10k, from Stephen Hemminger.

   5) We have a route leak in the ipv6 tunnel infrastructure, fix from
      Paolo Abeni.

   6) Fix buffer pointer handling in arm64 bpf JIT, from Zi Shen Lim.

   7) Wrong lockdep annotations in tcp md5 support, fix from Eric
      Dumazet.

   8) Work around some middle boxes which prevent proper handling of TCP
      Fast Open, from Yuchung Cheng.

   9) TCP repair can do huge kmalloc() requests, build paged SKBs
      instead.  From Eric Dumazet.

  10) Fix msg_controllen overflow in scm_detach_fds, from Daniel
      Borkmann.

  11) Fix device leaks on ipmr table destruction in ipv4 and ipv6, from
      Nikolay Aleksandrov.

  12) Fix use after free in epoll with AF_UNIX sockets, from Rainer
      Weikusat.

  13) Fix double free in VRF code, from Nikolay Aleksandrov.

  14) Fix skb leaks on socket receive queue in tipc, from Ying Xue.

  15) Fix ifup/ifdown crash in xgene driver, from Iyappan Subramanian.

  16) Fix clearing of persistent array maps in bpf, from Daniel
      Borkmann.

  17) In TCP, for the cross-SYN case, we don't initialize tp->copied_seq
      early enough.  From Eric Dumazet.

  18) Fix out of bounds accesses in bpf array implementation when
      updating elements, from Daniel Borkmann.

  19) Fill gaps in RCU protection of np->opt in ipv6 stack, from Eric
      Dumazet.

  20) When dumping proxy neigh entries, we have to accommodate NULL
      device pointers properly, from Konstantin Khlebnikov.

  21) SCTP doesn't release all ipv6 socket resources properly, fix from
      Eric Dumazet.

  22) Prevent underflows of sch->q.qlen for multiqueue packet
      schedulers, also from Eric Dumazet.

  23) Fix MAC and unicast list handling in bnxt_en driver, from Jeffrey
      Huang and Michael Chan.

  24) Don't actively scan radar channels, from Antonio Quartulli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (110 commits)
  net: phy: reset only targeted phy
  bnxt_en: Setup uc_list mac filters after resetting the chip.
  bnxt_en: enforce proper storing of MAC address
  bnxt_en: Fixed incorrect implementation of ndo_set_mac_address
  net: lpc_eth: remove irq > NR_IRQS check from probe()
  net_sched: fix qdisc_tree_decrease_qlen() races
  openvswitch: fix hangup on vxlan/gre/geneve device deletion
  ipv4: igmp: Allow removing groups from a removed interface
  ipv6: sctp: implement sctp_v6_destroy_sock()
  arm64: bpf: add 'store immediate' instruction
  ipv6: kill sk_dst_lock
  ipv6: sctp: add rcu protection around np->opt
  net/neighbour: fix crash at dumping device-agnostic proxy entries
  sctp: use GFP_USER for user-controlled kmalloc
  sctp: convert sack_needed and sack_generation to bits
  ipv6: add complete rcu protection around np->opt
  bpf: fix allocation warnings in bpf maps and integer overflow
  mvebu: dts: enable IP checksum with jumbo frames for Armada 38x on Port0
  net: mvneta: enable setting custom TX IP checksum limit
  net: mvneta: fix error path for building skb
  ...
2015-12-03 16:02:46 -08:00
David S. Miller e3c9b1ef78 A small set of fixes for 4.4:
* fix scanning in mac80211 to not actively scan radar
    channels (from Antonio)
  * fix uninitialized variable in remain-on-channel that
    could lead to treating frame TX as remain-on-channel
    and not sending the frame at all
  * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
    was broken and needs more work, we'll enable it later
  * fix call_rcu() induced use-after-reset/free in mesh
    (that was suddenly causing issues in certain tests)
  * always request block-ack window size 64 as we found
    some APs will otherwise crash (really ...)
  * fix P2P-Device teardown sequence to avoid restarting
    with uninitialized data

Merge tag 'mac80211-for-davem-2015-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A small set of fixes for 4.4:
 * fix scanning in mac80211 to not actively scan radar
   channels (from Antonio)
 * fix uninitialized variable in remain-on-channel that
   could lead to treating frame TX as remain-on-channel
   and not sending the frame at all
 * remove NL80211_FEATURE_FULL_AP_CLIENT_STATE again, it
   was broken and needs more work, we'll enable it later
 * fix call_rcu() induced use-after-reset/free in mesh
   (that was suddenly causing issues in certain tests)
 * always request block-ack window size 64 as we found
   some APs will otherwise crash (really ...)
 * fix P2P-Device teardown sequence to avoid restarting
   with uninitialized data
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:56:22 -05:00
Jon Paul Maloy dc8d1eb305 tipc: fix node reference count bug
Commit 5405ff6e15 ("tipc: convert node lock to rwlock")
introduced a bug to the node reference counter handling. When a
message is successfully sent in the function tipc_node_xmit(),
we return directly after releasing the node lock, instead of
continuing and decrementing the node reference counter as we
should do.

This commit fixes this bug.
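
The pattern behind this kind of bug can be shown with a small sketch. It is a simplified model, not the tipc code: `struct node`, `node_get()`, `node_put()` and both `node_xmit_*()` functions are hypothetical, and locking is left out.

```c
/* Hypothetical refcounted node. */
struct node {
    int refcnt;
};

static void node_get(struct node *n) { n->refcnt++; }
static void node_put(struct node *n) { n->refcnt--; }

/* Buggy shape: the success path returns before dropping the reference. */
int node_xmit_buggy(struct node *n, int link_ok)
{
    node_get(n);
    if (link_ok)
        return 0; /* reference leaked here */
    node_put(n);
    return -1;
}

/* Fixed shape: every path falls through to the same node_put(). */
int node_xmit_fixed(struct node *n, int link_ok)
{
    int rc;

    node_get(n);
    rc = link_ok ? 0 : -1;
    node_put(n);
    return rc;
}
```

Funnelling both outcomes through a single exit keeps the get/put pair balanced regardless of whether the send succeeded.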

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:19:40 -05:00
Guillaume Nault 681b4d88ad pppox: use standard module auto-loading feature
* Register PF_PPPOX with pppox module rather than with pppoe,
    so that pppoe doesn't get loaded for any PF_PPPOX socket.

  * Register PX_PROTO_* with standard MODULE_ALIAS_NET_PF_PROTO()
    instead of using pppox's own naming scheme.

  * While there, add auto-loading feature for pptp.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:12:54 -05:00
Asias He 8a2a202989 VSOCK: Add Makefile and Kconfig
Enable virtio-vsock and vhost-vsock.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:05:55 -05:00
Asias He 32e61b06b6 VSOCK: Introduce virtio-vsock.ko
VM sockets virtio transport implementation. This module runs in guest
kernel.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:05:55 -05:00
Asias He 80a19e338d VSOCK: Introduce virtio-vsock-common.ko
This module contains the common code and header files for the following
virtio-vsock and vhost-vsock kernel modules.

Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:05:55 -05:00
Asias He 357ab2234d VSOCK: Introduce vsock_find_unbound_socket and vsock_bind_dgram_generic
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:05:54 -05:00
Roopa Prabhu c89359a42e mpls: support for dead routes
Adds support for RTNH_F_DEAD and RTNH_F_LINKDOWN flags on mpls
routes due to link events. Also adds code to ignore dead
routes during route selection.

Unlike ip routes, mpls routes are not deleted when the route goes
dead. This is current mpls behaviour and this patch does not change
that. With this patch however, routes will be marked dead.
dead routes are not notified to userspace (this is consistent with ipv4
routes).

dead routes:
-----------
$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1
    nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link set dev swp1 down

$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN mode
DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1 dead linkdown
    nexthop as to 700 via inet 10.1.1.6  dev swp2

linkdown routes:
----------------
$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1
    nexthop as to 700 via inet 10.1.1.6  dev swp2

$ip link show dev swp1
4: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast
state UP mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

/* carrier goes down */
$ip link show dev swp1
4: swp1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:02:00:00:00:01 brd ff:ff:ff:ff:ff:ff

$ip -f mpls route show
100
    nexthop as to 200 via inet 10.1.1.2  dev swp1 linkdown
    nexthop as to 700 via inet 10.1.1.6  dev swp2

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 15:03:27 -05:00
Eric Dumazet 4eaf3b84f2 net_sched: fix qdisc_tree_decrease_qlen() races
qdisc_tree_decrease_qlen() suffers from two problems on multiqueue
devices.

One problem is that it updates sch->q.qlen and sch->qstats.drops
on the mq/mqprio root qdisc, while it should not: Daniele
reported underflow errors:
[  681.774821] PAX: sch->q.qlen: 0 n: 1
[  681.774825] PAX: size overflow detected in function qdisc_tree_decrease_qlen net/sched/sch_api.c:769 cicus.693_49 min, count: 72, decl: qlen; num: 0; context: sk_buff_head;
[  681.774954] CPU: 2 PID: 19 Comm: ksoftirqd/2 Tainted: G           O    4.2.6.201511282239-1-grsec #1
[  681.774955] Hardware name: ASUSTeK COMPUTER INC. X302LJ/X302LJ, BIOS X302LJ.202 03/05/2015
[  681.774956]  ffffffffa9a04863 0000000000000000 0000000000000000 ffffffffa990ff7c
[  681.774959]  ffffc90000d3bc38 ffffffffa95d2810 0000000000000007 ffffffffa991002b
[  681.774960]  ffffc90000d3bc68 ffffffffa91a44f4 0000000000000001 0000000000000001
[  681.774962] Call Trace:
[  681.774967]  [<ffffffffa95d2810>] dump_stack+0x4c/0x7f
[  681.774970]  [<ffffffffa91a44f4>] report_size_overflow+0x34/0x50
[  681.774972]  [<ffffffffa94d17e2>] qdisc_tree_decrease_qlen+0x152/0x160
[  681.774976]  [<ffffffffc02694b1>] fq_codel_dequeue+0x7b1/0x820 [sch_fq_codel]
[  681.774978]  [<ffffffffc02680a0>] ? qdisc_peek_dequeued+0xa0/0xa0 [sch_fq_codel]
[  681.774980]  [<ffffffffa94cd92d>] __qdisc_run+0x4d/0x1d0
[  681.774983]  [<ffffffffa949b2b2>] net_tx_action+0xc2/0x160
[  681.774985]  [<ffffffffa90664c1>] __do_softirq+0xf1/0x200
[  681.774987]  [<ffffffffa90665ee>] run_ksoftirqd+0x1e/0x30
[  681.774989]  [<ffffffffa90896b0>] smpboot_thread_fn+0x150/0x260
[  681.774991]  [<ffffffffa9089560>] ? sort_range+0x40/0x40
[  681.774992]  [<ffffffffa9085fe4>] kthread+0xe4/0x100
[  681.774994]  [<ffffffffa9085f00>] ? kthread_worker_fn+0x170/0x170
[  681.774995]  [<ffffffffa95d8d1e>] ret_from_fork+0x3e/0x70

mq/mqprio have their own ways to report qlen/drops by folding stats on
all their queues, with appropriate locking.

A second problem is that qdisc_tree_decrease_qlen() calls qdisc_lookup()
without proper locking : concurrent qdisc updates could corrupt the list
that qdisc_match_from_root() parses to find a qdisc given its handle.

Fix the first problem by adding a TCQ_F_NOPARENT qdisc flag that
qdisc_tree_decrease_qlen() can use to abort its tree traversal
as soon as it meets a child of a mq/mqprio qdisc.
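
A rough sketch of the traversal with such a flag, under simplifying assumptions: qdiscs are linked by plain parent pointers instead of handle lookups, and `struct qdisc`, `QDISC_F_NOPARENT` and `tree_decrease_qlen()` are hypothetical stand-ins rather than the kernel definitions.

```c
#include <stddef.h>

#define QDISC_F_NOPARENT 0x1u /* hypothetical: do not walk above this qdisc */

struct qdisc {
    struct qdisc *parent; /* NULL for the root */
    unsigned int flags;
    unsigned int qlen;
};

/* Propagate a queue-length decrease of n to the ancestors of sch,
 * stopping at a qdisc that is flagged as having no accountable parent. */
void tree_decrease_qlen(struct qdisc *sch, unsigned int n)
{
    if (n == 0)
        return;

    while (sch->parent != NULL) {
        if (sch->flags & QDISC_F_NOPARENT)
            break; /* parent is mq/mqprio-like: leave its counters alone */
        sch = sch->parent;
        sch->qlen -= n;
    }
}
```

In this model the per-queue child carries the flag, so its own counter is still adjusted while the root above it is never touched.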

The second problem can be fixed by RCU protection.
Qdiscs are already freed after an RCU grace period, so qdisc_list_add() and
qdisc_list_del() simply have to use the appropriate rcu list variants.

A future patch will add a per struct netdev_queue list anchor, so that
qdisc_tree_decrease_qlen() can have more efficient lookups.

Reported-by: Daniele Fucini <dfucini@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <cwang@twopensource.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:59:05 -05:00
Phil Sutter d6df198d92 net: ipv6: restrict hop_limit sysctl setting to range [1; 255]
Setting a value bigger than 255 resulted in using only the lower eight
bits of that value as it is assigned to the u8 header field. To avoid
this unexpected result, reject such values.

Setting a value of zero is technically possible, but hosts receiving
such a packet have to treat it like hop_limit was set to one, according
to RFC2460. Therefore I don't see a use-case for that.

Setting a route's hop_limit to zero in iproute2 means to use the sysctl
default, which is not the case here: Setting e.g.
net.ipv6.conf.eth0.hop_limit=0 will not make the kernel use
net.ipv6.conf.all.hop_limit for outgoing packets on eth0. To avoid this
kind of confusion, reject zero.
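
The truncation and the range check can be illustrated with a short sketch. `truncate_hop_limit()` and `set_hop_limit()` are hypothetical helpers written for this example, and the `-EINVAL` return value is an assumption; the sketch only models the arithmetic, not the sysctl handler.

```c
#include <errno.h>
#include <stdint.h>

/* What happens without a range check: only the low eight bits survive
 * the assignment to a u8 header field. */
uint8_t truncate_hop_limit(int value)
{
    return (uint8_t)value;
}

/* Range-checked setter: accept only values in [1; 255]. */
int set_hop_limit(int *stored, int value)
{
    if (value < 1 || value > 255)
        return -EINVAL;
    *stored = value;
    return 0;
}
```

For example, 256 truncates to 0 and 300 truncates to 44, which is why values above 255 are rejected rather than silently stored.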

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:42:10 -05:00
Paolo Abeni 1317530302 openvswitch: fix hangup on vxlan/gre/geneve device deletion
Each openvswitch tunnel vport (vxlan, gre, geneve) holds a reference
to the underlying tunnel device, but never releases it when that
device is deleted.
Deleting the underlying device via the ip tool causes the kernel to
hang in the netdev_wait_allrefs() loop.
This commit ensures that on device unregistration dp_detach_port_notify()
is called for all vports that hold the device reference, properly
releasing it.

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Fixes: b2acd1dc39 ("openvswitch: Use regular GRE net_device instead of vport")
Fixes: 6b001e682e ("openvswitch: Use Geneve device.")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 14:29:25 -05:00
Andrew Lunn 4eba7bb1d7 ipv4: igmp: Allow removing groups from a removed interface
When a multicast group is joined on a socket, a struct ip_mc_socklist
is appended to the socket's mc_list containing information about the
joined group.

If the interface is hot unplugged, this entry becomes stale. Prior to
commit 52ad353a53 ("igmp: fix the problem when mc leave group") it
was possible to remove the stale entry by performing an
IP_DROP_MEMBERSHIP, passing either the old ifindex or ip address on
the interface. However, this fix enforces that the interface must
still exist. Thus with time, the number of stale entries grows, until
sysctl_igmp_max_memberships is reached and then it is not possible to
join any more groups.

The previous patch fixes an issue where an IP_DROP_MEMBERSHIP is
performed without specifying the interface, either by ifindex or ip
address. However here we do supply one of these. So loosen the
restriction on device existence to only apply when the interface has
not been specified. This then restores the ability to clean up the
stale entries.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Fixes: 52ad353a53 ("igmp: fix the problem when mc leave group")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:07:05 -05:00
Eric Dumazet 602dd62dfb ipv6: sctp: implement sctp_v6_destroy_sock()
Dmitry Vyukov reported a memory leak using IPV6 SCTP sockets.

We need to call inet6_destroy_sock() to properly release
inet6 specific fields.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:05:57 -05:00
David S. Miller 79aecc7216 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-12-01

Here's a Bluetooth fix for the 4.4-rc series that fixes a memory leak of
the Security Manager L2CAP channel that'll happen for every LE
connection.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 12:04:05 -05:00
Jiri Pirko 04d482660a net: introduce change lower state notifier
When a lower device such as a bonding slave or a team/bridge port changes
its state, it is useful for others to notice this change. Currently this is
implemented specifically for bonding as the NETDEV_BONDING_INFO notifier.
This patch aims to replace this specific usage and make it more generic so
it can be used for all upper-lower devices.

Introduce NETDEV_CHANGELOWERSTATE netdev notifier type and
netdev_lower_state_changed() helper.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:26 -05:00
Jiri Pirko 29bf24afb2 net: add possibility to pass information about upper device via notifier
Sometimes the drivers and other code would find it handy to know some
internal information about upper device being changed. So allow upper-code
to pass information down to notifier listeners during linking.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:25 -05:00
Jiri Pirko 6dffb0447c net: propagate upper priv via netdev_master_upper_dev_link
Eliminate netdev_master_upper_dev_link_private and pass priv directly as
a parameter of netdev_master_upper_dev_link.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:25 -05:00
Ido Schimmel b03804e7c3 net: Check CHANGEUPPER notifier return value
switchdev drivers reflect the newly requested topology to hardware when
CHANGEUPPER is received, after software links were already formed.
However, the operation can fail and the user will not be notified, as the
return value of the notifier is not checked.

Add this check and roll back software links if necessary.
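
The check-and-roll-back pattern described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct dev, changeupper_fn, link_upper() and the two sample listeners are
hypothetical names, not the kernel's netdev or notifier API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device; not the real netdev API. */
struct dev {
	struct dev *upper;	/* software link to the upper device */
};

/* A listener returns 0 on success or a negative errno-style value. */
typedef int (*changeupper_fn)(struct dev *lower, struct dev *upper);

/*
 * Form the software link first, then notify. If the listener (for
 * example a driver programming hardware) fails, undo the link and
 * report the error to the caller instead of ignoring it.
 */
static int link_upper(struct dev *lower, struct dev *upper,
		      changeupper_fn notify)
{
	int err;

	assert(lower != NULL && upper != NULL);
	lower->upper = upper;		/* software link formed */

	err = notify(lower, upper);	/* reflect topology to "hardware" */
	if (err) {
		lower->upper = NULL;	/* roll back the software link */
		return err;
	}
	return 0;
}

/* Two sample listeners, for illustration only. */
static int hw_accepts(struct dev *lower, struct dev *upper)
{
	(void)lower;
	(void)upper;
	return 0;
}

static int hw_rejects(struct dev *lower, struct dev *upper)
{
	(void)lower;
	(void)upper;
	return -1;
}
```

With hw_rejects as the listener, link_upper() returns the error and leaves
lower->upper unset, so the caller can report the failure to the user.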

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:49:23 -05:00
Eric Dumazet 6bd4f355df ipv6: kill sk_dst_lock
While testing the np->opt RCU conversion, I found that UDP/IPv6 was
using a mixture of xchg() and sk_dst_lock to protect concurrent changes
to sk->sk_dst_cache, leading to possible corruptions and crashes.

ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
way to fix the mess is to remove sk_dst_lock completely, as we did for
IPv4.

__ip6_dst_store() and ip6_dst_store() share the same implementation.

Since sk_setup_caps() can be called with or without the socket lock
held, we have to use sk_dst_set() instead of __sk_dst_set().

Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
in ip6_dst_store() before the sk_setup_caps(sk, dst) call.

This is because ip6_dst_store() can be called from process context,
without any lock held.

As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
from another cpu doing a concurrent ip6_dst_store()

Doing the dst dereference before doing the install is needed to make
sure no use after free would trigger.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:32:06 -05:00
Eric Dumazet c836a8ba93 ipv6: sctp: add rcu protection around np->opt
This patch completes the work I did in commit 45f6fad84c
("ipv6: add complete rcu protection around np->opt"), as I missed
the sctp part.

This simply makes sure np->opt is used with proper RCU locking
and accessors.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 11:30:58 -05:00
Tejun Heo 1f7dd3e5a6 cgroup: fix handling of multi-destination migration from subtree_control enabling
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [<ffffffff81551ffc>] dump_stack+0x4e/0x82
  [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
  [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
  [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
  [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
  [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
  [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
  [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
  [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
  [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
  [<ffffffff81265f88>] __vfs_write+0x28/0xe0
  [<ffffffff812666fc>] vfs_write+0xac/0x1a0
  [<ffffffff81267019>] SyS_write+0x49/0xb0
  [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing the @css parameter from the three
migration methods, ->can_attach(), ->cancel_attach() and ->attach(), and
updating the cgroup_taskset iteration helpers to also return the
destination css in addition to the task being migrated.  All controllers
are updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's a single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accommodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-03 10:18:21 -05:00
Konstantin Khlebnikov 6adc5fd6a1 net/neighbour: fix crash at dumping device-agnostic proxy entries
Proxy entries can have a NULL pointer to the net-device.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Fixes: 84920c1420 ("net: Allow ipv6 proxies and arp proxies be shown with iproute2")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-03 00:07:51 -05:00
Eric Dumazet 7450aaf61f tcp: suppress too verbose messages in tcp_send_ack()
If tcp_send_ack() cannot allocate an skb, we properly handle this
and set up a timer to try later.

Use __GFP_NOWARN to avoid polluting syslog when the host is
under memory pressure, so that pertinent messages are not lost under
a flood of useless information.

sk_gfp_atomic() can use its gfp_mask argument (all callers currently
were using GFP_ATOMIC before this patch)

We rename sk_gfp_atomic() to sk_gfp_mask() to clearly express this
function now takes into account its second argument (gfp_mask)

Note that when tcp_transmit_skb() is called with clone_it set to false,
we do not attempt memory allocations, so can pass a 0 gfp_mask, which
most compilers can emit faster than a non zero or constant value.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02 23:44:32 -05:00
Marcelo Ricardo Leitner cacc062152 sctp: use GFP_USER for user-controlled kmalloc
Dmitry Vyukov reported that the user could trigger a kernel warning by
using a large len value for getsockopt SCTP_GET_LOCAL_ADDRS, as that
value directly affects the value used as a kmalloc() parameter.

This patch thus switches the allocation flags for every kmalloc() with a
user-controllable size to GFP_USER, to put some more restrictions on it,
and also disables the warning, as it is not necessary.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02 23:39:46 -05:00
Eric Dumazet 45f6fad84c ipv6: add complete rcu protection around np->opt
This patch addresses multiple problems :

UDP/RAW sendmsg() need to get a stable struct ipv6_txoptions
while the socket is not locked: other threads can change np->opt
concurrently. Dmitry posted a syzkaller
(http://github.com/google/syzkaller) program demonstrating
use-after-free.

Starting with TCP/DCCP lockless listeners, tcp_v6_syn_recv_sock()
and dccp_v6_request_recv_sock() also need to use RCU protection
to dereference np->opt once (before calling ipv6_dup_options())

This patch adds full RCU protection to np->opt

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02 23:37:16 -05:00
Johannes Berg c1df932c05 mac80211: fix off-channel mgmt-tx uninitialized variable usage
In the last change here, I neglected to update the cookie in one code
path: when a mgmt-tx has no real cookie sent to userspace as it doesn't
wait for a response, but is off-channel. The original code used the SKB
pointer as the cookie and always assigned the cookie to the TX SKB in
ieee80211_start_roc_work(), but my change turned this around and made
the code rely on a valid cookie being passed in.

Unfortunately, the off-channel no-wait TX path wasn't assigning one at
all, resulting in an uninitialized stack value being used. This wasn't
handed back to userspace as a cookie (since in the no-wait case there
isn't a cookie), but it was tested for non-zero to distinguish between
mgmt-tx and off-channel.

Fix this by assigning a dummy non-zero cookie unconditionally, and get
rid of a misleading comment and some dead code while at it. I'll clean
up the ACK SKB handling separately later.

Fixes: 3b79af973c ("mac80211: stop using pointers as userspace cookies")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02 22:27:53 +01:00
Antonio Quartulli 4e39ccac0d mac80211: do not actively scan DFS channels
DFS channels should not be actively scanned as we can't be sure
if we are allowed or not.

If the current channel is in the DFS band, active scan might be
performed after CSA, but we have no guarantee about other channels,
therefore it is safer to prevent active scanning at all.

Cc: stable@vger.kernel.org
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02 22:27:53 +01:00
Eliad Peller 835112b289 mac80211: don't teardown sdata on sdata stop
Interfaces are being initialized (setup) on addition,
and torn down on removal.

However, a p2p device is torn down when stopped, resulting in the
next p2p start operation being done on an uninitialized interface.

Solve it by calling ieee80211_teardown_sdata() only
on interface removal (for the non-netdev case).

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
[squashed in fix to call teardown after unregister]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-12-02 22:27:27 +01:00
Paolo Abeni 83e4bf7a74 openvswitch: properly refcount vport-vxlan module
After 614732eaa1, no refcount is maintained for the vport-vxlan module.
This allows userspace to remove the module while vport-vxlan devices
still exist, which leads to a later oops.

v1 -> v2:
 - move vport 'owner' initialization in ovs_vport_ops_register()
   and make such function a macro

Fixes: 614732eaa1 ("openvswitch: Use regular VXLAN net_device device")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-02 11:50:59 -05:00
Eric Dumazet ceb5d58b21 net: fix sock_wake_async() rcu protection
Dmitry provided a syzkaller (http://github.com/google/syzkaller) program
triggering a fault in sock_wake_async() when async IO is requested.

Said program stressed af_unix sockets, but the issue is generic
and should be addressed in core networking stack.

The problem is that by the time sock_wake_async() is called,
we should not access the @flags field of 'struct socket',
as the inode containing this socket might be freed without
further notice, and without RCU grace period.

We already maintain an RCU protected structure, "struct socket_wq"
so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it
is the safe route.

It also reduces number of cache lines needing dirtying, so might
provide a performance improvement anyway.

In followup patches, we might move the remaining flags (SOCK_NOSPACE,
SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket'
be mostly read and shared between cpus.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:45:05 -05:00
Eric Dumazet 9cd3e072b0 net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
This patch is a cleanup to make the following patch easier to
review.

The goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA
from (struct socket)->flags to (struct socket_wq)->flags
to benefit from RCU protection in sock_wake_async().

To ease backports, we rename both constants.

Two new helpers, sk_set_bit(int nr, struct sock *sk)
and sk_clear_bit(int nr, struct sock *sk), are added so that
the following patch can change their implementation.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:45:05 -05:00
Nicolas Dichtel 304d888b29 Revert "ipv6: ndisc: inherit metadata dst when creating ndisc requests"
This reverts commit ab450605b3.

In IPv6, we cannot inherit the dst of the original dst. ndisc packets
are IPv6 packets and may take another route than the original packet.

The reverted patch breaks the following scenario: a packet comes from
eth0 and is forwarded through vxlan1. The encapsulated packet triggers
an NS which cannot be sent because of the wrong route.

CC: Jiri Benc <jbenc@redhat.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 15:07:59 -05:00
Rainer Weikusat 77b75f4d8c unix: use wq_has_sleeper in unix_dgram_recvmsg
The current unix_dgram_recvmsg does a wake up for every received
datagram. This seems wasteful as only SOCK_DGRAM client sockets in an
n:1 association with a server socket will ever wait because of the
associated condition. The patch below changes the function such that the
wake up only happens if wq_has_sleeper indicates that someone actually
wants to be notified. Testing with SOCK_SEQPACKET and SOCK_DGRAM sockets
seems to confirm that this is an improvement.

Signed-Off-By: Rainer Weikusat <rweikusat@mobileactivedefense.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-12-01 14:57:43 -05:00
Eric Dumazet 142a2e7ece tcp: initialize tp->copied_seq in case of cross SYN connection
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :

WARN_ON(tp->copied_seq != tp->rcv_nxt &&
        !(flags & (MSG_PEEK | MSG_TRUNC)));

His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks like we
lack proper tp->copied_seq initialization.

Thanks again Dmitry for your report and testing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:34:17 -05:00
Nikolay Aleksandrov ccbb0aa62d net: ipmr: add mfc newroute/delroute netlink support
This patch adds support to add and remove MFC entries. It uses the
same attributes as the already present dump support in order to be
consistent. There's one new entry, RTA_PREFSRC, which is used to denote
an MFC_PROXY entry (see MRT_ADD_MFC vs MRT_ADD_MFC_PROXY).
The already existing infrastructure is used to create and delete the
entries, the netlink message gets converted internally to a struct mfcctl
which is used with ipmr_mfc_add/delete.
The other used attributes are:
RTA_IIF - used for mfcc_parent (when adding it's required to be valid)
RTA_SRC - used for mfcc_origin
RTA_DST - used for mfcc_mcastgrp
RTA_TABLE - the MRT table id
RTA_MULTIPATH - the "oifs" ttl array

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:23 -05:00
Nikolay Aleksandrov 42e6b89ce4 net: ipmr: fix setsockopt error return
We can hit both errors, in which case we return the second one. Fix it
to return one error at a time, as is normal. I overlooked this in my
previous set.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:22 -05:00
Nikolay Aleksandrov 1973a4ea6c net: ipmr: move pimsm_enabled to pim.h and rename
Move the inline pimsm_enabled() to pim.h and rename it to
ipmr_pimsm_enabled to show it's for the ipv4 ipmr code since pim.h is
used by IPv6 too.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:22 -05:00
Nikolay Aleksandrov 5ea1f13299 net: ipmr: move struct mr_table and VIF_EXISTS to mroute.h
Move the definitions of VIF_EXISTS() and struct mr_table to mroute.h

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:22 -05:00
Nikolay Aleksandrov 06bd6c0370 net: ipmr: remove unused MFC_NOTIFY flag and make the flags enum
MFC_NOTIFY was introduced in kernel 2.1.68 but afaik it hasn't been used
and I couldn't find any users currently so just remove it. Only
MFC_STATIC is left, so move it into an enum, add a description and use
BIT().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:22 -05:00
Nikolay Aleksandrov dfc3b0e891 net: remove unnecessary mroute.h includes
It looks like many files are including mroute.h unnecessarily, so remove
the include. Most importantly remove it from ipv6.

CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:26:21 -05:00
Hannes Frederic Sowa 9490f886b1 af-unix: passcred support for sendpage
sendpage did not care about credentials at all. This could lead to
situations in which because of fd passing between processes we could
append data to skbs with different scm data. It is illegal to splice those
skbs together. Instead we have to allocate a new skb and if requested
fill out the scm details.

Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 15:16:06 -05:00
Herbert Xu 1ce0bf50ae net: Generalise wq_has_sleeper helper
The memory barrier in the helper wq_has_sleeper is needed by just
about every user of waitqueue_active.  This patch generalises it
by making it take a wait_queue_head_t directly.  The existing
helper is renamed to skwq_has_sleeper.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-30 14:47:33 -05:00
Martin Blumenstingl 880621c260 packet: Allow packets with only a header (but no payload)
Commit 9c7077622d ("packet: make packet_snd fail on len smaller
than l2 header") added validation for the packet size in packet_snd.
This change enforces that every packet needs a header (with at least
hard_header_len bytes) plus a payload with at least one byte. Before
this change the payload was optional.

This fixes PPPoE connections which do not have a "Service" or
"Host-Uniq" configured (which is violating the spec, but is still
widely used in real-world setups). Those are currently failing with the
following message: "pppd: packet size is too short (24 <= 24)"
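
The boundary condition at issue can be sketched as two length checks. This
is a minimal userspace illustration, assuming the validation compares the
packet length against the link-layer header length as the "(24 <= 24)"
message suggests; len_ok_strict() and len_ok_header_only() are hypothetical
names, not functions from packet_snd().

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict check: rejects a packet that consists of the header alone. */
static bool len_ok_strict(size_t len, size_t hard_header_len)
{
	return len > hard_header_len;
}

/* Relaxed check: the header must be complete, the payload may be empty. */
static bool len_ok_header_only(size_t len, size_t hard_header_len)
{
	return len >= hard_header_len;
}
```

A 24-byte packet on a device with a 24-byte header fails the strict check
but passes the relaxed one, while a 23-byte packet is rejected by both.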

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-29 22:17:17 -05:00
Patrick McHardy 7ec3f7b47b netfilter: nft_payload: add packet mangling support
Add support for mangling packet payload. Checksum for the specified base
header is updated automatically if requested, however no updates for any
kind of pseudo headers are supported, meaning no stateless NAT is supported.

For checksum updates different checksumming methods can be specified. The
currently supported methods are NONE for no checksum updates, and INET for
internet type checksums.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-25 13:54:51 +01:00
Philip Whineray f13f2aeed1 netfilter: Set /proc/net entries owner to root in namespace
Various files are owned by root with 0440 permission. Reading them is
impossible in an unprivileged user namespace, interfering with firewall
tools. For instance, iptables-save relies on /proc/net/ip_tables_names
contents to dump only loaded tables.

This patch assigns ownership of the following files to root in the
current namespace:

- /proc/net/*_tables_names
- /proc/net/*_tables_matches
- /proc/net/*_tables_targets
- /proc/net/nf_conntrack
- /proc/net/nf_conntrack_expect
- /proc/net/netfilter/nfnetlink_log

A mapping for root must be available, so this order should be followed:

unshare(CLONE_NEWUSER);
/* Setup the mapping */
unshare(CLONE_NEWNET);

Signed-off-by: Philip Whineray <phil@firehol.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-25 13:54:09 +01:00
Quentin Casasnovas 8c7188b234 RDS: fix race condition when sending a message on unbound socket
Sasha's found a NULL pointer dereference in the RDS connection code when
sending a message to an apparently unbound socket.  The problem is caused
by the code checking if the socket is bound in rds_sendmsg(), which checks
the rs_bound_addr field without taking a lock on the socket.  This opens a
race where rs_bound_addr is temporarily set but where the transport is not
in rds_bind(), leading to a NULL pointer dereference when trying to
dereference 'trans' in __rds_conn_create().

Vegard wrote a reproducer for this issue, so kindly ask him to share if
you're interested.

I cannot reproduce the NULL pointer dereference using Vegard's reproducer
with this patch, whereas I could without.

Complete earlier incomplete fix to CVE-2015-6937:

  74e98eb085 ("RDS: verify the underlying transport exists before creating a connection")

Cc: David S. Miller <davem@davemloft.net>
Cc: stable@vger.kernel.org

Reviewed-by: Vegard Nossum <vegard.nossum@oracle.com>
Reviewed-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 17:20:09 -05:00
Aaron Conole 20f795666d net: openvswitch: Remove invalid comment
During pre-upstream development, the openvswitch datapath used a custom
hashtable to store vports that could fail on delete due to lack of
memory. However, prior to upstream submission, this code was reworked to
use an hlist based hashtable with flexible-array based buckets. As such
the failure condition was eliminated from the vport_del path, rendering
this comment invalid.

Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 17:18:00 -05:00
Nikolay Aleksandrov fbdd29bfd2 net: ipmr, ip6mr: fix vif/tunnel failure race condition
Since (at least) commit b17a7c179d ("[NET]: Do sysfs registration as
part of register_netdevice."), netdev_run_todo() deals only with
unregistration, so we don't need to do the rtnl_unlock/lock cycle to
finish registration when failing pimreg or dvmrp device creation. In
fact that opens a race condition where someone can delete the device
while rtnl is unlocked because it's fully registered. The problem gets
worse when netlink support is introduced as there are more points of entry
that can cause it and it also makes reusing that code correctly impossible.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 17:15:56 -05:00
David Howells 33c40e242c rxrpc: Correctly handle ack at end of client call transmit phase
Normally, the transmit phase of a client call is implicitly ack'd by the
reception of the first data packet of the response.
However, if a security negotiation happens, the transmit phase, if it is
entirely contained in a single packet, may get an ack packet in response
and then may get aborted due to security negotiation failure.

Because the client has shifted state to RXRPC_CALL_CLIENT_AWAIT_REPLY due
to having transmitted all the data, the code that handles processing of the
received ack packet doesn't note the hard ack of the data packet.

The following abort packet in the case of security negotiation failure then
incurs an assertion failure when it tries to drain the Tx queue because the
hard ack state is out of sync (hard ack means the packets have been
processed and can be discarded by the sender; a soft ack means that the
packets are received but could still be discarded and rerequested by the
receiver).

To fix this, we should record the hard ack we received for the ack packet.

The assertion failure looks like:

	RxRPC: Assertion failed
	1 <= 0 is false
	0x1 <= 0x0 is false
	------------[ cut here ]------------
	kernel BUG at ../net/rxrpc/ar-ack.c:431!
	...
	RIP: 0010:[<ffffffffa006857b>]  [<ffffffffa006857b>] rxrpc_rotate_tx_window+0xbc/0x131 [af_rxrpc]
	...

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 17:14:50 -05:00
Michal Kubeček 264640fc2c ipv6: distinguish frag queues by device for multicast and link-local packets
If a fragmented multicast packet is received on an ethernet device which
has an active macvlan on top of it, each fragment is duplicated and
received both on the underlying device and the macvlan. If some
fragments for macvlan are processed before the whole packet for the
underlying device is reassembled, the "overlapping fragments" test in
ip6_frag_queue() discards the whole fragment queue.

To resolve this, add device ifindex to the search key and require it to
match when reassembling multicast packets and packets to link-local
addresses.
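
The matching rule can be sketched as follows. This is a simplified
userspace illustration, not the kernel's fragment queue code: struct
frag_key, dst_is_device_scoped() and frag_key_match() are hypothetical
names, and addresses are plain 16-byte arrays.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key; field names are illustrative only. */
struct frag_key {
	uint32_t id;		/* fragment identification */
	uint8_t  src[16];	/* IPv6 source address */
	uint8_t  dst[16];	/* IPv6 destination address */
	int      iif;		/* ifindex of the receiving device */
};

/* ff00::/8 is multicast, fe80::/10 is link-local unicast. */
static bool dst_is_device_scoped(const uint8_t dst[16])
{
	bool multicast = dst[0] == 0xff;
	bool linklocal = dst[0] == 0xfe && (dst[1] & 0xc0) == 0x80;

	return multicast || linklocal;
}

static bool frag_key_match(const struct frag_key *q,
			   const struct frag_key *arg)
{
	if (q->id != arg->id ||
	    memcmp(q->src, arg->src, 16) != 0 ||
	    memcmp(q->dst, arg->dst, 16) != 0)
		return false;

	/* The device only matters for multicast and link-local traffic. */
	return q->iif == arg->iif || !dst_is_device_scoped(arg->dst);
}
```

With such a key, the copies of a multicast fragment seen on the underlying
device and on the macvlan land in separate queues, while ordinary unicast
reassembly is unaffected by the receiving device.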

Note: similar patch has been already submitted by Yoshifuji Hideaki in

  http://patchwork.ozlabs.org/patch/220979/

but got lost and forgotten for some reason.

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 16:45:47 -05:00
David S. Miller 54f1aa2e57 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-11-23

Here's the first bluetooth-next pull request for the 4.5 kernel.

 - Add new Get Advertising Size Information management command
 - Add support for new system note message type on monitor channel
 - Refactor LE scan changes behind separate workqueue to avoid races
 - Fix issue with privacy feature when powering on adapter
 - Various minor fixes & cleanups here and there

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 16:22:40 -05:00
J. Bruce Fields 414ca017a5 nfsd4: fix gss-proxy 4.1 mounts for some AD principals
The principal name on a gss cred is used to setup the NFSv4.0 callback,
which has to have a client principal name to authenticate to.

That code wants the name to be in the form servicetype@hostname.
rpc.svcgssd passes down such names (and passes down no principal name at
all in the case the principal isn't a service principal).

gss-proxy always passes down the principal name, and passes it down in
the form servicetype/hostname@REALM.  So we've been munging the name
gss-proxy passes down into the format the NFSv4.0 callback code expects,
or throwing away the name if we can't.
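
The name munging described above can be sketched in userspace C. This is
only an illustration, not the nfsd or sunrpc code: munge_principal() is a
hypothetical helper, and the exact rules for when a name is thrown away are
an assumption.

```c
#include <stdio.h>
#include <string.h>

/*
 * Convert "servicetype/hostname@REALM" into "servicetype@hostname".
 * Returns 0 on success, -1 if the input lacks the expected separators
 * or the result does not fit into out[outlen].
 */
static int munge_principal(const char *in, char *out, size_t outlen)
{
	const char *slash = strchr(in, '/');
	const char *at;
	int n;

	if (!slash || slash == in)
		return -1;	/* no service part: not a service principal */

	at = strchr(slash + 1, '@');
	if (!at || at == slash + 1)
		return -1;	/* no hostname or no realm separator */

	n = snprintf(out, outlen, "%.*s@%.*s",
		     (int)(slash - in), in,
		     (int)(at - slash - 1), slash + 1);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

A user principal such as "alice@EXAMPLE.COM" has no service part, so the
sketch rejects it; that is the case where the munged name is unavailable
but the original name would still be useful for MACH_CRED checks.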

Since the introduction of the MACH_CRED enforcement in NFSv4.1, we've
also been using the principal name to verify that certain operations are
done as the same principal as was used on the original EXCHANGE_ID call.

For that application, the original name passed down by gss-proxy is also
useful.

Lack of that name in some cases was causing some kerberized NFSv4.1
mount failures in an Active Directory environment.

This fix only works in the gss-proxy case.  The fix for legacy
rpc.svcgssd would be more involved, and rpc.svcgssd already has other
problems in the AD case.

Reported-and-tested-by: James Ralston <ralston@pobox.com>
Acked-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 11:36:31 -07:00
J. Bruce Fields 6496500cf1 svcrpc: move some initialization to common code
Minor cleanup, no change in behavior.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-24 10:39:16 -07:00
Geert Uytterhoeven 6c1c36b02c net/ipv4/ipconfig: Rejoin broken lines in console output
Commit 09605cc12c ("net ipv4: use preferred log methods") replaced
a few calls of pr_cont() after a console print without a trailing
newline by pr_info(), causing lines to be split during IP
autoconfiguration, like:

    .
    ,
     OK
    IP-Config: Got DHCP answer from 192.168.97.254,
    my address is 192.168.97.44

Convert these back to using pr_cont(), so it prints again:

    ., OK
    IP-Config: Got DHCP answer from 192.168.97.254, my address is 192.168.97.44

Absorb the printing of "my address ..." into the previous call to
pr_info(), as there's no reason to use a continuation there.

Convert one more pr_info() to print nameservers while we're at it.

Fixes: 09605cc12c ("net ipv4: use preferred log methods")
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 12:00:09 -05:00
Arnd Bergmann 85beabfeca net: dsa: include gpio consumer header file
After the introduction of the switch gpio reset API, I'm getting
build errors in configurations that disable CONFIG_GPIOLIB:

net/dsa/dsa.c:783:16: error: implicit declaration of function 'gpio_to_desc' [-Werror=implicit-function-declaration]

The reason is that linux/gpio/consumer.h is not automatically
included without gpiolib support. This adds an explicit #include
statement to make it compile in all configurations. The reset
functionality will not work without gpiolib, which is what you
get when disabling the feature.

As far as I can tell, gpiolib is supported on all architectures
on which you can have DSA at the moment.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: cc30c16344 ("net: dsa: Add support for a switch reset gpio")
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 11:28:06 -05:00
Ying Xue 7098356bac tipc: fix error handling of expanding buffer headroom
Coverity says:

*** CID 1338065:  Error handling issues  (CHECKED_RETURN)
/net/tipc/udp_media.c: 162 in tipc_udp_send_msg()
156     	struct udp_media_addr *dst = (struct udp_media_addr *)&dest->value;
157     	struct udp_media_addr *src = (struct udp_media_addr *)&b->addr.value;
158     	struct sk_buff *clone;
159     	struct rtable *rt;
160
161     	if (skb_headroom(skb) < UDP_MIN_HEADROOM)
>>>     CID 1338065:  Error handling issues  (CHECKED_RETURN)
>>>     Calling "pskb_expand_head" without checking return value (as is done elsewhere 51 out of 56 times).
162     		pskb_expand_head(skb, UDP_MIN_HEADROOM, 0, GFP_ATOMIC);
163
164     	clone = skb_clone(skb, GFP_ATOMIC);
165     	skb_set_inner_protocol(clone, htons(ETH_P_TIPC));
166     	ub = rcu_dereference_rtnl(b->media_ptr);
167     	if (!ub) {

When expanding buffer headroom over a udp tunnel with pskb_expand_head(),
we unfortunately don't check its return value. As a result, if the
function returns an error code due to lack of memory, the consequences
are unpredictable because we unconditionally assume that it always
succeeds.
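
As a rough illustration of the pattern (a userspace sketch, not the
actual patch), checking the return value looks like the following. Here
struct buf, expand_head() and send_msg() are hypothetical stand-ins for
the skb, pskb_expand_head() and tipc_udp_send_msg(), and MIN_HEADROOM is
an arbitrary illustrative value:

```c
#include <errno.h>
#include <stddef.h>

#define MIN_HEADROOM 48 /* illustrative value only, not UDP_MIN_HEADROOM */

/* Hypothetical stand-in for an skb. */
struct buf {
	size_t headroom;   /* bytes currently available in front of the data */
	int fail_alloc;    /* set to simulate an out-of-memory condition */
};

/* Stand-in for pskb_expand_head(): returns 0 on success, -ENOMEM on failure. */
static int expand_head(struct buf *b, size_t extra)
{
	if (b->fail_alloc)
		return -ENOMEM;
	b->headroom += extra;
	return 0;
}

/* Stand-in for the send path: bail out if the headroom cannot be expanded. */
int send_msg(struct buf *b)
{
	if (b->headroom < MIN_HEADROOM) {
		int err = expand_head(b, MIN_HEADROOM);

		if (err)
			return err; /* do not carry on with too little headroom */
	}
	/* ... continue with a buffer known to have enough headroom ... */
	return 0;
}
```

The point is only that the error is propagated to the caller instead of
being ignored; how the real function cleans up on failure is not shown
here.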

Fixes: e53567948f ("tipc: conditionally expand buffer headroom over udp tunnel")
Reported-by: <scan-admin@coverity.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-24 11:26:19 -05:00
Ying Xue f4195d1eac tipc: avoid packets leaking on socket receive queue
Even if we drain the receive queue thoroughly in tipc_release() after the
tipc socket is removed from the rhashtable, some packets may still be in
flight because some CPU is running the receiver and did its rhashtable
lookup before we removed the socket. They will reach the receive queue,
but nobody will ever delete them. To avoid this leak, we register a
private socket destructor to purge the receive queue, meaning that
releasing packets pending on the receive queue is delayed until the last
reference to the tipc socket is released.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 23:45:15 -05:00
Benjamin Coddington 38b7631fbe nfs4: limit callback decoding to received bytes
A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field.  That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().

Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.

Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-23 22:03:15 -05:00
Julia Lawall 3b22dae38d VSOCK: constify vmci_transport_notify_ops structures
The vmci_transport_notify_ops structures are never modified, so declare
them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:49:29 -05:00
Julia Lawall 4dd191bb61 net: atm: constify in_cache_ops and eg_cache_ops structures
The in_cache_ops and eg_cache_ops structures are never modified, so declare
them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:49:29 -05:00
Nikolay Aleksandrov a0b477366a net: ipmr: factor out common vif init code
Factor out common vif init code used in both tunnel and pimreg
initialization and create ipmr_init_vif_indev() function.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:39 -05:00
Nikolay Aleksandrov 29e97d2145 net: ipmr: rearrange and cleanup setsockopt
Take rtnl in the beginning unconditionally as most options already need
it (one exception - MRT_DONE, see the comment inside), make the
lock/unlock places central and move out the switch() local variables.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:39 -05:00
Nikolay Aleksandrov af623236a9 net: ipmr: drop ip_mr_init() mrt_cachep null check as we'll panic if it fails
It's not necessary to check for null as SLAB_PANIC is used and we'll
panic if the alloc fails, so just drop it.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:39 -05:00
Nikolay Aleksandrov 29c3f19739 net: ipmr: drop an instance of CONFIG_IP_MROUTE_MULTIPLE_TABLES
Trivial replacement of ifdef with IS_BUILTIN().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:39 -05:00
Nikolay Aleksandrov fe9ef3ce39 net: ipmr: make ip_mroute_getsockopt more understandable
Use a switch to determine if optname is correct and set val accordingly.
This produces much more straightforward and readable code.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov 7ef8f65df9 net: ipmr: fix code and comment style
Trivial code and comment style fixes, also removed some extra newlines,
spaces and tabs.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov c316c629f1 net: ipmr: remove some pimsm ifdefs and simplify
Add the helper pimsm_enabled() which replaces the old CONFIG_IP_PIMSM
define and is used to check if any version of PIM-SM has been enabled.
Use a single if defined(CONFIG_IP_PIMSM_V1) || defined(CONFIG_IP_PIMSM_V2)
for the pim-sm shared code. This is okay w.r.t IGMPMSG_WHOLEPKT because
only a VIFF_REGISTER device can send such packet, and it can't be
created if pimsm_enabled() is false.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:38 -05:00
Nikolay Aleksandrov f3d431810e net: ipmr: always define mroute_reg_vif_num
Previously mroute_reg_vif_num was defined only if any of the CONFIG_PIMSM_
options were set, but that's not really necessary as the size of the
struct is the same in both cases (checked with pahole, both cases size
is 3256 bytes) and we can remove some unnecessary ifdefs to simplify the
code.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:37 -05:00
Nikolay Aleksandrov 1113ebbcf9 net: ipmr: move the tbl id check in ipmr_new_table
Move the table id check into ipmr_new_table and make it return an error
pointer. We need this change for the upcoming netlink table manipulation
support in order to avoid code duplication and a race condition.
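
For readers unfamiliar with the error pointer convention, here is a
minimal userspace sketch of the idea. It is an assumption-laden model,
not the kernel code: err_ptr(), ptr_err() and is_err() only imitate the
kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and struct table, new_table()
and TABLE_ID_MAX are hypothetical names with an illustrative bound:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Simplified imitations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int is_err(const void *p)
{
	/* The top MAX_ERRNO addresses are reserved for encoded error codes. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct table {
	unsigned int id;
};

#define TABLE_ID_MAX 1000000000u /* illustrative bound only */

/* The id check lives inside the constructor, so every caller gets it. */
struct table *new_table(unsigned int id)
{
	struct table *t;

	if (id >= TABLE_ID_MAX)
		return err_ptr(-EINVAL);

	t = malloc(sizeof(*t));
	if (!t)
		return err_ptr(-ENOMEM);

	t->id = id;
	return t;
}
```

A caller then tests the result with is_err() and reads the reason with
ptr_err(), rather than seeing a bare NULL that hides why creation
failed.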

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 15:06:37 -05:00
Dan Carpenter 3d1a54e801 net/hsr: fix a warning message
WARN_ON_ONCE() takes a condition, it doesn't take an error message.  I
have converted this to WARN() instead.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 14:56:15 -05:00
Chuck Lever 6f18dc8939 svcrdma: Do not send XDR roundup bytes for a write chunk
Minor optimization: when dealing with write chunk XDR roundup, do
not post a Write WR for the zero bytes in the pad. Simply update
the write segment in the RPC-over-RDMA header to reflect the extra
pad bytes.

The Reply chunk is also a write chunk, but the server does not use
send_write_chunks() to send the Reply chunk. That's OK in this case:
the server Upper Layer typically marshals the Reply chunk contents
in a single contiguous buffer, without a separate tail for the XDR
pad.

The comments and the variable naming refer to "chunks" but what is
really meant is "segments." The existing code sends only one
xdr_write_chunk per RPC reply.

The fix assumes this as well. When the XDR pad in the first write
chunk is reached, the assumption is the Write list is complete and
send_write_chunks() returns.

That will remain a valid assumption until the server Upper Layer can
support multiple bulk payload results per RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-23 12:15:30 -07:00
Rainer Weikusat 7d267278a9 unix: avoid use-after-free in ep_remove_wait_queue
Rainer Weikusat <rweikusat@mobileactivedefense.com> writes:
An AF_UNIX datagram socket being the client in an n:1 association with
some server socket is only allowed to send messages to the server if the
receive queue of this socket contains at most sk_max_ack_backlog
datagrams. This implies that prospective writers might be forced to go
to sleep even though none of the messages presently enqueued on the server
receive queue were sent by them. In order to ensure that these will be
woken up once space becomes available again, the present unix_dgram_poll
routine does a second sock_poll_wait call with the peer_wait wait queue
of the server socket as queue argument (unix_dgram_recvmsg does a wake
up on this queue after a datagram was received). This is inherently
problematic because the server socket is only guaranteed to remain alive
for as long as the client still holds a reference to it. In case the
connection is dissolved via connect or by the dead peer detection logic
in unix_dgram_sendmsg, the server socket may be freed even though "the
polling mechanism" (in particular, epoll) still has a pointer to the
corresponding peer_wait queue. There's no way to forcibly deregister a
wait queue with epoll.

Based on an idea by Jason Baron, the patch below changes the code such
that a wait_queue_t belonging to the client socket is enqueued on the
peer_wait queue of the server whenever the peer receive queue full
condition is detected by either a sendmsg or a poll. A wake up on the
peer queue is then relayed to the ordinary wait queue of the client
socket via a wake function. The connection to the peer wait queue is again
dissolved if either a wake up is about to be relayed or the client
socket reconnects or a dead peer is detected or the client socket is
itself closed. This enables removing the second sock_poll_wait from
unix_dgram_poll, thus avoiding the use-after-free, while still ensuring
that no blocked writer sleeps forever.

Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Fixes: ec0d215f94 ("af_unix: fix 'poll for write'/connected DGRAM sockets")
Reviewed-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 12:29:58 -05:00
Nina Schiff 3b13758f51 cgroups: Allow dynamically changing net_classid
The classid of a process is changed either when a process is moved to
or from a cgroup or when the net_cls.classid file is updated.
Previously net_cls only supported propagating these changes to the
cgroup's related sockets when a process was added to or removed from the
cgroup. This means it was necessary to remove and re-add all processes
to a cgroup in order to update its classid. This change introduces
support for doing this dynamically - i.e. when the value is changed in
the net_cls.classid file, this will also trigger an update to the
classid associated with all sockets controlled by the cgroup.
This mimics the behaviour of other cgroup subsystems.
net_prio circumvents this issue by storing an index into a table with
each socket (and so any updates to the table don't require updating
the value associated with the socket). net_cls, however, passes the
socket the classid directly, and so this additional step is needed.

Signed-off-by: Nina Schiff <ninasc@fb.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 12:13:46 -05:00
Florian Westphal daaa7d647f netfilter: ipv6: avoid nf_iterate recursion
The previous patch changed nf_ct_frag6_gather() to morph reassembled skb
with the previous one.

This means that the return value is always NULL or the skb argument.
So change it to an err value.

Instead of invoking NF_HOOK recursively with threshold to skip already-called hooks
we can now just return NF_ACCEPT to move on to the next hook except for
-EINPROGRESS (which means skb has been queued for reassembly), in which case we
return NF_STOLEN.
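
The mapping from the gather result to a hook verdict described above
can be sketched as follows. This is a standalone illustration only:
enum hook_verdict and defrag_verdict() are hypothetical names, with
HOOK_ACCEPT and HOOK_STOLEN standing in for NF_ACCEPT and NF_STOLEN:

```c
#include <errno.h>

/* Stand-ins for the NF_ACCEPT and NF_STOLEN verdicts. */
enum hook_verdict {
	HOOK_ACCEPT,
	HOOK_STOLEN,
};

/* Translate the error value returned by the gather step into a verdict. */
enum hook_verdict defrag_verdict(int gather_err)
{
	/* The skb has been queued for reassembly and is no longer ours. */
	if (gather_err == -EINPROGRESS)
		return HOOK_STOLEN;

	/* Everything else simply moves on to the next hook. */
	return HOOK_ACCEPT;
}
```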

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:45 +01:00
Florian Westphal 029f7f3b87 netfilter: ipv6: nf_defrag: avoid/free clone operations
commit 6aafeef03b
("netfilter: push reasm skb through instead of original frag skbs")
changed ipv6 defrag to not use the original skbs anymore.

So rather than keeping the original skbs around just to discard them
afterwards just use the original skbs directly for the fraglist of
the newly assembled skb and remove the extra clone/free operations.

The skb that completes the fragment queue is morphed into the
reassembled one instead, just like ipv4 defrag.

openvswitch doesn't need any additional skb_morph magic anymore to deal
with this situation so just remove that.

A followup patch can then also remove the NF_HOOK (re)invocation in
the ipv6 netfilter defrag hook.

Cc: Joe Stringer <joestringer@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:44 +01:00
stephen hemminger a18fd970ce netfilter: remove duplicate include
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:43 +01:00
Ian Morris c1bc1d257b netfilter-bridge: layout of if statements
Eliminate some checkpatch issues by improved layout of if statements.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:41 +01:00
Ian Morris abcdd9a623 netfilter-bridge: brace placement
Change brace placement to eliminate checkpatch error.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:40 +01:00
Ian Morris 7f495ad946 netfilter-bridge: use netdev style comments
Changes comments to use netdev style.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:39 +01:00
Ian Morris 052a4bc49d netfilter-bridge: Cleanse indentation
Fixes a bunch of issues detected by checkpatch with regards to code
indentation.

No changes detected by objdiff.

Signed-off-by: Ian Morris <ipm@chirality.org.uk>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 17:54:39 +01:00
Andrew Lunn cc30c16344 net: dsa: Add support for a switch reset gpio
Some boards have a gpio line tied to the switch reset pin. Allow this
gpio to be retrieved from the device tree, and take the switch out of
reset before performing the probe.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-23 11:53:10 -05:00
Johan Hedberg dc4270c0cd Bluetooth: Increment management interface revision
This patch increments the management interface revision due to
introduction of a new Get Advertising Size Information command and
various other fixes & improvements.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-23 14:13:32 +01:00
Andrzej Kaczmarek e59a554235 Bluetooth: Fix powering on with privacy and advertising
In order to enable advertising with privacy enabled, SMP has to be
registered in order to generate new RPA. During power on, it will be
registered at the very end which is the reason why advertising is not
enabled and it's not possible to enable it anymore due to mismatch
between hci_dev settings and actual controller state.

This fixes this problem by moving SMP registration earlier, just after
controller is powered (which is ok, because LE SMP will be already able
to decide on identity address to be used), but before advertising is
enabled.

Signed-off-by: Andrzej Kaczmarek <andrzej.kaczmarek@codecoup.pl>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-23 13:57:55 +01:00
Johan Hedberg 568f44f636 Bluetooth: Fix returning proper HCI status from __hci_req_sync
There were a couple of code paths missed by the previous patch that
added a HCI status return parameter to __hci_req_sync. This patch adds
the missing assignments for them.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-23 13:50:11 +01:00
Arnd Bergmann 8e662164ab netfilter: nfnetlink_queue: avoid harmless uninitialized variable warnings
Several ARM default configurations give us warnings on recent
compilers about potentially uninitialized variables in the
nfnetlink code in two functions:

net/netfilter/nfnetlink_queue.c: In function 'nfqnl_build_packet_message':
net/netfilter/nfnetlink_queue.c:519:19: warning: 'nfnl_ct' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (ct && nfnl_ct->build(skb, ct, ctinfo, NFQA_CT, NFQA_CT_INFO) < 0)

Moving the rcu_dereference(nfnl_ct_hook) call outside of the
conditional code avoids the warning without forcing us to
preinitialize the variable.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: a4b4766c3c ("netfilter: nfnetlink_queue: rename related to nfqueue attaching conntrack info")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-23 11:22:26 +01:00
Nikolay Aleksandrov 4c6980462f net: ip6mr: fix static mfc/dev leaks on table destruction
Similar to ipv4, when destroying an mrt table the static mfc entries and
the static devices are kept, which leads to devices that can never be
destroyed (because of refcnt taken) and leaked memory. Make sure that
everything is cleaned up on netns destruction.

Fixes: 8229efdaef ("netns: ip6mr: enable namespace support in ipv6 multicast forwarding code")
CC: Benjamin Thery <benjamin.thery@bull.net>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-22 20:44:47 -05:00
Nikolay Aleksandrov 0e615e9601 net: ipmr: fix static mfc/dev leaks on table destruction
When destroying an mrt table the static mfc entries and the static
devices are kept, which leads to devices that can never be destroyed
(because of refcnt taken) and leaked memory, for example:
unreferenced object 0xffff880034c144c0 (size 192):
  comm "mfc-broken", pid 4777, jiffies 4320349055 (age 46001.964s)
  hex dump (first 32 bytes):
    98 53 f0 34 00 88 ff ff 98 53 f0 34 00 88 ff ff  .S.4.....S.4....
    ef 0a 0a 14 01 02 03 04 00 00 00 00 01 00 00 00  ................
  backtrace:
    [<ffffffff815c1b9e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811ea6e0>] kmem_cache_alloc+0x190/0x300
    [<ffffffff815931cb>] ip_mroute_setsockopt+0x5cb/0x910
    [<ffffffff8153d575>] do_ip_setsockopt.isra.11+0x105/0xff0
    [<ffffffff8153e490>] ip_setsockopt+0x30/0xa0
    [<ffffffff81564e13>] raw_setsockopt+0x33/0x90
    [<ffffffff814d1e14>] sock_common_setsockopt+0x14/0x20
    [<ffffffff814d0b51>] SyS_setsockopt+0x71/0xc0
    [<ffffffff815cdbf6>] entry_SYSCALL_64_fastpath+0x16/0x7a
    [<ffffffffffffffff>] 0xffffffffffffffff

Make sure that everything is cleaned on netns destruction.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-22 20:44:46 -05:00
Daniel Borkmann 6900317f5e net, scm: fix PaX detected msg_controllen overflow in scm_detach_fds
David and HacKurx reported the following (or a similar) size overflow triggered
in a grsecurity kernel, thanks to PaX's gcc size overflow plugin:

(Already fixed in later grsecurity versions by Brad and PaX Team.)

[ 1002.296137] PAX: size overflow detected in function scm_detach_fds net/core/scm.c:314
               cicus.202_127 min, count: 4, decl: msg_controllen; num: 0; context: msghdr;
[ 1002.296145] CPU: 0 PID: 3685 Comm: scm_rights_recv Not tainted 4.2.3-grsec+ #7
[ 1002.296149] Hardware name: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, [...]
[ 1002.296153]  ffffffff81c27366 0000000000000000 ffffffff81c27375 ffffc90007843aa8
[ 1002.296162]  ffffffff818129ba 0000000000000000 ffffffff81c27366 ffffc90007843ad8
[ 1002.296169]  ffffffff8121f838 fffffffffffffffc fffffffffffffffc ffffc90007843e60
[ 1002.296176] Call Trace:
[ 1002.296190]  [<ffffffff818129ba>] dump_stack+0x45/0x57
[ 1002.296200]  [<ffffffff8121f838>] report_size_overflow+0x38/0x60
[ 1002.296209]  [<ffffffff816a979e>] scm_detach_fds+0x2ce/0x300
[ 1002.296220]  [<ffffffff81791899>] unix_stream_read_generic+0x609/0x930
[ 1002.296228]  [<ffffffff81791c9f>] unix_stream_recvmsg+0x4f/0x60
[ 1002.296236]  [<ffffffff8178dc00>] ? unix_set_peek_off+0x50/0x50
[ 1002.296243]  [<ffffffff8168fac7>] sock_recvmsg+0x47/0x60
[ 1002.296248]  [<ffffffff81691522>] ___sys_recvmsg+0xe2/0x1e0
[ 1002.296257]  [<ffffffff81693496>] __sys_recvmsg+0x46/0x80
[ 1002.296263]  [<ffffffff816934fc>] SyS_recvmsg+0x2c/0x40
[ 1002.296271]  [<ffffffff8181a3ab>] entry_SYSCALL_64_fastpath+0x12/0x85

Further investigation showed that this can happen when an *odd* number of
fds are being passed over AF_UNIX sockets.

In these cases CMSG_LEN(i * sizeof(int)) and CMSG_SPACE(i * sizeof(int)),
where i is the number of successfully passed fds, differ by 4 bytes due
to the extra CMSG_ALIGN() padding in CMSG_SPACE() to an 8 byte boundary
on 64 bit. The padding is used to align subsequent cmsg headers in the
control buffer.

When the control buffer passed in from the receiver side *lacks* these 4
bytes (e.g. due to buggy/wrong API usage), then msg->msg_controllen will
overflow in scm_detach_fds():

  int cmlen = CMSG_LEN(i * sizeof(int));  <--- cmlen w/o tail-padding
  err = put_user(SOL_SOCKET, &cm->cmsg_level);
  if (!err)
    err = put_user(SCM_RIGHTS, &cm->cmsg_type);
  if (!err)
    err = put_user(cmlen, &cm->cmsg_len);
  if (!err) {
    cmlen = CMSG_SPACE(i * sizeof(int));  <--- cmlen w/ 4 byte extra tail-padding
    msg->msg_control += cmlen;
    msg->msg_controllen -= cmlen;         <--- iff no tail-padding space here ...
  }                                            ... wrap-around

For example, it will wrap to a length of 18446744073709551612 bytes in case the
receiver passed in msg->msg_controllen of 20 bytes, and the sender
properly transferred 1 fd to the receiver, so that its CMSG_LEN results
in 20 bytes and CMSG_SPACE in 24 bytes.
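
The arithmetic can be reproduced in userspace with the standard cmsg
macros. The sketch below is only an illustration: the helper names are
made up here, and the clamping variant shows one way to guard the
subtraction, under the assumption that the remaining length is simply
capped at zero:

```c
#include <stddef.h>
#include <sys/socket.h>

/* Length of one SCM_RIGHTS cmsg carrying nfds descriptors, no tail padding. */
size_t rights_len(size_t nfds)
{
	return CMSG_LEN(nfds * sizeof(int));
}

/* Same cmsg including the tail padding that aligns a following header. */
size_t rights_space(size_t nfds)
{
	return CMSG_SPACE(nfds * sizeof(int));
}

/* Unguarded subtraction: wraps around if controllen < rights_space(nfds). */
size_t remaining_unchecked(size_t controllen, size_t nfds)
{
	return controllen - rights_space(nfds);
}

/* Guarded subtraction: never consume more than the receiver provided. */
size_t remaining_clamped(size_t controllen, size_t nfds)
{
	size_t space = rights_space(nfds);

	if (controllen < space)
		space = controllen;
	return controllen - space;
}
```

On 64 bit, rights_len(1) is 20 and rights_space(1) is 24, so
remaining_unchecked(20, 1) wraps to the huge value quoted above, while
remaining_clamped(20, 1) yields 0. For an even number of fds the two
lengths coincide and nothing wraps.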

In case of MSG_CMSG_COMPAT (scm_detach_fds_compat()), I haven't seen an
issue in my tests as alignment seems always on 4 byte boundary. Same
should be in case of native 32 bit, where we end up with 4 byte boundaries
as well.

In practice, passing msg->msg_controllen of 20 to recvmsg() while receiving
a single fd would mean that on successful return, msg->msg_controllen is
being set by the kernel to 24 bytes instead, thus more than the input
buffer advertised. It could e.g. become an issue if such an application later
on zeroes or copies the control buffer based on the returned msg->msg_controllen
elsewhere.

Maximum number of fds we can send is a hard upper limit SCM_MAX_FD (253).

Going over the code, it seems like msg->msg_controllen is not being read
after scm_detach_fds() in scm_recv() anymore by the kernel, good!

The relevant recvmsg() handlers are unix_dgram_recvmsg() (unix_seqpacket_recvmsg())
and unix_stream_recvmsg(). Both return back to their recvmsg() caller,
and ___sys_recvmsg() places the updated length, that is, new msg_control -
old msg_control pointer into msg->msg_controllen (hence the 24 bytes seen
in the example).

Long time ago, Wei Yongjun fixed something related in commit 1ac70e7ad2
("[NET]: Fix function put_cmsg() which may cause usr application memory
overflow").

RFC3542, section 20.2. says:

  The fields shown as "XX" are possible padding, between the cmsghdr
  structure and the data, and between the data and the next cmsghdr
  structure, if required by the implementation. While sending an
  application may or may not include padding at the end of last
  ancillary data in msg_controllen and implementations must accept both
  as valid. On receiving a portable application must provide space for
  padding at the end of the last ancillary data as implementations may
  copy out the padding at the end of the control message buffer and
  include it in the received msg_controllen. When recvmsg() is called
  if msg_controllen is too small for all the ancillary data items
  including any trailing padding after the last item an implementation
  may set MSG_CTRUNC.

Since we haven't set MSG_CTRUNC for quite a long time already, just do
the same as in 1ac70e7ad2 to avoid an overflow.

Btw, even man-page author got this wrong :/ See db939c9b26e9 ("cmsg.3: Fix
error in SCM_RIGHTS code sample"). Some people must have copied this (?),
thus it got triggered in the wild (reported several times during boot by
David and HacKurx).

No Fixes tag this time as pre 2002 (that is, pre history tree).

Reported-by: David Sterba <dave@jikos.cz>
Reported-by: HacKurx <hackurx@gmail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: Eric Dumazet <edumazet@google.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-22 20:34:58 -05:00
David Ahern b811580d91 net: IPv6 fib lookup tracepoint
Add tracepoint to show fib6 table lookups and result.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-22 11:54:10 -05:00
Marcel Holtmann 40b25fe5dc Bluetooth: Add support for Get Advertising Size Information command
The Get Advertising Size Information command allows retrieving size
information for advertising data and scan response data fields depending
on the selected flags. This is useful if applications want to know the
available size ahead of time.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-22 16:26:05 +02:00
Marcel Holtmann 31a3248dd9 Bluetooth: Simplify if statements in tlv_data_is_valid function
The if statements for checking the flags parameter can be written in a
way that is a bit easier to read. This patch does so. No functional
behavior has been changed.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-22 16:00:17 +03:00
Marcel Holtmann ceff86af56 Bluetooth: Add instance range check for Add Advertising command
The instance range check for Add Advertising command is missing. If the
provided instance is out of range an Invalid Parameters error should be
returned. At the moment, the generic Failed error is returned. This
extra check ensures that clear error messages are returned.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-22 16:00:16 +03:00
Eric Dumazet e2f9dc3bd2 net: avoid NULL deref in napi_get_frags()
napi_alloc_skb() can return NULL.
We should not crash should this happen.

Fixes: 93f93a4404 ("net: move skb_mark_napi_id() into core networking stack")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 16:43:14 -05:00
Jon Paul Maloy 9a65083827 tipc: correct settings of broadcast link state
Since commit 5266698661 ("tipc: let broadcast packet
reception use new link receive function") the broadcast send
link state was meant to always be set to LINK_ESTABLISHED, since
we don't need this link to follow the regular link FSM rules. It
was also the intention that this state anyway shouldn't impact
the run-time working state of the link, since the latter in
reality is controlled by the number of registered peers.

We have now discovered that this assumption is not quite correct.
If the broadcast link is reset because of too many retransmissions,
its state will inadvertently go to LINK_RESETTING, and never go
back to LINK_ESTABLISHED, because the LINK_FAILURE event was not
anticipated. This will work well once, but if it happens a second
time, the reset on a link in LINK_RESETTING has no effect, and
neither the broadcast link nor the unicast links will go down as
they should.

Furthermore, it is confusing that the management tool shows that
this link is in UP state when that obviously isn't the case.

We now ensure that this state strictly follows the true working
state of the link. The state is set to LINK_ESTABLISHED when
the number of peers is non-zero, and to LINK_RESET otherwise.
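
The rule can be modelled in a few lines. This is a toy sketch rather
than the TIPC code: struct bc_link, bc_add_peer() and bc_remove_peer()
are hypothetical, and the two enum values merely stand in for
LINK_RESET and LINK_ESTABLISHED:

```c
/* Stand-ins for the LINK_RESET and LINK_ESTABLISHED states. */
enum bc_state {
	BC_LINK_RESET,
	BC_LINK_ESTABLISHED,
};

/* Toy model of the broadcast send link. */
struct bc_link {
	unsigned int peers;
	enum bc_state state;
};

/* The state is derived purely from the number of registered peers. */
static void bc_update_state(struct bc_link *l)
{
	l->state = l->peers ? BC_LINK_ESTABLISHED : BC_LINK_RESET;
}

void bc_add_peer(struct bc_link *l)
{
	l->peers++;
	bc_update_state(l);
}

void bc_remove_peer(struct bc_link *l)
{
	if (l->peers)
		l->peers--;
	bc_update_state(l);
}
```

Because the state is recomputed on every peer change, it cannot get
stuck in an intermediate value the way described above.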

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:08:51 -05:00
Jon Paul Maloy 1a90632da8 tipc: eliminate remnants of hungarian notation
The number of variables with Hungarian notation (l_ptr, n_ptr etc.)
has been significantly reduced over the last couple of years.

We now root out the last traces of this practice.
There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 38206d5939 tipc: narrow down interface towards struct tipc_link
We move the definition of struct tipc_link from link.h to link.c in
order to minimize its exposure to the rest of the code.

When needed, we define new functions to make it possible for external
entities to access and set data in the link.

Apart from the above, there are no functional changes.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5be9c08671 tipc: narrow down exposure of struct tipc_node
In our effort to have less code and include dependencies between
entities such as node, link and bearer, we try to narrow down
the exposed interface towards the node as much as possible.

In this commit, we move the definition of struct tipc_node, along
with many of its associated function declarations, from node.h to
node.c. We also move some function definitions from link.c and
name_distr.c to node.c, since they access fields in struct tipc_node
that should not be externally visible. The moved functions are renamed
according to new location, and made static whenever possible.

There are no functional changes in this commit.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5405ff6e15 tipc: convert node lock to rwlock
According to the node FSM a node in state SELF_UP_PEER_UP cannot
change state inside a lock context, except when a TUNNEL_PROTOCOL
(SYNCH or FAILOVER) packet arrives. However, the node's individual
links may still change state.

Since each link now is protected by its own spinlock, we finally have
the conditions in place to convert the node spinlock to an rwlock_t.
If the node state and arriving packet type are right, we can let the
link directly receive the packet under protection of its own spinlock
and the node lock in read mode. In all other cases we use the node
lock in write mode. This enables full concurrent execution between
parallel links during steady-state traffic situations, i.e., 99+ %
of the time.

This commit implements this change.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 2312bf61ae tipc: introduce per-link spinlock
As a preparation to allow parallel links to work more independently
from each other we introduce a per-link spinlock, to be stored in the
struct tipc_node's link entry area. Since the node lock still is a regular
spinlock there is no increase in parallelism at this stage.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 1d7e1c2595 tipc: reduce code dependency between binding table and node layer
The file name_distr.c currently contains three functions,
named_cluster_distribute(), tipc_publ_subscribe() and
tipc_publ_unsubscribe() that all directly access fields in
struct tipc_node. We want to eliminate such dependencies, so
we move those functions to the file node.c and rename them to
tipc_node_broadcast(), tipc_node_subscribe() and tipc_node_unsubscribe()
respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy 5c10e97940 tipc: small cleanup of function tipc_node_check_state()
The function tipc_node_check_state() contains the core logic
for handling link synchronization and failover. For this reason,
it is important to keep it as comprehensible as possible.

In this commit, we make three small cleanups.

1) If the node is in state SELF_DOWN_PEER_LEAVING and the received
   packet confirms that the peer has lost contact, there will be no
   further action in this function. To make this clearer, we return
   from the function directly after the state change.

2) Since commit 0f8b8e28fb ("tipc: eliminate risk of stalled
   link synchronization") only the logically first TUNNEL_PROTO/SYNCH
   packet can alter the link state and set the synch point,
   independently of arrival order. Hence, there is no longer any
   need to adjust the synch value when such packets arrive out of
   order. We remove this adjustment.

3) It is the intention that any message arriving on any of the links
   may trigger a check for and possible termination of a node SYNCH state.
   A redundant and unnoticed check for tipc_link_is_synching() obviously
   defeats this purpose, with the effect that only packets arriving on the
   synching link may currently end the synch state. We remove this check.
   This change will further shorten the synchronization period between
   parallel links.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:10 -05:00
Jon Paul Maloy c7cad0d6f7 tipc: move linearization of buffers to generic code
In commit 5cbb28a4bf ("tipc: linearize arriving NAME_DISTR
and LINK_PROTO buffers") we added linearization of NAME_DISTRIBUTOR,
LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE to the function
tipc_udp_recv(). The location of the change was selected in order
to make the commit easily applicable to 'net' and 'stable'.

We now move this linearization to where it should be done, in the
functions tipc_named_rcv() and tipc_link_proto_rcv() respectively.

Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 14:06:09 -05:00
Eric Dumazet 5d4c9bfbab tcp: fix potential huge kmalloc() calls in TCP_REPAIR
tcp_send_rcvq() is used for re-injecting data into tcp receive queue.

Problems :

- No check against size is performed, allowing a user to fool the kernel
  into attempting very large memory allocations, eventually triggering
  OOM when memory is fragmented.

- In case of fault during the copy we do not return correct errno.

Let's use alloc_skb_with_frags() to cook optimal skbs.

Fixes: 292e8d8c85 ("tcp: Move rcvq sending to tcp_input.c")
Fixes: c0e88ff0f2 ("tcp: Repair socket queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 10:57:33 -05:00
Yuchung Cheng dd52bc2b4e tcp: fix Fast Open snmp over-counting bug
Fix incrementing TCPFastOpenActiveFailed snmp stats multiple times
when the handshake experiences multiple SYN timeouts.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 10:51:12 -05:00
Yuchung Cheng 0e45f4da59 tcp: disable Fast Open on timeouts after handshake
Some middle-boxes black-hole the data after the Fast Open handshake
(https://www.ietf.org/proceedings/94/slides/slides-94-tcpm-13.pdf).
The exact reason is unknown. The work-around is to disable Fast Open
temporarily after multiple recurring timeouts with few or no data
delivered in the established state.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-20 10:51:12 -05:00
Gregory Greenman ac0621971a mac80211: always set the buf_size in AddBA req to 64
Advertising reordering window in ADDBA less than 64 can crash some APs,
an example is LinkSys WRT120N (with FW v1.0.07 build 002 Jun 18 2012).
On the other hand, a driver may need to limit Tx A-MPDU size for its own
reasons, like specific HW limitations.

Signed-off-by: Gregory Greenman <gregory.greenman@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-20 11:39:40 +01:00
Emmanuel Grumbach 5ad11b50fd mac80211: ensure we don't update tx power on a non-running sdata
We can't update the Tx power on the device unless it is
running.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=101521.

Cc: stable@vger.kernel.org
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-20 11:38:50 +01:00
Prasanna Karthik 9a54421018 Bluetooth: remove unneeded variable in l2cap_stream_rx
Remove unneeded variable used to store return value.
Error reported by coccicheck.

Signed-off-by: Prasanna Karthik <mkarthi3@visteon.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:33 +01:00
Prasanna Karthik 74b93e9f4e Bluetooth: Clean up hci_core code
Fix errors reported by checkpatch.
- ERROR: spaces required around that ':' (ctx:VxW)
- ERROR: open brace '{' following function declarations go on the next line

Signed-off-by: Prasanna Karthik <mkarthi3@visteon.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:33 +01:00
Prasanna Karthik f37590bd77 Bluetooth: clean up af_bluetooth code
Fix error reported by checkpatch.
ERROR:"foo* bar" should be "foo *bar"

Signed-off-by: Prasanna Karthik <mkarthi3@visteon.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Markus Elfring 06fbb3d5c7 Bluetooth: Delete an unnecessary check before the function call "kfree_skb"
The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Markus Elfring 56f9ebe641 mac802154: Delete an unnecessary check before the function call "kfree_skb"
The kfree_skb() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Johan Hedberg 7df0f73ece Bluetooth: Simplify request cleanup code
The hci_req_sync_cancel() is just as much related to the request
cleanup as hci_request_cancel_all() is. Just move the former into the
latter and do the cleanup from a single place in hci_dev_do_close().
The important thing is to avoid deadlocks by holding the req_sync
lock: previously hci_request_cancel_all was done right after releasing
the lock and with this patch it's right before taking it.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Johan Hedberg e2caced407 Bluetooth: Remove conn_unfinished variable from hci_connect_le()
The conn_unfinished variable makes the entire logic of
hci_connect_le() rather confusing. By restructuring and clarifying the
logic we can actually remove the conn_unfinished variable and still
keep the same behavior.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Johan Hedberg 658aead94b Bluetooth: Move check for ongoing connect earlier in hci_connect_le()
This helps simplify the logic in further patches (less cleanups to do
in this failure branch).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Johan Hedberg 0ad06aa6a7 Bluetooth: Fix specifying role for LE connections
The hci_connect_le_scan() is (as the name implies) a master/central
role API, so it makes no sense to pass a role parameter to it. At
the same time this patch also fixes the direct advertising support for
LE L2CAP sockets where we now call the more appropriate hci_connect_le()
API if slave/peripheral role is desired.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:32 +01:00
Johan Hedberg 3dfe5905a7 Bluetooth: Remove unnecessary le_scan_restart_work_complete() function
The only user of this, le_scan_restart_work(), is so short and simple
that it makes sense to just merge the code there.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:31 +01:00
Johan Hedberg f4a2cb4d8f Bluetooth: Simplify le_scan_disable_work()
Merge le_scan_disable_work_complete into the main le_scan_disable_work
function and take advantage of the updated bredr_inquiry() to run the
Inquiry through hci_req_sync().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:31 +01:00
Johan Hedberg 7df26b5629 Bluetooth: Pass inquiry length to bredr_inquiry()
Passing the needed inquiry length to bredr_inquiry() makes it possible
to also use this helper for interleaved discovery where the controller
doesn't support simultaneous Inquiry & LE scan.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:31 +01:00
Johan Hedberg 2f27498107 Bluetooth: Fix BR/EDR Page Scan update with Add Device
The recent changes to remove dependency on HCI in Add Device missed
out relevant changes for BR/EDR. This patch removes the left-overs and
ensures the right HCI command gets queued for BR/EDR.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:31 +01:00
Johan Hedberg 2154d3f4fb Bluetooth: Move Stop Discovery to req_workqueue
Since discovery also deals with LE scanning it makes sense to move it
behind the same req_workqueue as other LE scanning changes. This also
simplifies the logic since we do many of the actions in a synchronous
manner.

Part of this refactoring is moving hci_req_stop_discovery() to
hci_request.c. At the same time the function receives support for
properly handling the STOPPING state since that's the state we'll be
in when stopping through the req_workqueue.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:31 +01:00
Johan Hedberg e68f072b73 Bluetooth: Move Start Discovery to req_workqueue
Since discovery also deals with LE scanning it makes sense to move it
behind the same req_workqueue as other LE scanning changes. This also
simplifies the logic since we do many of the actions in a synchronous
manner.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:30 +01:00
Johan Hedberg a1d01db120 Bluetooth: Add error return value to hci_req_sync callback
In some circumstances it may be useful to abort the request through
checks done in the request callback. To make the feature possible this
patch changes the return value of the request callback from void to
int and aborts the request if a non-zero value is returned.
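
A minimal sketch of the pattern, assuming simplified stand-in types:
struct req_sketch, build_if_enabled() and req_sync_sketch() are
hypothetical names for illustration, not the Bluetooth core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request object: counts queued commands. */
struct req_sketch { int queued; };

/* The callback returns int: 0 to proceed, non-zero to abort. */
typedef int (*req_build_fn)(struct req_sketch *req, unsigned long opt);

/* Example callback: refuses to build a request when opt == 0. */
int build_if_enabled(struct req_sketch *req, unsigned long opt)
{
    if (opt == 0)
        return -1;      /* abort; the value itself is illustrative */
    req->queued++;      /* stands in for queuing an HCI command */
    return 0;
}

/* Runs the callback; on abort nothing is "sent" and *sent is untouched. */
int req_sync_sketch(req_build_fn fn, unsigned long opt, int *sent)
{
    struct req_sketch req = { 0 };
    int err;

    assert(fn != NULL && sent != NULL);

    err = fn(&req, opt);
    if (err)
        return err;     /* abort before running the request */

    *sent = req.queued;
    return 0;
}
```

With a void callback the only way to skip the request would be to
queue nothing; an explicit return value lets the caller tell "nothing
to do" apart from "refused".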

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:30 +01:00
Johan Hedberg 591752afbc Bluetooth: Add discovery type validity helper
As preparation for moving the discovery HCI commands behind
req_workqueue, add a helper and do the validity checks of the given
discovery type before proceeding further. This way we don't need to do
them again in hci_request.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:30 +01:00
Johan Hedberg 7c1fbed239 Bluetooth: Move LE scan disable/restart behind req_workqueue
To avoid any risks of races, place also these LE scan modification
work callbacks behind the same work queue as the other LE scan
changes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:30 +01:00
Johan Hedberg 145a0913ef Bluetooth: Make __hci_update_background_scan private to hci_request.c
There are no more external users so this API can be made private.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:30 +01:00
Johan Hedberg af02dd4469 Bluetooth: Use req_workqueue for background scanning when powering on
We can easily use the new req_workqueue based background scan update
for the power on case. This also removes the last external user of
__hci_update_background_scan().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg 84235d222a Bluetooth: Use req_workqueue for explicit connect requests
Since explicit connect requests are also a sub-category of passive
scan updates, run them through the same workqueue as the other passive
scan changes.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg 4ebeee2dff Bluetooth: Add HCI status return parameter to hci_req_sync()
In some cases it may be important to get the exact HCI status rather
than the converted HCI-to-errno value. Add an optional return
parameter to the hci_req_sync() API to allow for this. Since there are
no good HCI translation candidates for cancelation and timeout, use
the "unknown" status code for those cases.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg 51d7a94d56 Bluetooth: Don't wait for HCI in Add/Remove Device
There's no point in waiting for HCI activity in Add/Remove Device
since the effects of these calls are long-lasting and we cannot
report all HCI failures up to the application anyway.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg 2e93e53b8f Bluetooth: Run all background scan updates through req_workqueue
Instead of firing off a simple async request, queue all background scan
updates through req_workqueue and use hci_req_sync() there to ensure
that no two updates overlap with each other.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg 5fc16cc4f3 Bluetooth: Add stubs for synchronous HCI request functionality
Prepare hci_request.c to have code for doing synchronous HCI requests,
such as LE scanning or advertising changes. The necessary work
callbacks will be set up in hci_request_setup() and cleaned up in
hci_request_cancel_all(). The former is used when an HCI device gets
registered, and the latter each time it is powered off (or
unregistered).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:29 +01:00
Johan Hedberg b504430c86 Bluetooth: Add 'sync' specifier to synchronous request APIs
To make it clear which HCI request APIs target specifically
synchronous requests, add 'sync' to the API names.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:28 +01:00
Johan Hedberg be91cd0570 Bluetooth: Move synchronous request handling into hci_request.c
hci_request.c is a more natural place for the synchronous request
handling. Furthermore, we will soon need access to some of the
previously private-to-hci_core.c functions from hci_request.c.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:28 +01:00
Johan Hedberg 030e7f8141 Bluetooth: Remove unnecessary call to hci_update_background_scan
The hci_conn_params_clear_all() function is only called from
hci_unregister_dev() at which point it's completely futile to try to
do any LE scanning updates. Simply remove this unnecessary function
call. At the same time we can make the function static since it's only
accessed from within the same c-file.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:28 +01:00
Marcel Holtmann ac71494934 Bluetooth: Add support for controller specific logging
To enable controller specific logging, the userspace daemon has to have
the ability to log per controller. To facilitate this support, provide
a dedicated logging channel. Messages in this channel will be included
in the monitor queue and with that also forwarded to monitoring tools
along with the actual hardware traces.

All messages from the logging channel are timestamped and with that
allow an easy correlation between userspace messages and hardware
events. This will increase the ability to debug problems faster.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:28 +01:00
Marcel Holtmann dd31506d4a Bluetooth: Add support for sending system notes to monitor channel
The monitor channel can be used to send generic system notes as text
strings for debugging purposes. This adds the system note monitor code
and uses it for including kernel and subsystem version into traces.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:28 +01:00
Marcel Holtmann 8528d3f738 Bluetooth: Fix casting coding style within HCI sockets
The HCI sockets code still has some old casting coding style. Fix this
to match with the rest of the code.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:28 +01:00
Johan Hedberg 0ebc181884 Bluetooth: Add clarifying comment why schedule_work is used
It's not obvious why schedule_work is used instead of queue_work. Add
a comment explaining why.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:27 +01:00
Marcel Holtmann 1982162bbe Bluetooth: Add missing hci_skb_opcode for raw socket commands
When HCI commands are injected via the raw socket, the core was not
including the decoded opcode value. So ensure that it is actually set.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:27 +01:00
Johan Hedberg 44d2713774 Bluetooth: Compress the size of struct hci_ctrl
We can reduce the size of the hci_ctrl struct by converting
'bool req_start' to 'u8 req_flags' and making the two function
pointers a union (since only one is ever set at a time).

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-19 17:50:27 +01:00
Marcel Holtmann d79f34e32b Bluetooth: Use new hci_skb_pkt_* wrappers for core packet handling
The new hci_skb_pkt_* wrappers only help if they are used consistently
in the Bluetooth subsystem. So first convert the core packet handling.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:27 +01:00
Marcel Holtmann 5c3d3b4c4f Bluetooth: Make LE only events conditional on supported commands
For the LE only controllers, there are events that should not be enabled
if the corresponding command is not supported.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:26 +01:00
Marcel Holtmann 9fe759ceed Bluetooth: Fix issue with HCI_QUIRK_FIXUP_INQUIRY_MODE and event mask
When setting the event mask, the HCI_QUIRK_FIXUP_INQUIRY_MODE quirk is
required to be checked so that the Inquiry Result with RSSI event gets
actually enabled.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:26 +01:00
Marcel Holtmann 7d26f5c4be Bluetooth: Build LE event mask based on supported commands
The LE event mask should be created based on the commands that are
actually supported by the controller.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:26 +01:00
Marcel Holtmann 70f56aa2ee Bluetooth: Move BR/EDR default events behind its features
There are some BR/EDR default events for Bluetooth 1.2 or later
controllers that are not conditional on their features being present.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-19 17:50:26 +01:00
Eric Dumazet 1b8e6a01e1 tcp: md5: fix lockdep annotation
When a passive TCP is created, we eventually call tcp_md5_do_add()
with sk pointing to the child. It is not owned by the user yet (we
will add this socket into listener accept queue a bit later anyway)

But we do own the spinlock, so amend the lockdep annotation to avoid
following splat :

[ 8451.090932] net/ipv4/tcp_ipv4.c:923 suspicious rcu_dereference_protected() usage!
[ 8451.090932]
[ 8451.090932] other info that might help us debug this:
[ 8451.090932]
[ 8451.090934]
[ 8451.090934] rcu_scheduler_active = 1, debug_locks = 1
[ 8451.090936] 3 locks held by socket_sockopt_/214795:
[ 8451.090936]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff855c6ac1>] __netif_receive_skb_core+0x151/0xe90
[ 8451.090947]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff85618143>] ip_local_deliver_finish+0x43/0x2b0
[ 8451.090952]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff855acda5>] sk_clone_lock+0x1c5/0x500
[ 8451.090958]
[ 8451.090958] stack backtrace:
[ 8451.090960] CPU: 7 PID: 214795 Comm: socket_sockopt_

[ 8451.091215] Call Trace:
[ 8451.091216]  <IRQ>  [<ffffffff856fb29c>] dump_stack+0x55/0x76
[ 8451.091229]  [<ffffffff85123b5b>] lockdep_rcu_suspicious+0xeb/0x110
[ 8451.091235]  [<ffffffff8564544f>] tcp_md5_do_add+0x1bf/0x1e0
[ 8451.091239]  [<ffffffff85645751>] tcp_v4_syn_recv_sock+0x1f1/0x4c0
[ 8451.091242]  [<ffffffff85642b27>] ? tcp_v4_md5_hash_skb+0x167/0x190
[ 8451.091246]  [<ffffffff85647c78>] tcp_check_req+0x3c8/0x500
[ 8451.091249]  [<ffffffff856451ae>] ? tcp_v4_inbound_md5_hash+0x11e/0x190
[ 8451.091253]  [<ffffffff85647170>] tcp_v4_rcv+0x3c0/0x9f0
[ 8451.091256]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091260]  [<ffffffff856181b6>] ip_local_deliver_finish+0xb6/0x2b0
[ 8451.091263]  [<ffffffff85618143>] ? ip_local_deliver_finish+0x43/0x2b0
[ 8451.091267]  [<ffffffff85618d38>] ip_local_deliver+0x48/0x80
[ 8451.091270]  [<ffffffff85618510>] ip_rcv_finish+0x160/0x700
[ 8451.091273]  [<ffffffff8561900e>] ip_rcv+0x29e/0x3d0
[ 8451.091277]  [<ffffffff855c74b7>] __netif_receive_skb_core+0xb47/0xe90

Fixes: a8afca0329 ("tcp: md5: protects md5sig_info with RCU")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 23:02:47 -05:00
Aya Mahfouz 451c2b5caf net: dns_resolver: convert time_t to time64_t
Changes the definition of the pointer _expiry from time_t to
time64_t. This is to handle the Y2038 problem where time_t
will overflow in the year 2038. The change is safe because
the kernel subsystems that call dns_query pass NULL.
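
The overflow boundary can be checked with a small sketch. The helper
names are hypothetical; report_expiry_sketch() only mirrors the idea
of an optional out-pointer and is not the dns_query() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A signed 32-bit seconds counter tops out at 2^31 - 1 seconds after
 * 1970-01-01 00:00:00 UTC, which is 2038-01-19 03:14:07 UTC.
 */
bool fits_in_time32(int64_t secs)
{
    return secs >= INT32_MIN && secs <= INT32_MAX;
}

/*
 * Hypothetical helper with an optional 64-bit out-pointer: callers
 * that pass NULL never touch the value, so widening its type does
 * not affect them.
 */
void report_expiry_sketch(int64_t expiry, int64_t *out_expiry)
{
    if (out_expiry)
        *out_expiry = expiry;
}
```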

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Aya Mahfouz <mahfouz.saif.elyazal@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:27:46 -05:00
Paolo Abeni 206b49500d net/ip6_tunnel: fix dst leak
Commit cdf3464e6c ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
introduced percpu storage for the ip6_tunnel dst cache, but while clearing
that cache it used raw_cpu_ptr to walk the per-cpu entries, so cached
dsts on non-current cpus are not actually reset.

This patch replaces raw_cpu_ptr with per_cpu_ptr, properly cleaning
such storage.
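
The difference between the two lookups can be modelled in plain C. In
this sketch a fixed array stands in for the percpu storage, and all
names (dst_cache_sketch, NR_CPUS_SKETCH, the reset helpers) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4   /* illustrative CPU count */

/* One cached pointer per CPU; stands in for the percpu dst cache. */
struct dst_cache_sketch {
    const void *dst[NR_CPUS_SKETCH];
};

/*
 * Buggy shape: the loop runs once per CPU, but every iteration
 * resolves to the current CPU's slot (the raw_cpu_ptr-style lookup).
 */
void dst_cache_reset_buggy(struct dst_cache_sketch *c, int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS_SKETCH);
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        c->dst[this_cpu] = NULL;
}

/* Fixed shape: index by the loop's CPU (the per_cpu_ptr-style lookup). */
void dst_cache_reset_fixed(struct dst_cache_sketch *c)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        c->dst[cpu] = NULL;
}

/* Counts slots that still hold a cached entry. */
int dst_cache_count(const struct dst_cache_sketch *c)
{
    int n = 0;

    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        if (c->dst[cpu])
            n++;
    return n;
}
```

After the buggy reset, every slot except the current CPU's still
holds its entry, which is the leak the commit describes.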

Fixes: cdf3464e6c ("ipv6: Fix dst_entry refcnt bugs in ip6_tunnel")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:25:01 -05:00
Eric Dumazet 93d05d4a32 net: provide generic busy polling to all NAPI drivers
NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)

napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()

This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.

Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.

Once this patch proves to not bring serious regressions,
we will clean up drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:42 -05:00
Eric Dumazet 34cbe27e81 net: napi_hash_del() returns a boolean status
napi_hash_del() will soon be used from both drivers (if they want)
and the core networking stack.

Callers are responsible for ensuring an RCU grace period is respected
before freeing napi structure : napi_hash_del() can signal if
this RCU grace period is needed or not.
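
A sketch of the caller pattern this enables. The types and helpers are
hypothetical, and treating "true" as "a grace period is needed" is an
assumption; the text above only says the function can signal it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a napi structure. */
struct napi_sketch { bool hashed; };

/*
 * Returns true if the entry was actually removed from the hash,
 * meaning concurrent readers may still see it and the caller must
 * wait for a grace period before freeing.
 */
bool napi_hash_del_sketch(struct napi_sketch *n)
{
    bool was_hashed;

    assert(n != NULL);
    was_hashed = n->hashed;
    n->hashed = false;
    return was_hashed;
}

/* Caller pattern: returns how many grace-period waits were needed. */
int napi_teardown_sketch(struct napi_sketch *n)
{
    int waits = 0;

    if (napi_hash_del_sketch(n))
        waits++;    /* stands in for waiting out an RCU grace period */

    /* From here on the structure could be freed. */
    return waits;
}
```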

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:42 -05:00
Eric Dumazet 6180d9de61 net: move napi_hash[] into read mostly section
We do not often add/delete a napi context.
Moving napi_hash[] into read mostly section avoids potential false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:42 -05:00
Eric Dumazet d64b5e85bf net: add netif_tx_napi_add()
netif_tx_napi_add() is a variant of netif_napi_add()

It should be used by drivers that use a napi structure
to exclusively poll TX.

We do not want to add this kind of napi in napi_hash[] in following
patches, adding generic busy polling to all NAPI drivers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:41 -05:00
Eric Dumazet 93f93a4404 net: move skb_mark_napi_id() into core networking stack
We would like to automatically provide busy polling support
to all NAPI drivers, without them having to implement anything.

skb_mark_napi_id() can be called from napi_gro_receive() and
napi_get_frags().

A few drivers are still calling skb_mark_napi_id() because
they use netif_receive_skb(). They should eventually call
napi_gro_receive() instead. I will leave this to driver
maintainers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:41 -05:00
Eric Dumazet ce6aea93f7 net: network drivers no longer need to implement ndo_busy_poll()
Instead of having to implement complex ndo_busy_poll() method,
drivers can simply rely on NAPI poll logic.

Busy polling gains come mainly from polling itself,
not from the exact details of how we poll the device.

ndo_busy_poll() if implemented can avoid touching
napi state, but it adds extra synchronization between
normal napi->poll() and busy poll handler, slowing down
the common path (non busy polling) with extra atomic operations.
In practice few drivers ever got busy poll because of the complexity.

We could go one step further, and make busy polling
available for all NAPI drivers, but this would require
that all netif_napi_del() calls are done in process context
so that we can call synchronize_rcu().
Full audit would be required.

Before this is done, a driver still needs to call :

- skb_mark_napi_id() for each skb provided to the stack.
- napi_hash_add() and napi_hash_del() to allocate a napi_id per napi struct.
- Make sure RCU grace period is respected after napi_hash_del() before
  memory containing napi structure is freed.

Followup patch implements busy poll for mlx5 driver as an example.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:39 -05:00
Eric Dumazet 2a028ecb76 net: allow BH servicing in sk_busy_loop()
Instead of blocking BH in whole sk_busy_loop(), block them
only around ->ndo_busy_poll() calls.

This has many benefits.

1) allow tunneled traffic to use busy poll as well as native traffic.
   Tunnel handlers usually call netif_rx() and depend on net_rx_action()
   being run (from softirq handler)

2) allow RFS/RPS being used (sending IPI to other cpus if needed)

3) use the 'lets burn cpu cycles' budget to do useful work
   (like TX completions, timers, RCU callbacks...)

4) reduce BH latencies, making busy poll a better citizen.

Tested:

Tested with SIT tunnel

lpaa5:~# echo 0 >/proc/sys/net/core/busy_read
lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    37373.93
16384  87380

Now enable busy poll on both hosts

lpaa5:~# echo 70 >/proc/sys/net/core/busy_read
lpaa6:~# echo 70 >/proc/sys/net/core/busy_read

lpaa5:~# ./netperf -H 2002:af6:786::1 -t TCP_RR
MIGRATED TCP REQUEST/RESPONSE TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:786::1 () port 0 AF_INET6 : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    58314.77
16384  87380

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:38 -05:00
Eric Dumazet 02d62e86fe net: un-inline sk_busy_loop()
There is really little gain from inlining this big function.
We'll soon make it even bigger in following patches.

This means we no longer need to export napi_by_id()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:38 -05:00
Eric Dumazet 52bd2d62ce net: better skb->sender_cpu and skb->napi_id cohabitation
skb->sender_cpu and skb->napi_id share a common storage,
and we had various bugs about this.

We had to call skb_sender_cpu_clear() in some places to
not leave a prior skb->napi_id and fool netdev_pick_tx()

As suggested by Alexei, we could split the space so that
these errors can not happen.

0 value being reserved as the common (not initialized) value,
let's reserve [1 .. NR_CPUS] range for valid sender_cpu,
and [NR_CPUS+1 .. ~0U] for valid napi_id.

This will allow proper busy polling support over tunnels.
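
The range split described above can be sketched in userspace C. This is only an illustration under stated assumptions: `EXAMPLE_NR_CPUS` and every function name are hypothetical, and the `cpu + 1` encoding is an assumption that the message implies but does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU count for illustration; the kernel uses NR_CPUS. */
#define EXAMPLE_NR_CPUS 64u

/* 0 is the shared "not initialized" value. */
static bool id_is_unset(uint32_t id)
{
	return id == 0;
}

/* Valid sender_cpu values occupy [1 .. NR_CPUS]. */
static bool id_is_sender_cpu(uint32_t id)
{
	return id >= 1 && id <= EXAMPLE_NR_CPUS;
}

/* Valid napi_id values occupy [NR_CPUS+1 .. ~0U]. */
static bool id_is_napi_id(uint32_t id)
{
	return id > EXAMPLE_NR_CPUS;
}

/* Assumed encoding: a zero-based CPU number is stored as cpu + 1. */
static uint32_t encode_sender_cpu(uint32_t cpu)
{
	assert(cpu < EXAMPLE_NR_CPUS);
	return cpu + 1;
}
```

Because the three ranges are disjoint, a stale value of one kind can no longer be mistaken for the other.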

Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 16:17:37 -05:00
stephen hemminger 945fae44d3 udp: remove duplicate include
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 14:58:02 -05:00
Bastian Stender 09605cc12c net ipv4: use preferred log methods
Replace printk calls with preferred unconditional log method calls to keep
kernel messages clean.

Added newline to "too small MTU" message.

Signed-off-by: Bastian Stender <bst@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-18 13:37:20 -05:00
Linus Torvalds 7f151f1d8a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix list tests in netfilter ingress support, from Florian Westphal.

 2) Fix reversal of input and output interfaces in ingress hook
    invocation, from Pablo Neira Ayuso.

 3) We have a use after free in r8169, caught by Dave Jones, fixed by
    Francois Romieu.

 4) Splice use-after-free fix in AF_UNIX from Hannes Frederic Sowa.

 5) Three ipv6 route handling bug fixes from Martin KaFai Lau:
    a) Don't create clone routes not managed by the fib6 tree
    b) Don't forget to check expiration of DST_NOCACHE routes.
    c) Handle rt->dst.from == NULL properly.

 6) Several AF_PACKET fixes wrt transport header setting and SKB
    protocol setting, from Daniel Borkmann.

 7) Fix thunder driver crash on shutdown, from Pavel Fedin.

 8) Several Mellanox driver fixes (max MTU calculations, use of correct
    DMA unmap in TX path, etc.) from Saeed Mahameed, Tariq Toukan, Doron
    Tsur, Achiad Shochat, Eran Ben Elisha, and Noa Osherovich.

 9) Several mv88e6060 DSA driver fixes (wrong bit definitions for
    certain registers, etc.) from Neil Armstrong.

10) Make sure to disable preemption while updating per-cpu stats of ip
    tunnels, from Jason A.  Donenfeld.

11) Various ARM64 bpf JIT fixes, from Yang Shi.

12) Flush icache properly in ARM JITs, from Daniel Borkmann.

13) Fix masking of RX and TX interrupts in ravb driver, from Masaru
    Nagai.

14) Fix netdev feature propagation for devices not implementing
    ->ndo_set_features().  From Nikolay Aleksandrov.

15) Big endian fix in vmxnet3 driver, from Shrikrishna Khare.

16) RAW socket code increments incorrect SNMP counters, fix from Ben
    Cartwright-Cox.

17) IPv6 multicast SNMP counters are bumped twice, fix from Neil Horman.

18) Fix handling of VLAN headers on stacked devices when REORDER is
    disabled.  From Vlad Yasevich.

19) Fix SKB leaks and use-after-free in ipvlan and macvlan drivers, from
    Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (83 commits)
  MAINTAINERS: Update Mellanox's Eth NIC driver entries
  net/core: revert "net: fix __netdev_update_features return.." and add comment
  af_unix: take receive queue lock while appending new skb
  rtnetlink: fix frame size warning in rtnl_fill_ifinfo
  net: use skb_clone to avoid alloc_pages failure.
  packet: Use PAGE_ALIGNED macro
  packet: Don't check frames_per_block against negative values
  net: phy: Use interrupts when available in NOLINK state
  phy: marvell: Add support for 88E1540 PHY
  arm64: bpf: make BPF prologue and epilogue align with ARM64 AAPCS
  macvlan: fix leak in macvlan_handle_frame
  ipvlan: fix use after free of skb
  ipvlan: fix leak in ipvlan_rcv_frame
  vlan: Do not put vlan headers back on bridge and macvlan ports
  vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
  via-velocity: unconditionally drop frames with bad l2 length
  ipg: Remove ipg driver
  dl2k: Add support for IP1000A-based cards
  snmp: Remove duplicate OUTMCAST stat increment
  net: thunder: Check for driver data in nicvf_remove()
  ...
2015-11-17 13:52:59 -08:00
Nikolay Aleksandrov 17b85d29e8 net/core: revert "net: fix __netdev_update_features return.." and add comment
This reverts commit 00ee592717 ("net: fix __netdev_update_features return
on ndo_set_features failure")
and adds a comment explaining why it's okay to return a value other than
0 upon error. Some drivers might actually change flags and return an
error so it's better to fire a spurious notification rather than miss
these.

CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:45 -05:00
Hannes Frederic Sowa a3a116e04c af_unix: take receive queue lock while appending new skb
While possibly in future we don't necessarily need to use
sk_buff_head.lock this is a rather larger change, as it affects the
af_unix fd garbage collector, diag and socket cleanups. This is too much
for a stable patch.

For the time being grab sk_buff_head.lock without disabling bh and irqs,
so don't use locked skb_queue_tail.

Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reported-by: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:45 -05:00
Hannes Frederic Sowa b22b941b2c rtnetlink: fix frame size warning in rtnl_fill_ifinfo
Fix the following warning:

  CC      net/core/rtnetlink.o
net/core/rtnetlink.c: In function ‘rtnl_fill_ifinfo’:
net/core/rtnetlink.c:1308:1: warning: the frame size of 2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]
 }
 ^
by splitting up the huge rtnl_fill_ifinfo into some smaller ones, so we
don't have the huge frame allocations at the same time.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:44 -05:00
Martin Zhang 19125c1a4f net: use skb_clone to avoid alloc_pages failure.
1. The new skb only needs the dst and IP address (v4 or v6).
2. skb_copy may need high order pages, which are very rare on a long-running server.

Signed-off-by: Junwei Zhang <linggao.zjw@alibaba-inc.com>
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:44 -05:00
Tobias Klauser 90836b67e2 packet: Use PAGE_ALIGNED macro
Use PAGE_ALIGNED(...) instead of open-coding it.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:44 -05:00
Tobias Klauser 4194b4914a packet: Don't check frames_per_block against negative values
rb->frames_per_block is an unsigned int, thus can never be negative.

Also fix spacing in the calculation of frames_per_block.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 15:25:44 -05:00
Vlad Yasevich 28f9ee22bc vlan: Do not put vlan headers back on bridge and macvlan ports
When a vlan is configured with REORDER_HEADER set to 0, the vlan
header is put back into the packet and makes it appear that
the vlan header is still there even after it's been processed.
This poses a problem for bridge and macvlan ports.  The packets
passed to those devices may be forwarded and at the time of the
forward, vlan headers end up being unexpectedly present.

With the patch, we make sure that we do not put the vlan header
back (when REORDER_HEADER is 0) if a bridge or macvlan has
been configured on top of the vlan device.

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 14:38:35 -05:00
Vlad Yasevich a6e18ff111 vlan: Fix untag operations of stacked vlans with REORDER_HEADER off
When we have multiple stacked vlan devices all of which have
turned off REORDER_HEADER flag, the untag operation does not
locate the ethernet addresses correctly for nested vlans.
The reason is that in case of REORDER_HEADER flag being off,
the outer vlan headers are put back and the mac_len is adjusted
to account for the presence of the header.  Then, the subsequent
untag operation, for the next level vlan, always uses VLAN_ETH_HLEN
to locate the beginning of the ethernet header and that ends up
being a multiple of 4 bytes short of the actual beginning
of the mac header (the multiple depending on how many vlan
encapsulations there are).

As a result, if there are multiple levels of vlan devices
with REORDER_HEADER being off, the received packets end up
being dropped.

To solve this, we use skb->mac_len as the offset.  The value
is always set on receive path and starts out as ETH_HLEN.
The value is also updated when the vlan header manipulations occur
so we know it will be correct.
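
The "multiple of 4 bytes short" arithmetic can be illustrated with a simplified model. It does not reproduce the kernel code: the constants mirror the usual Ethernet and 802.1Q sizes, and the function names and the `reinserted` parameter are hypothetical.

```c
#include <assert.h>

#define EXAMPLE_ETH_HLEN      14  /* dst MAC + src MAC + EtherType */
#define EXAMPLE_VLAN_HLEN      4  /* one 802.1Q tag */
#define EXAMPLE_VLAN_ETH_HLEN 18  /* Ethernet header plus one tag */

/*
 * Header bytes actually in front of the payload when `reinserted` outer
 * tags were put back and one more tag is being removed.
 */
static int example_actual_offset(int reinserted)
{
	assert(reinserted >= 0);
	return EXAMPLE_ETH_HLEN + (reinserted + 1) * EXAMPLE_VLAN_HLEN;
}

/* How far a fixed VLAN_ETH_HLEN offset falls short of the real header start. */
static int example_fixed_offset_shortfall(int reinserted)
{
	return example_actual_offset(reinserted) - EXAMPLE_VLAN_ETH_HLEN;
}
```

With no reinserted tags the fixed offset is exact; each reinserted tag adds another 4 bytes of error, which is why a tracked length is the safer offset.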

Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-17 14:38:35 -05:00
Johannes Berg c2e703a552 mac80211: mesh: fix call_rcu() usage
When using call_rcu(), the called function may be delayed quite
significantly, and without a matching rcu_barrier() there's no
way to be sure it has finished.
Therefore, global state that could be gone/freed/reused should
never be touched in the callback.

Fix this in mesh by moving the atomic_dec() into the caller;
that's not really a problem since we already unlinked the path
and it will be destroyed anyway.

This fixes a crash Jouni observed when running certain tests in
a certain order, in which the mesh interface was torn down, the
memory reused for a function pointer (work struct) and running
that then crashed since the pointer had been decremented by 1,
resulting in an invalid instruction byte stream.

Cc: stable@vger.kernel.org
Fixes: eb2b9311fd ("mac80211: mesh path table implementation")
Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-17 15:49:25 +01:00
Johannes Berg 45bb780a21 mac80211: don't advertise NL80211_FEATURE_FULL_AP_CLIENT_STATE
For now, this feature doesn't actually work. To avoid shipping a
kernel that has it enabled but where it can't be used disable it
for now - we can re-enable it when it's fixed.

This partially reverts 44674d9c22 ("mac80211: advertise support
for full station state in AP mode").

Cc: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-17 15:48:50 +01:00
Neil Horman 41033f029e snmp: Remove duplicate OUTMCAST stat increment
The OUTMCAST stat is double incremented, getting bumped once in the mcast code
itself, and again in the common ip output path.  Remove the mcast bump, as it's
not needed.

Validated by the reporter, with good results

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Claus Jensen <claus.jensen@microsemi.com>
CC: Claus Jensen <claus.jensen@microsemi.com>
CC: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 16:36:32 -05:00
Bjørn Mork 88ad4175b2 net/core: use netdev name in warning if no parent
A recent flaw in the netdev feature setting resulted in warnings
like this one from VLAN interfaces:

 WARNING: CPU: 1 PID: 4975 at net/core/dev.c:2419 skb_warn_bad_offload+0xbc/0xcb()
 : caps=(0x00000000001b5820, 0x00000000001b5829) len=2782 data_len=0 gso_size=1348 gso_type=16 ip_summed=3

The ":" is supposed to be preceded by a driver name, but in this
case it is an empty string since the device has no parent.

There are many types of network devices without a parent. The
anonymous warnings for these devices can be hard to debug.  Log
the network device name instead in these cases to assist further
debugging.

This is mostly similar to how __netdev_printk() handles orphan
devices.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 16:21:48 -05:00
Hannes Frederic Sowa 8844f97238 af_unix: don't append consumed skbs to sk_receive_queue
In case multiple writes to a unix stream socket race we could end up in a
situation where we pre-allocate a new skb for use in unix_stream_sendpage
but have to free it again in the locked section because another skb
has been appended meanwhile, which we must use. Accidentally we didn't
clear the pointer after consuming it and so we touched freed memory
while appending it to the sk_receive_queue. So, clear the pointer after
consuming the skb.
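
The "clear the pointer after consuming" pattern can be sketched in plain C. The buffer type and both helpers are hypothetical stand-ins, not the af_unix code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical buffer type standing in for an skb. */
struct example_buf {
	int len;
};

/*
 * Free a preallocated buffer that turned out to be unnecessary and reset the
 * caller's pointer, so later code cannot touch the freed memory by accident.
 */
static void example_consume(struct example_buf **bufp)
{
	free(*bufp);
	*bufp = NULL;
}

/* Queue the buffer only if it still exists; returns 1 if it would be queued. */
static int example_queue_if_present(const struct example_buf *buf)
{
	return buf != NULL;
}
```

Without the reset in `example_consume()`, the later presence check would see a dangling non-NULL pointer and proceed to use freed memory.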

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 869e7c6248 ("net: af_unix: implement stream sendpage support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 15:39:35 -05:00
Ben Cartwright-Cox 027ac58e3c raw: increment correct SNMP counters for ICMP messages
Sending ICMP packets with raw sockets ends up in the SNMP counters
logging the type as the first byte of the IPv4 header rather than
the ICMP header. This is fixed by adding the IP Header Length to
the casting into a icmphdr struct.
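
A minimal userspace sketch of the offset calculation, assuming a raw buffer that starts with an IPv4 header. The function name is hypothetical, and the kernel works on its own header structures rather than byte arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the ICMP type of a raw IPv4 packet, or -1 if the buffer is too
 * short or the header length field is invalid.
 */
static int icmp_type_of(const uint8_t *pkt, size_t len)
{
	size_t ihl_bytes;

	if (len < 1)
		return -1;

	/* Low nibble of byte 0 is the IP header length in 32-bit words. */
	ihl_bytes = (size_t)(pkt[0] & 0x0f) * 4;
	if (ihl_bytes < 20 || len <= ihl_bytes)
		return -1;

	/* The ICMP header starts after the IP header; its first byte is the type. */
	return pkt[ihl_bytes];
}
```

Reading `pkt[0]` as the type would instead yield the version/IHL byte (0x45 for a header without options), which matches the miscounting described above.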

Signed-off-by: Ben Cartwright-Cox <ben@benjojo.co.uk>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 15:08:48 -05:00
Nikolay Aleksandrov 00ee592717 net: fix __netdev_update_features return on ndo_set_features failure
If ndo_set_features fails __netdev_update_features() will return -1 but
this is wrong because it is expected to return 0 if no features were
changed (see netdev_update_features()), which will cause a netdev
notifier to be called without any actual changes. Fix this by returning
0 if ndo_set_features fails.

Fixes: 6cb6a27c45 ("net: Call netdev_features_change() from netdev_update_features()")
CC: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 14:56:03 -05:00
Nikolay Aleksandrov 5f8dc33e8e net: fix feature changes on devices without ndo_set_features
When __netdev_update_features() was updated to ensure some features are
disabled on new lower devices, an error was introduced for devices which
don't have the ndo_set_features() method set. Previously we would just set
the new features, but now we return an error and don't set them. Fix this by
restoring the old behaviour and setting err to 0 when ndo_set_features
is not present.

Fixes: e7868a85e1 ("net/core: ensure features get disabled on new lower devs")
CC: Jarod Wilson <jarod@redhat.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Ido Schimmel <idosch@mellanox.com>
CC: Sander Eikelenboom <linux@eikelenboom.it>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Dave Young <dyoung@redhat.com>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 14:56:03 -05:00
Ido Schimmel bbe14f5429 switchdev: bridge: Check return code is not EOPNOTSUPP
When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP.
In this case we should not emit errors and warnings to the kernel log.
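
The check amounts to treating EOPNOTSUPP as "nothing to report". A minimal sketch with a hypothetical helper name, assuming the negative errno convention for return codes:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Decide whether a return code from an attribute-set call is worth logging. */
static bool should_log_attr_set_error(int err)
{
	/* Success and "not supported" are both silent; anything else is logged. */
	return err != 0 && err != -EOPNOTSUPP;
}
```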

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 0bc05d585d ("switchdev: allow caller to explicitly request
attr_set as deferred")
Fixes: 6ac311ae8b ("Adding switchdev ageing notification on port
bridged")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-16 14:56:03 -05:00
Eric Dumazet 340c78e590 ipvs: use skb_to_full_sk() helper
SYNACK packets might be attached to request sockets.

Use skb_to_full_sk() helper to avoid illegal accesses to
inet_sk(skb->sk)

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:39:48 -05:00
Eric Dumazet 00fd38d938 tcp: ensure proper barriers in lockless contexts
Some functions access TCP sockets without holding a lock and
might output inconsistent data, depending on compiler and/or
architecture.

tcp_diag_get_info(), tcp_get_info(), tcp_poll(), get_tcp4_sock() ...

Introduce sk_state_load() and sk_state_store() to fix the issues,
and more clearly document where this lack of locking is happening.
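
A userspace analogue of such a load/store pair, written with C11 atomics. This is a sketch that assumes acquire/release ordering, which the message does not spell out; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a socket; only the state field matters here. */
struct example_sock {
	_Atomic int state;
};

/* Publish a new state; earlier writes become visible to an acquire reader. */
static void example_state_store(struct example_sock *sk, int newstate)
{
	atomic_store_explicit(&sk->state, newstate, memory_order_release);
}

/* Read the state without a lock; pairs with the release store above. */
static int example_state_load(struct example_sock *sk)
{
	return atomic_load_explicit(&sk->state, memory_order_acquire);
}
```

Routing every lockless access through a named helper also makes those call sites easy to find, which is the documentation benefit the message mentions.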

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:36:38 -05:00
lucien ed5a377d87 sctp: translate host order to network order when setting a hmacid
Currently sctp auth does not work when a hmacid is set manually, because
we didn't use network order for the hmacid. Fix it by adding the
conversion in sctp_auth_ep_set_hmacs.

Even if we set the hmacid in network order in userspace, it still
doesn't work, because of this condition in sctp_auth_ep_set_hmacs():

		if (id > SCTP_AUTH_HMAC_ID_MAX)
			return -EOPNOTSUPP;

so this wasn't working before and thus it won't break compatibility.
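
The ordering of the two steps can be sketched as follows: validate the identifier in host order, then convert it for storage. `EXAMPLE_HMAC_ID_MAX` is a hypothetical stand-in for SCTP_AUTH_HMAC_ID_MAX, and the function is an illustration rather than the kernel code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical upper bound; stands in for SCTP_AUTH_HMAC_ID_MAX. */
#define EXAMPLE_HMAC_ID_MAX 3

/*
 * Validate a host-order identifier, then store it in network byte order.
 * Returns 0 on success or -EOPNOTSUPP if the identifier is out of range.
 */
static int example_set_hmac_id(uint16_t host_id, uint16_t *wire_id)
{
	if (host_id > EXAMPLE_HMAC_ID_MAX)
		return -EOPNOTSUPP;

	*wire_id = htons(host_id);
	return 0;
}
```

On a little-endian host, an identifier that userspace already converted (for example `htons(1)`, which is 256) would fail the range check, which is why pre-converted values never worked.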

Fixes: 65b07e5d0d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:27:27 -05:00
Daniel Borkmann 5cfb4c8d05 packet: fix tpacket_snd max frame len
Since its introduction in commit 69e3c75f4d ("net: TX_RING and
packet mmap"), TX_RING could be used from SOCK_DGRAM and SOCK_RAW
side. When used with SOCK_DGRAM only, the size_max > dev->mtu +
reserve check should have reserve as 0, but currently, this is
unconditionally set (in its original form as dev->hard_header_len).

I think this is not correct since tpacket_fill_skb() would then
take dev->mtu and dev->hard_header_len into account for SOCK_DGRAM,
the extra VLAN_HLEN could be possible in both cases. Presumably, the
reserve code was copied from packet_snd(), but later on missed the
check. Make it similar as we have it in packet_snd().

Fixes: 69e3c75f4d ("net: TX_RING and packet mmap")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:00:35 -05:00
Daniel Borkmann c72219b75f packet: infer protocol from ethernet header if unset
In case no struct sockaddr_ll has been passed to packet
socket's sendmsg() when doing a TX_RING flush run, then
skb->protocol is set to po->num instead, which is the protocol
passed via socket(2)/bind(2).

Applications only xmitting can go the path of allocating the
socket as socket(PF_PACKET, <mode>, 0) and do a bind(2) on the
TX_RING with sll_protocol of 0. That way, register_prot_hook()
is neither called on creation nor on bind time, which saves
cycles when there's no interest in capturing anyway.

That leaves us however with po->num 0 instead and therefore
the TX_RING flush run sets skb->protocol to 0 as well. Eric
reported that this leads to problems when using tools like
trafgen over bonding device. I.e. the bonding's hash function
could invoke the kernel's flow dissector, which depends on
skb->protocol being properly set. In the current situation, all
the traffic is then directed to a single slave.

Fix it up by inferring skb->protocol from the Ethernet header
when not set and we have ARPHRD_ETHER device type. This is only
done in case of SOCK_RAW and where we have a dev->hard_header_len
length. In case of ARPHRD_ETHER devices, this is guaranteed to
cover ETH_HLEN, and therefore being accessed on the skb after
the skb_store_bits().
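
Inferring the protocol from an Ethernet header can be sketched in userspace C. The names are hypothetical, and this version returns the EtherType in host order for readability, whereas skb->protocol holds a network-order value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_ETH_ALEN 6   /* bytes per MAC address */
#define EXAMPLE_ETH_HLEN 14  /* dst MAC + src MAC + EtherType */

/*
 * Return the EtherType of a frame in host order, or 0 if the frame is
 * shorter than an Ethernet header.
 */
static uint16_t example_ethertype(const uint8_t *frame, size_t len)
{
	const uint8_t *p;

	if (len < EXAMPLE_ETH_HLEN)
		return 0;

	/* The EtherType is big-endian and follows the two MAC addresses. */
	p = frame + 2 * EXAMPLE_ETH_ALEN;
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Keep a protocol that is already set; otherwise infer it from the frame. */
static uint16_t example_pick_protocol(uint16_t proto, const uint8_t *frame,
				      size_t len)
{
	return proto ? proto : example_ethertype(frame, len);
}
```

The length check mirrors the reasoning above: the header may only be read once at least ETH_HLEN bytes are known to be present.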

Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:00:35 -05:00
Daniel Borkmann 3c70c13248 packet: only allow extra vlan len on ethernet devices
Packet sockets can be used by various net devices and are not
really restricted to ARPHRD_ETHER device types. However, when
currently checking for the extra 4 bytes that can be transmitted
in VLAN case, our assumption is that we generally probe on
ARPHRD_ETHER devices. Therefore, before looking into Ethernet
header, check the device type first.

This also fixes the issue where non-ARPHRD_ETHER devices could
have no dev->hard_header_len in TX_RING SOCK_RAW case, and thus
the check would test unfilled linear part of the skb (instead
of non-linear).

Fixes: 57f89bfa21 ("network: Allow af_packet to transmit +4 bytes for VLAN packets.")
Fixes: 52f1454f62 ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:00:35 -05:00
Daniel Borkmann 8fd6c80d9d packet: always probe for transport header
We concluded that the skb_probe_transport_header() should better be
called unconditionally. Avoiding the call into the flow dissector has
also not really much to do with the direct xmit mode.

While it seems that only virtio_net code makes use of GSO from non
RX/TX ring packet socket paths, we should probe for a transport header
nevertheless before they hit devices.

Reference: http://thread.gmane.org/gmane.linux.network/386173/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:00:35 -05:00
Daniel Borkmann efdfa2f784 packet: do skb_probe_transport_header when we actually have data
In tpacket_fill_skb() commit c1aad275b0 ("packet: set transport
header before doing xmit") and later on 40893fd0fd ("net: switch
to use skb_probe_transport_header()") was probing for a transport
header on the skb from a ring buffer slot, but at a time, where
the skb has _not even_ been filled with data yet. So that call into
the flow dissector is pretty useless. Let's do it after we've set
up the skb frags.

Fixes: c1aad275b0 ("packet: set transport header before doing xmit")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 18:00:35 -05:00
Martin KaFai Lau 02bcf4e082 ipv6: Check rt->dst.from for the DST_NOCACHE route
All DST_NOCACHE rt6_info used to have rt->dst.from set to
its parent.

After commit 8e3d5be736 ("ipv6: Avoid double dst_free"),
DST_NOCACHE is also set to rt6_info which does not have
a parent (i.e. rt->dst.from is NULL).

This patch catches the rt->dst.from == NULL case.

Fixes: 8e3d5be736 ("ipv6: Avoid double dst_free")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 17:12:37 -05:00
Martin KaFai Lau 5973fb1e24 ipv6: Check expire on DST_NOCACHE route
Since the expires of the DST_NOCACHE rt can be set during
the ip6_rt_update_pmtu(), we also need to consider the expires
value when doing ip6_dst_check().

This patch creates __rt6_check_expired() to only
check the expire value (if one exists) of the current rt.

In rt6_dst_from_check(), it adds __rt6_check_expired() as
one of the condition checks.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 17:12:37 -05:00
Martin KaFai Lau 0d3f6d297b ipv6: Avoid creating RTF_CACHE from a rt that is not managed by fib6 tree
The original bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1272571

The setup has an IPv4 GRE tunnel running in IPsec.  The bug
happens when ndisc starts sending router solicitation at the gre
interface.  The simplified oops stack is like:

__lock_acquire+0x1b2/0x1c30
lock_acquire+0xb9/0x140
_raw_write_lock_bh+0x3f/0x50
__ip6_ins_rt+0x2e/0x60
ip6_ins_rt+0x49/0x50
~~~~~~~~
__ip6_rt_update_pmtu.part.54+0x145/0x250
ip6_rt_update_pmtu+0x2e/0x40
~~~~~~~~
ip_tunnel_xmit+0x1f1/0xf40
__gre_xmit+0x7a/0x90
ipgre_xmit+0x15a/0x220
dev_hard_start_xmit+0x2bd/0x480
__dev_queue_xmit+0x696/0x730
dev_queue_xmit+0x10/0x20
neigh_direct_output+0x11/0x20
ip6_finish_output2+0x21f/0x770
ip6_finish_output+0xa7/0x1d0
ip6_output+0x56/0x190
~~~~~~~~
ndisc_send_skb+0x1d9/0x400
ndisc_send_rs+0x88/0xc0
~~~~~~~~

The rt passed to ip6_rt_update_pmtu() is created by
icmp6_dst_alloc() and it is not managed by the fib6 tree,
so its rt6i_table == NULL.  When __ip6_rt_update_pmtu() creates
a RTF_CACHE clone, the newly created clone also has rt6i_table == NULL
and it causes the ip6_ins_rt() oops.

During pmtu update, we only want to create a RTF_CACHE clone
from a rt which is currently managed (or owned) by the
fib6 tree.  It means either rt->rt6i_node != NULL or
rt is a RTF_PCPU clone.

It is worth noting that rt6i_table may be non-NULL even if the rt is
not (yet) managed by the fib6 tree (e.g. addrconf_dst_alloc()).
Hence, rt6i_node is a better check than rt6i_table.

Fixes: 45e4fd2668 ("ipv6: Only create RTF_CACHE routes after encountering pmtu")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reported-by: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Chris Siebenmann <cks-rhbugzilla@cs.toronto.edu>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 17:12:37 -05:00
Hannes Frederic Sowa 73ed5d25dc af-unix: fix use-after-free with concurrent readers while splicing
During splicing an af-unix socket to a pipe we have to drop all
af-unix socket locks. While doing so we allow another reader to enter
unix_stream_read_generic which can read, copy and finally free another
skb. If exactly this skb is just in process of being spliced we get a
use-after-free report by kasan.

First, we must make sure to not have a free while the skb is used during
the splice operation. We simply increment its use counter before unlocking
the reader lock.

Stream sockets have the nice characteristic that we don't care about
zero length writes and they never reach the peer socket's queue. That
said, we can take the UNIXCB.consumed field as the indicator if the
skb was already freed from the socket's receive queue. If the skb was
fully consumed after we locked the reader side again we know it has been
dropped by a second reader. We indicate a short read to user space and
abort the current splice operation.

This bug has been found with syzkaller
(http://github.com/google/syzkaller) by Dmitry Vyukov.

Fixes: 2b514574f7 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-15 13:16:34 -05:00
Linus Torvalds ca4ba96e02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There are several patches from Ilya fixing RBD allocation lifecycle
  issues, a series adding a nocephx_sign_messages option (and associated
  bug fixes/cleanups), several patches from Zheng improving the
  (directory) fsync behavior, a big improvement in IO for direct-io
  requests when striping is enabled from Caifeng, and several other
  small fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: clear msg->con in ceph_msg_release() only
  libceph: add nocephx_sign_messages option
  libceph: stop duplicating client fields in messenger
  libceph: drop authorizer check from cephx msg signing routines
  libceph: msg signing callouts don't need con argument
  libceph: evaluate osd_req_op_data() arguments only once
  ceph: make fsync() wait unsafe requests that created/modified inode
  ceph: add request to i_unsafe_dirops when getting unsafe reply
  libceph: introduce ceph_x_authorizer_cleanup()
  ceph: don't invalidate page cache when inode is no longer used
  rbd: remove duplicate calls to rbd_dev_mapping_clear()
  rbd: set device_type::release instead of device::release
  rbd: don't free rbd_dev outside of the release callback
  rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
  libceph: use local variable cursor instead of &msg->cursor
  libceph: remove con argument in handle_reply()
  ceph: combine as many iovec as possile into one OSD request
  ceph: fix message length computation
  ceph: fix a comment typo
  rbd: drop null test before destroy functions
2015-11-13 09:24:40 -08:00
David S. Miller 382a483e53 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree. This
large batch includes fixes for ipset, netfilter ingress, nf_tables
dynamic set instantiation and a longstanding Kconfig dependency problem.
More specifically, they are:

1) Add missing check for empty hook list at the ingress hook, from
   Florian Westphal.

2) Input and output interface are swapped at the ingress hook,
   reported by Patrick McHardy.

3) Resolve ipset extension alignment issues on ARM, patch from Jozsef
   Kadlecsik.

4) Fix bit check on bitmap in ipset hash type, also from Jozsef.

5) Release buckets when all entries have expired in ipset hash type,
   again from Jozsef.

6) Oneliner to initialize conntrack tuple object in the PPTP helper,
   otherwise the conntrack lookup may fail due to random bits in the
   structure holes, patch from Anthony Lineham.

7) Silence a bogus gcc warning in nfnetlink_log, from Arnd Bergmann.

8) Fix Kconfig dependency problems with TPROXY, socket and dup, also
   from Arnd.

9) Add __netdev_alloc_pcpu_stats() to allow creating percpu counters
   from atomic context, this is required by the follow up fix for
   nf_tables.

10) Fix crash from the dynamic set expression, we have to add new clone
    operation that should be defined when a simple memcpy is not enough.
    This resolves a crash when using per-cpu counters with new Patrick
    McHardy's flow table nft support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-12 14:17:16 -05:00
Linus Torvalds 31c1febd7a Merge tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Apologies for coming a little late in the merge window.  Fortunately
  this is another fairly quiet one:

  Mainly smaller bugfixes and cleanup.  We're still finding some bugs
  from the breakup of the big NFSv4 state lock in 3.17 -- thanks
  especially to Andrew Elble and Jeff Layton for tracking down some of
  the remaining races"

* tag 'nfsd-4.4' of git://linux-nfs.org/~bfields/linux:
  svcrpc: document lack of some memory barriers
  nfsd: fix race with open / open upgrade stateids
  nfsd: eliminate sending duplicate and repeated delegations
  nfsd: remove recurring workqueue job to clean DRC
  SUNRPC: drop stale comment in svc_setup_socket()
  nfsd: ensure that seqid morphing operations are atomic wrt to copies
  nfsd: serialize layout stateid morphing operations
  nfsd: improve client_has_state to check for unused openowners
  nfsd: fix clid_inuse on mount with security change
  sunrpc/cache: make cache flushing more reliable.
  nfsd: move include of state.h from trace.c to trace.h
  sunrpc: avoid warning in gss_key_timeout
  lockd: get rid of reference-counted NSM RPC clients
  SUNRPC: Use MSG_SENDPAGE_NOTLAST when calling sendpage()
  lockd: create NSM handles per net namespace
  nfsd: switch unsigned char flags in svc_fh to bools
  nfsd: move svc_fh->fh_maxsize to just after fh_handle
  nfsd: drop null test before destroy functions
  nfsd: serialize state seqid morphing operations
2015-11-11 20:11:28 -08:00
Johan Hedberg 7883746bc6 Bluetooth: Fix l2cap_chan leak in SMP
The L2CAP core expects channel implementations to manage the reference
returned by the new_connection callback. With sockets this is already
handled with each channel being tied to the corresponding socket. With
SMP however there's no context to tie the pointer to in the
smp_new_conn_cb function. The function can also not just drop the
reference since it's the only one at that point.

For fixed channels (like SMP) the code path inside the L2CAP core from
new_connection() to ready() is short and straightforward. The
crucial difference is that in ready() the implementation has access to
the l2cap_conn that SMP needs to associate with its l2cap_chan. Instead of
taking a new reference in smp_ready_cb() we can simply assume that we
already own the reference created in smp_new_conn_cb(), i.e. there is
no need to call l2cap_chan_hold().

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Cc: stable@vger.kernel.org # 3.19+
2015-11-11 23:48:34 +01:00
Linus Torvalds 2df4ee78d0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix null deref in xt_TEE netfilter module, from Eric Dumazet.

 2) Several spots need to get to the original listener for SYN-ACK
    packets, most spots got this ok but some did not.  Whilst covering
    the remaining cases, create a helper to do this.  From Eric Dumazet.

 3) Missing check of return value from alloc_netdev() in CAIF SPI code,
    from Rasmus Villemoes.

 4) Don't sleep while != TASK_RUNNING in macvtap, from Vlad Yasevich.

 5) Use after free in mvneta driver, from Justin Maggard.

 6) Fix race on dst->flags access in dst_release(), from Eric Dumazet.

 7) Add missing ZLIB_INFLATE dependency for new qed driver.  From Arnd
    Bergmann.

 8) Fix multicast getsockopt deadlock, from WANG Cong.

 9) Fix deadlock in btusb, from Kuba Pawlak.

10) Some ipv6_add_dev() failure paths were not cleaning up the SNMP6
    counter state.  From Sabrina Dubroca.

11) Fix packet_bind() race, which can cause lost notifications, from
    Francesco Ruggeri.

12) Fix MAC restoration in qlcnic driver during bonding mode changes,
    from Jarod Wilson.

13) Revert bridging forward delay change which broke libvirt and other
    userspace things, from Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
  Revert "bridge: Allow forward delay to be cfgd when STP enabled"
  bpf_trace: Make dependent on PERF_EVENTS
  qed: select ZLIB_INFLATE
  net: fix a race in dst_release()
  net: mvneta: Fix memory use after free.
  net: Documentation: Fix default value tcp_limit_output_bytes
  macvtap: Resolve possible __might_sleep warning in macvtap_do_read()
  mvneta: add FIXED_PHY dependency
  net: caif: check return value of alloc_netdev
  net: hisilicon: NET_VENDOR_HISILICON should depend on HAS_DMA
  drivers: net: xgene: fix RGMII 10/100Mb mode
  netfilter: nft_meta: use skb_to_full_sk() helper
  net_sched: em_meta: use skb_to_full_sk() helper
  sched: cls_flow: use skb_to_full_sk() helper
  netfilter: xt_owner: use skb_to_full_sk() helper
  smack: use skb_to_full_sk() helper
  net: add skb_to_full_sk() helper and use it in selinux_netlbl_skbuff_setsid()
  bpf: doc: correct arch list for supported eBPF JIT
  dwc_eth_qos: Delete an unnecessary check before the function call "of_node_put"
  bonding: fix panic on non-ARPHRD_ETHER enslave failure
  ...
2015-11-10 18:11:41 -08:00
Pablo Neira Ayuso 086f332167 netfilter: nf_tables: add clone interface to expression operations
With the conversion of the counter expressions to percpu, we
need to clone the percpu memory area, otherwise we crash when using
counters from flow tables.
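
Why a plain copy is not enough can be sketched with a toy model. The
Counter class and its memcpy()/clone() methods below are hypothetical
stand-ins, not the nf_tables expression operations; the Python list plays
the role of the separately allocated percpu area.

```python
import copy


class Counter:
    """Toy counter expression whose state lives in a separate per-CPU area."""

    def __init__(self, ncpus=4):
        self.percpu = [0] * ncpus   # stands in for the percpu allocation

    def memcpy(self):
        # Shallow copy: the new object points at the same per-CPU area.
        return copy.copy(self)

    def clone(self):
        # Hypothetical clone operation: allocate a fresh area, copy values.
        new = Counter(len(self.percpu))
        new.percpu[:] = self.percpu
        return new


template = Counter()
shallow = template.memcpy()
cloned = template.clone()

shallow.percpu[0] += 1   # also changes template: the storage is shared
cloned.percpu[1] += 5    # private storage, template is unaffected
```

In this model the shallow copy only aliases the storage; in the kernel the
shared percpu pointer is what leads to the crash described above.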

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-10 23:47:32 +01:00
Arnd Bergmann 74ec4d55c4 netfilter: fix xt_TEE and xt_TPROXY dependencies
Kconfig is too smart for its own good: a Kconfig line that states

	select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES

means that if IP6_NF_IPTABLES is set to 'm', then NF_DEFRAG_IPV6 will
also be set to 'm', regardless of the state of the symbol from which
it is selected. When the xt_TEE driver is built-in and nothing else
forces NF_DEFRAG_IPV6 to be built-in, this causes a link-time error:

net/built-in.o: In function `tee_tg6':
net/netfilter/xt_TEE.c:46: undefined reference to `nf_dup_ipv6'

This works around that behavior by changing the dependency to
'if IP6_NF_IPTABLES != n', which is interpreted as boolean expression
rather than a tristate and causes the NF_DEFRAG_IPV6 symbol to
be built-in as well.
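
The tristate arithmetic behind this can be sketched as follows. It is a
simplified model that assumes the usual Kconfig expression rules (n < m < y,
"&&" evaluates to the minimum, comparisons evaluate to y or n); the function
and variable names are made up for the example.

```python
N, M, Y = 0, 1, 2   # tristate values ordered n < m < y


def t_and(a, b):
    # Tristate "&&" is the minimum of both operands.
    return min(a, b)


def t_neq(a, b):
    # Comparisons evaluate to y or n, never to m.
    return Y if a != b else N


def selected_lower_bound(selector, condition):
    # Value forced onto the selected symbol by "select X if <condition>".
    return t_and(selector, condition)


xt_tee = Y            # the selecting symbol is built in
ip6_nf_iptables = M   # the condition symbol is built as a module

# select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES
before = selected_lower_bound(xt_tee, ip6_nf_iptables)

# select NF_DEFRAG_IPV6 if IP6_NF_IPTABLES != n
after = selected_lower_bound(xt_tee, t_neq(ip6_nf_iptables, N))
```

Here before evaluates to m (the selected symbol may stay modular while its
user is built in) and after evaluates to y.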

The bug only occurs once in thousands of 'randconfig' builds and
does not really impact real users. From inspecting the other
surrounding Kconfig symbols, I am guessing that NETFILTER_XT_TARGET_TPROXY
and NETFILTER_XT_MATCH_SOCKET have the same issue. If not, this
change should still be harmless.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-10 23:46:57 +01:00
Arnd Bergmann c872a2d9e3 netfilter: nfnetlink_log: work around uninitialized variable warning
After a recent (correct) change, gcc started warning about the use
of the 'flags' variable in nfulnl_recv_config()

net/netfilter/nfnetlink_log.c: In function 'nfulnl_recv_config':
net/netfilter/nfnetlink_log.c:320:14: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
net/netfilter/nfnetlink_log.c:828:6: note: 'flags' was declared here

The warning first shows up in ARM s3c2410_defconfig with gcc-4.3 or
higher (including 5.2.1, which is the latest version I checked). I
tried working around it by rearranging the code but had no success
with that.

As a last resort, this initializes the variable to zero, which shuts
up the warning, but means that we don't get a warning if the code
is ever changed in a way that actually causes the variable to be
used without first being written.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 8cbc870829 ("netfilter: nfnetlink_log: validate dependencies to avoid breaking atomicity")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-10 23:46:36 +01:00
J. Bruce Fields 0442f14b15 svcrpc: document lack of some memory barriers
We're missing memory barriers in net/sunrpc/svcsock.c in some spots we'd
expect them.  But it doesn't appear they're necessary in our case, and
this is likely a hot path--for now just document the odd behavior.

Kosuke Tatsukawa found this issue while looking through the linux source
code for places calling waitqueue_active() before wake_up*(), but
without preceding memory barriers, after sending a patch to fix a
similar issue in drivers/tty/n_tty.c. (Details about the original issue
can be found here: https://lkml.org/lkml/2015/9/28/849.)

Reported-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 17:02:47 -05:00
Vlad Yasevich 8a921265e2 Revert "bridge: Allow forward delay to be cfgd when STP enabled"
This reverts commit 34c2d9fb04.

There are 2 reasons for this revert:
 1)  The commit in question doesn't do what it says it does.  The
     description reads: "Allow bridge forward delay to be configured
     when Spanning Tree is enabled."  This was already the case before
     the commit was made.  What the commit actually did was disallow
     invalid values of 'forward_delay' when STP was turned off.

 2)  The above change was actually a change in the user observed
     behavior and broke things like libvirt and other network configs
     that set 'forward_delay' to 0 without enabling STP.  The value
     of 0 is actually used when STP is turned off to immediately mark
     the bridge as forwarding.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-10 15:41:36 -05:00
Stefan Hajnoczi ea833f5de3 SUNRPC: drop stale comment in svc_setup_socket()
The svc_setup_socket() function does set the send and receive buffer
sizes, so the comment is out-of-date.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-11-10 09:25:50 -05:00
Linus Torvalds bd4f203e43 Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:
 "We're pretty much done over here - I'm still waiting for a nouveau
  merge so I can cleanly finish up Christoph's dma-mapping rework.

   - bunch of small misc stuff

   - fold abs64() into abs(), remove abs64()

   - new_valid_dev() cleanups

   - binfmt_elf_fdpic feature work"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (24 commits)
  fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
  fs/stat.c: remove unnecessary new_valid_dev() check
  fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
  fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
  fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
  fs/jfs: remove unnecessary new_valid_dev() checks
  fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
  fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
  fs/ext2/namei.c: remove unnecessary new_valid_dev() check
  fs/exofs/namei.c: remove unnecessary new_valid_dev() check
  fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
  fs/9p: remove unnecessary new_valid_dev() checks
  include/linux/kdev_t.h: old/new_valid_dev() can return bool
  include/linux/kdev_t.h: remove unused huge_valid_dev()
  kmap_atomic_to_page() has no users, remove it
  drivers/scsi/cxgbi: fix build with EXTRA_CFLAGS
  dma: remove external references to dma_supported
  Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
  remove abs64()
  kernel.h: make abs() work with 64-bit types
  ...
2015-11-09 21:05:13 -08:00
Eric Dumazet d69bbf88c8 net: fix a race in dst_release()
Only cpu seeing dst refcount going to 0 can safely
dereference dst->flags.

Otherwise another cpu might already have freed the dst.
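
The ordering rule can be sketched with a toy model in which reading the
flags of a freed entry raises an error. The Dst class, the two check
functions and the interleaving in scenario() are assumptions for
illustration; the actual code change is not shown in this log.

```python
DST_NOCACHE = 0x1


class Dst:
    """Toy dst entry; reading flags after the free raises an error."""

    def __init__(self, refcnt, flags):
        self._refcnt = refcnt
        self._flags = flags
        self.freed = False

    @property
    def flags(self):
        if self.freed:
            raise RuntimeError("flags read on a freed dst")
        return self._flags

    def dec_return(self):
        self._refcnt -= 1
        return self._refcnt


def check_flags_first(dst, newrefcnt):
    # Reads dst.flags even when this CPU did not drop the last reference.
    return bool(dst.flags & DST_NOCACHE) and newrefcnt == 0


def check_refcnt_first(dst, newrefcnt):
    # Short-circuit: flags are only read by the CPU that saw zero.
    return newrefcnt == 0 and bool(dst.flags & DST_NOCACHE)


def scenario(check):
    dst = Dst(refcnt=2, flags=DST_NOCACHE)
    a_new = dst.dec_return()       # CPU A: 2 -> 1, then gets delayed
    b_new = dst.dec_return()       # CPU B: 1 -> 0
    b_frees = check(dst, b_new)    # B saw zero and may read the flags
    dst.freed = True               # B frees the entry
    try:
        a_frees = check(dst, a_new)    # A resumes after the free
    except RuntimeError:
        return "use-after-free"
    return (a_frees, b_frees)


unsafe = scenario(check_flags_first)
safe = scenario(check_refcnt_first)
```

Only the CPU whose decrement returned zero ever touches the flags in the
second variant, so the delayed CPU never reads freed memory.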

Fixes: 27b75c95f1 ("net: avoid RCU for NOCACHE dst")
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-09 21:55:48 -05:00
Linus Torvalds e6604ecb70 NFS client updates for Linux 4.4
Highlights include:
 
 Features:
 - RDMA client backchannel from Chuck
 - Support for NFSv4.2 file CLONE using the btrfs ioctl
 
 Bugfixes + cleanups
 - Move socket data receive out of the bottom halves and into a workqueue
 - Refactor NFSv4 error handling so synchronous and asynchronous RPC handles
   errors identically.
 - Fix a panic when blocks or object layouts reads return a bad data length
 - Fix nfsroot so it can handle a 1024 byte long path.
 - Fix bad usage of page offset in bl_read_pagelist
 - Various NFSv4 callback cleanups+fixes
 - Fix GETATTR bitmap verification
 - Support hexadecimal number for sunrpc debug sysctl files

Merge tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  New features:
   - RDMA client backchannel from Chuck
   - Support for NFSv4.2 file CLONE using the btrfs ioctl

  Bugfixes + cleanups:
   - Move socket data receive out of the bottom halves and into a
     workqueue
   - Refactor NFSv4 error handling so synchronous and asynchronous RPC
     handles errors identically.
   - Fix a panic when blocks or object layouts reads return a bad data
     length
   - Fix nfsroot so it can handle a 1024 byte long path.
   - Fix bad usage of page offset in bl_read_pagelist
   - Various NFSv4 callback cleanups+fixes
   - Fix GETATTR bitmap verification
   - Support hexadecimal number for sunrpc debug sysctl files"

* tag 'nfs-for-4.4-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (53 commits)
  Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
  nfs: Fix GETATTR bitmap verification
  nfs: Remove unused xdr page offsets in getacl/setacl arguments
  fs/nfs: remove unnecessary new_valid_dev check
  SUNRPC: fix variable type
  NFS: Enable client side NFSv4.1 backchannel to use other transports
  pNFS/flexfiles: Add support for FF_FLAGS_NO_IO_THRU_MDS
  pNFS/flexfiles: When mirrored, retry failed reads by switching mirrors
  SUNRPC: Remove the TCP-only restriction in bc_svc_process()
  svcrdma: Add backward direction service for RPC/RDMA transport
  xprtrdma: Handle incoming backward direction RPC calls
  xprtrdma: Add support for sending backward direction RPC replies
  xprtrdma: Pre-allocate Work Requests for backchannel
  xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
  SUNRPC: Abstract backchannel operations
  xprtrdma: Saving IRQs no longer needed for rb_lock
  xprtrdma: Remove reply tasklet
  xprtrdma: Use workqueue to process RPC/RDMA replies
  xprtrdma: Replace send and receive arrays
  xprtrdma: Refactor reply handler error handling
  ...
2015-11-09 18:11:22 -08:00
Andrew Morton 79211c8ed1 remove abs64()
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-09 15:11:24 -08:00
Anthony Lineham c255cb2ed3 netfilter: Fix removal of GRE expectation entries created by PPTP
The uninitialized tuple structure caused incorrect hash calculation
and the lookup failed.
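
The effect of stale bytes on a lookup keyed by the raw structure can be
sketched with ctypes. The two-field Tuple layout is hypothetical (it is not
the conntrack tuple) and assumes the usual native alignment, which leaves
three padding bytes after the one-byte field.

```python
import ctypes


class Tuple(ctypes.Structure):
    # Hypothetical layout: a 1-byte field followed by a 4-byte field,
    # so the compiler-style layout contains padding bytes in between.
    _fields_ = [("proto", ctypes.c_uint8), ("key", ctypes.c_uint32)]


def tuple_bytes(garbage, zero_first):
    # Start from memory filled with "random" stack contents.
    raw = bytearray([garbage]) * ctypes.sizeof(Tuple)
    t = Tuple.from_buffer(raw)
    if zero_first:
        ctypes.memset(ctypes.addressof(t), 0, ctypes.sizeof(t))
    t.proto = 47
    t.key = 0x1234
    return bytes(raw)   # raw image, including the padding bytes


# Without initialization the padding differs, so the keys do not match.
table = {tuple_bytes(0xAA, zero_first=False): "expectation"}
miss = table.get(tuple_bytes(0x55, zero_first=False))

# With the structure zeroed first, identical fields give identical keys.
table_ok = {tuple_bytes(0xAA, zero_first=True): "expectation"}
hit = table_ok.get(tuple_bytes(0x55, zero_first=True))
```

Both tuples carry the same field values; only the bytes in the structure
holes decide whether the lookup succeeds.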

Link: https://bugzilla.kernel.org/show_bug.cgi?id=106441
Signed-off-by: Anthony Lineham <anthony.lineham@alliedtelesis.co.nz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-11-09 13:32:14 +01:00
Eric Dumazet 3aed822591 netfilter: nft_meta: use skb_to_full_sk() helper
SYNACK packets might be attached to request sockets.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08 20:56:39 -05:00
Eric Dumazet 02a56c81cf net_sched: em_meta: use skb_to_full_sk() helper
SYNACK packets might be attached to request sockets.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08 20:56:39 -05:00
Eric Dumazet 743b2a6674 sched: cls_flow: use skb_to_full_sk() helper
SYNACK packets might be attached to request sockets.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08 20:56:39 -05:00
Eric Dumazet fdd723e2a8 netfilter: xt_owner: use skb_to_full_sk() helper
SYNACK packets might be attached to a request socket,
xt_owner wants to get the listener in this case.
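
What such a helper has to do can be sketched as follows. This is a toy
model, not the kernel helper: the Sock class, the state names and the
listener attribute are assumptions made for the example.

```python
LISTEN = "LISTEN"
NEW_SYN_RECV = "NEW_SYN_RECV"   # assumed state name for request sockets


class Sock:
    """Toy socket: request sockets keep a pointer to their listener."""

    def __init__(self, state, owner=None, listener=None):
        self.state = state
        self.owner = owner        # only meaningful on full sockets
        self.listener = listener


def to_full_sk(sk):
    # Map a request socket to its listener; pass anything else through.
    if sk is not None and sk.state == NEW_SYN_RECV:
        return sk.listener
    return sk


listener = Sock(LISTEN, owner="uid=1000")
request = Sock(NEW_SYN_RECV, listener=listener)

owner_of_synack = to_full_sk(request).owner   # resolved via the listener
owner_of_normal = to_full_sk(listener).owner
no_socket = to_full_sk(None)
```

A match on the owner then always looks at a full socket, whether the packet
was attached to the listener or to a request socket.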

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-08 20:56:39 -05:00
Linus Torvalds ad804a0b2a Merge branch 'akpm' (patches from Andrew)
Merge second patch-bomb from Andrew Morton:

 - most of the rest of MM

 - procfs

 - lib/ updates

 - printk updates

 - bitops infrastructure tweaks

 - checkpatch updates

 - nilfs2 update

 - signals

 - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
   dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits)
  ipc,msg: drop dst nil validation in copy_msg
  include/linux/zutil.h: fix usage example of zlib_adler32()
  panic: release stale console lock to always get the logbuf printed out
  dma-debug: check nents in dma_sync_sg*
  dma-mapping: tidy up dma_parms default handling
  pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
  kexec: use file name as the output message prefix
  fs, seqfile: always allow oom killer
  seq_file: reuse string_escape_str()
  fs/seq_file: use seq_* helpers in seq_hex_dump()
  coredump: change zap_threads() and zap_process() to use for_each_thread()
  coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
  signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
  signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
  signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
  signals: kill block_all_signals() and unblock_all_signals()
  nilfs2: fix gcc uninitialized-variable warnings in powerpc build
  nilfs2: fix gcc unused-but-set-variable warnings
  MAINTAINERS: nilfs2: add header file for tracing
  nilfs2: add tracepoints for analyzing reading and writing metadata files
  ...
2015-11-07 14:32:45 -08:00
Linus Torvalds ab9f2faf8f Initial 4.4 merge window submission
- "Checksum offload support in user space" enablement
 - Misc cxgb4 fixes, add T6 support
 - Misc usnic fixes
 - 32 bit build warning fixes
 - Misc ocrdma fixes
 - Multicast loopback prevention extension
 - Extend the GID cache to store and return attributes of GIDs
 - Misc iSER updates
 - iSER clustering update
 - Network NameSpace support for rdma CM
 - Work Request cleanup series
 - New Memory Registration API

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is my initial round of 4.4 merge window patches.  There are a few
  other things I wish to get in for 4.4 that aren't in this pull, as
  this represents what has gone through merge/build/run testing and not
  what is the last few items for which testing is not yet complete.

   - "Checksum offload support in user space" enablement
   - Misc cxgb4 fixes, add T6 support
   - Misc usnic fixes
   - 32 bit build warning fixes
   - Misc ocrdma fixes
   - Multicast loopback prevention extension
   - Extend the GID cache to store and return attributes of GIDs
   - Misc iSER updates
   - iSER clustering update
   - Network NameSpace support for rdma CM
   - Work Request cleanup series
   - New Memory Registration API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
  IB/core, cma: Make __attribute_const__ declarations sparse-friendly
  IB/core: Remove old fast registration API
  IB/ipath: Remove fast registration from the code
  IB/hfi1: Remove fast registration from the code
  RDMA/nes: Remove old FRWR API
  IB/qib: Remove old FRWR API
  iw_cxgb4: Remove old FRWR API
  RDMA/cxgb3: Remove old FRWR API
  RDMA/ocrdma: Remove old FRWR API
  IB/mlx4: Remove old FRWR API support
  IB/mlx5: Remove old FRWR API support
  IB/srp: Dont allocate a page vector when using fast_reg
  IB/srp: Remove srp_finish_mapping
  IB/srp: Convert to new registration API
  IB/srp: Split srp_map_sg
  RDS/IW: Convert to new memory registration API
  svcrdma: Port to new memory registration API
  xprtrdma: Port to new memory registration API
  iser-target: Port to new memory registration API
  IB/iser: Port to new fast registration API
  ...
2015-11-07 13:33:07 -08:00
Linus Torvalds 75021d2859 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial updates from Jiri Kosina:
 "Trivial stuff from trivial tree that can be trivially summed up as:

   - treewide drop of spurious unlikely() before IS_ERR() from Viresh
     Kumar

   - cosmetic fixes (that don't really affect basic functionality of the
     driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

   - various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  bcache: Really show state of work pending bit
  hwmon: applesmc: fix comment typos
  Kconfig: remove comment about scsi_wait_scan module
  class_find_device: fix reference to argument "match"
  debugfs: document that debugfs_remove*() accepts NULL and error values
  net: Drop unlikely before IS_ERR(_OR_NULL)
  mm: Drop unlikely before IS_ERR(_OR_NULL)
  fs: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
  drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
  UBI: Update comments to reflect UBI_METAONLY flag
  pktcdvd: drop null test before destroy functions
2015-11-07 13:05:44 -08:00
Jozsef Kadlecsik 0aae24eb40 netfilter: ipset: Fix hash type expire: release empty hash bucket block
When all entries are expired/all slots are empty, release the bucket.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07 11:28:49 +01:00
Jozsef Kadlecsik e9dfdc052d netfilter: ipset: Fix hash:* type expiration
An incorrect index was used when the data blob was shrunk at expiration,
which could lead to falsely expired entries and a memory leak when
the comment extension was used too.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07 11:23:34 +01:00
Jozsef Kadlecsik 95ad1f4a93 netfilter: ipset: Fix extension alignment
The data extensions in ipset lacked the proper memory alignment and
thus could lead to kernel crash on several architectures. Therefore
the structures have been reorganized and alignment attributes added
where needed. The patch was tested on armv7h by Gerhard Wiesinger and
on x86_64, sparc64 by Jozsef Kadlecsik.

Reported-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Gerhard Wiesinger <lists@wiesinger.com>
Tested-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
2015-11-07 11:21:47 +01:00
Mel Gorman d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access to one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimistic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and were depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Linus Torvalds 1873499e13 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem update from James Morris:
 "This is mostly maintenance updates across the subsystem, with a
  notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
  maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
  apparmor: clarify CRYPTO dependency
  selinux: Use a kmem_cache for allocation struct file_security_struct
  selinux: ioctl_has_perm should be static
  selinux: use sprintf return value
  selinux: use kstrdup() in security_get_bools()
  selinux: use kmemdup in security_sid_to_context_core()
  selinux: remove pointless cast in selinux_inode_setsecurity()
  selinux: introduce security_context_str_to_sid
  selinux: do not check open perm on ftruncate call
  selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
  KEYS: Merge the type-specific data with the payload data
  KEYS: Provide a script to extract a module signature
  KEYS: Provide a script to extract the sys cert list from a vmlinux file
  keys: Be more consistent in selection of union members used
  certs: add .gitignore to stop git nagging about x509_certificate_list
  KEYS: use kvfree() in add_key
  Smack: limited capability for changing process label
  TPM: remove unnecessary little endian conversion
  vTPM: support little endian guests
  char: Drop owner assignment from i2c_driver
  ...
2015-11-05 15:32:38 -08:00
Eric Dumazet 49a496c97d tcp: use correct req pointer in tcp_move_syn() calls
I mistakenly took the wrong request sock pointer when calling tcp_move_syn().

@req_unhash is either a copy of @req, or a NULL value for
FastOpen connections (as we do not expect to unhash the temporary
request sock from ehash table)

Fixes: 805c4bc057 ("tcp: fix req->saved_syn race")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05 15:57:51 -05:00
Francesco Ruggeri 30f7ea1c2b packet: race condition in packet_bind
There is a race condition between packet_notifier and packet_bind{_spkt}.

It happens if packet_notifier(NETDEV_UNREGISTER) executes between the
time packet_bind{_spkt} takes a reference on the new netdevice and the
time packet_do_bind sets po->ifindex.
In this case the notification can be missed.
If this happens during a dev_change_net_namespace this can result in the
netdevice being moved to the new namespace while the packet_sock in the
old namespace still holds a reference on it. When the netdevice is later
deleted in the new namespace the deletion hangs since the packet_sock
is not found in the new namespace's &net->packet.sklist.
It can be reproduced with the script below.

This patch makes packet_do_bind check again for the presence of the
netdevice in the packet_sock's namespace after the synchronize_net
in unregister_prot_hook.
More generally, it also uses the rcu lock for the duration of the bind
to stop dev_change_net_namespace/rollback_registered_many from
going past the synchronize_net following unlist_netdevice, so that
no NETDEV_UNREGISTER notifications can happen on the new netdevice
while the bind is executing. In order to do this some code from
packet_bind{_spkt} is consolidated into packet_do_bind.

import socket, os, time, sys
proto=7
realDev='em1'
vlanId=400
if len(sys.argv) > 1:
   vlanId=int(sys.argv[1])
dev='vlan%d' % vlanId

os.system('taskset -p 0x10 %d' % os.getpid())

s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, proto)
os.system('ip link add link %s name %s type vlan id %d' %
          (realDev, dev, vlanId))
os.system('ip netns add dummy')

pid=os.fork()

if pid == 0:
   # dev should be moved while packet_do_bind is in synchronize net
   os.system('taskset -p 0x20000 %d' % os.getpid())
   os.system('ip link set %s netns dummy' % dev)
   os.system('ip netns exec dummy ip link del %s' % dev)
   s.close()
   sys.exit(0)

time.sleep(.004)
try:
   s.bind(('%s' % dev, proto+1))
except:
   print('Could not bind socket')
   s.close()
   os.system('ip netns del dummy')
   sys.exit(0)

os.waitpid(pid, 0)
s.close()
os.system('ip netns del dummy')
sys.exit(0)

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05 14:48:42 -05:00
Eric Dumazet f668f5f7e0 ipv4: use sk_fullsock() in ipv4_conntrack_defrag()
Before converting a 'socket pointer' into inet socket,
use sk_fullsock() to detect timewait or request sockets.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05 14:36:09 -05:00
Eric Dumazet 805c4bc057 tcp: fix req->saved_syn race
For the reasons explained in commit ce1050089c ("tcp/dccp: fix
ireq->pktopts race"), we need to make sure we do not access
req->saved_syn unless we own the request sock.

This fixes races for listeners using TCP_SAVE_SYN option.

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ying Cai <ycai@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05 14:36:09 -05:00
David S. Miller 096273304c Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:

====================
pull request: bluetooth 2015-11-05

The following set of Bluetooth patches would be good to get into 4.4-rc1
if possible:

 - Fix for missing LE CoC parameter validity checks
 - Fix for potential deadlock in btusb
 - Fix for issuing unsupported commands during HCI init

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-05 11:38:06 -05:00
Linus Torvalds fd0d351de7 TTY/Serial driver patches for 4.4-rc1
Here is the big tty and serial driver update for 4.4-rc1.
 
 Lots of serial driver updates and a few small tty core changes.  Full
 details in the shortlog.
 
 All of these have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlY6f64ACgkQMUfUDdst+ykf8gCfYPjtHy5hD/TsharaeXROnVgi
 W8cAn16xk1Nmnde220MNNpO6zDu65G/1
 =kslf
 -----END PGP SIGNATURE-----

Merge tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull tty/serial driver updates from Greg KH:
 "Here is the big tty and serial driver update for 4.4-rc1.

  Lots of serial driver updates and a few small tty core changes.  Full
  details in the shortlog.

  All of these have been in linux-next for a while"

* tag 'tty-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (148 commits)
  tty: Use unbound workqueue for all input workers
  tty: Abstract tty buffer work
  tty: Prevent tty teardown during tty_write_message()
  tty: core: Use correct spinlock flavor in tiocspgrp()
  tty: Combine SIGTTOU/SIGTTIN handling
  serial: amba-pl011: fix incorrect integer size in pl011_fifo_to_tty()
  ttyFDC: Fix build problems due to use of module_{init,exit}
  tty: remove unneeded return statement
  serial: 8250_mid: add support for DMA engine handling from UART MMIO
  dmaengine: hsu: remove platform data
  dmaengine: hsu: introduce stubs for the exported functions
  dmaengine: hsu: make the UART driver in control of selecting this driver
  serial: fix mctrl helper functions
  serial: 8250_pci: Intel MID UART support to its own driver
  serial: fsl_lpuart: add earlycon support
  tty: disable unbind for old 74xx based serial/mpsc console port
  serial: pl011: Spelling s/clocks-names/clock-names/
  n_tty: Remove reader wakeups for TTY_BREAK/TTY_PARITY chars
  tty: synclink, fix indentation
  serial: at91, fix rs485 properties
  ...
2015-11-04 21:35:12 -08:00
Sabrina Dubroca 2a189f9e57 ipv6: clean up dev_snmp6 proc entry when we fail to initialize inet6_dev
In ipv6_add_dev, when addrconf_sysctl_register fails, we do not clean up
the dev_snmp6 entry that we have already registered for this device.
Call snmp6_unregister_dev in this case.

Fixes: a317a2f19d ("ipv6: fail early when creating netdev named all or default")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 23:49:48 -05:00
Johan Hedberg 40624183c2 Bluetooth: L2CAP: Add missing checks for invalid LE DCID
When receiving a connect response we should make sure that the DCID is
within the valid range and that we don't already have another channel
allocated for the same DCID.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-05 04:04:15 +01:00
Johan Hedberg ab0c127fbb Bluetooth: L2CAP: Fix checked range when allocating new CID
The 'dyn_end' value is also a valid CID so it should be included in
the range of values checked.
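
The inclusive range can be illustrated with a minimal Python sketch. The
function name `alloc_cid`, the return value 0 for "no free CID", and the
example numbers are assumptions made for illustration, not the kernel code:

```python
def alloc_cid(in_use, dyn_start, dyn_end):
    """Return the first free CID in [dyn_start, dyn_end], or 0 if none."""
    # range() excludes its stop value, so add 1 to keep dyn_end in range.
    for cid in range(dyn_start, dyn_end + 1):
        if cid not in in_use:
            return cid
    return 0

# With 0x40 and 0x41 taken, the last valid CID (dyn_end) is still usable.
last_free = alloc_cid({0x40, 0x41}, 0x40, 0x42)
```

An exclusive upper bound would report the range as exhausted here even
though the CID equal to dyn_end is free.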

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-05 04:04:07 +01:00
Johan Hedberg 8a7889cc6e Bluetooth: L2CAP: Fix returning correct LE CoC response codes
The core spec defines specific response codes for situations when the
received CID is incorrect. Add the defines for these and return them
as appropriate from the LE Connect Request handler function.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-11-05 04:04:00 +01:00
Marcel Holtmann 2ab216a7a9 Bluetooth: Check for supported white list before issuing commands
The white list commands might not be implemented if the controller does
not actually support the white list. So check the supported commands
first before issuing these commands. Not supporting the white list is
the same as supporting a white list with zero size.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-11-05 04:03:21 +01:00
Stefan Hajnoczi 7362945aea VSOCK: call sk->sk_data_ready() on accept()
When a listen socket enqueues a connection for userspace to accept(),
the sk->sk_data_ready() callback should be invoked.  In-kernel socket
users rely on this callback to detect when incoming connections are
available.

Currently the sk->sk_state_change() callback is invoked by
vmci_transport.c.  This happens to work for userspace applications since
sk->sk_state_change = sock_def_wakeup() and sk->sk_data_ready =
sock_def_readable() both wake up the accept() waiter.  In-kernel socket
users, on the other hand, fail to detect incoming connections.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 22:03:10 -05:00
Jarod Wilson e7868a85e1 net/core: ensure features get disabled on new lower devs
With moving netdev_sync_lower_features() after the .ndo_set_features
calls, I neglected to verify that devices added *after* a flag had been
disabled on an upper device were properly added with that flag disabled as
well. This currently happens, because we exit __netdev_update_features()
when we see dev->features == features for the upper dev. We can retain the
optimization of leaving without calling .ndo_set_features with a bit of
tweaking and a goto here.

Fixes: fd867d51f8 ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 21:56:00 -05:00
David Ahern e1b8d903c6 net: Fix prefsrc lookups
A bug report (https://bugzilla.kernel.org/show_bug.cgi?id=107071) noted
that the following ip command is failing with v4.3:

    $ ip route add 10.248.5.0/24 dev bond0.250 table vlan_250 src 10.248.5.154
    RTNETLINK answers: Invalid argument

021dd3b8a1 changed the lookup of the given preferred source address to
use the table id passed in, but this assumes the local entries are in the
given table which is not necessarily true for non-VRF use cases. When
validating the preferred source, fall back to the local table on failure.

Fixes: 021dd3b8a1 ("net: Add routes to the table associated with the device")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 21:34:37 -05:00
WANG Cong 87e9f03159 ipv4: fix a potential deadlock in mcast getsockopt() path
Sasha reported the following lockdep warning:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(sk_lock-AF_INET);
                                lock(rtnl_mutex);
                                lock(sk_lock-AF_INET);
   lock(rtnl_mutex);

This is because, for IP_MSFILTER and MCAST_MSFILTER, we take the
rtnl lock before the socket lock in the setsockopt() path, but take
the socket lock before the rtnl lock in the getsockopt() path. All the
remaining optnames are setsockopt()-only.

Fix this by aligning the getsockopt() path with the setsockopt()
path, so that all mcast socket paths are locked in the same
order.

Note, IPv6 part is different where rtnl lock is not held.
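
The ordering rule can be sketched in Python, assuming two plain
`threading.Lock` objects as stand-ins for rtnl_mutex and the socket lock.
The names `rtnl`, `sk_lock`, `mcast_setsockopt` and `mcast_getsockopt` are
hypothetical and only model the lock order, not the kernel code:

```python
import threading

# Stand-ins for the kernel locks; purely illustrative.
rtnl = threading.Lock()
sk_lock = threading.Lock()
order_log = []

def mcast_setsockopt():
    # rtnl first, then the socket lock.
    with rtnl:
        with sk_lock:
            order_log.append(("set", "rtnl", "sk_lock"))

def mcast_getsockopt():
    # Same order as the setsockopt() path, so no AB/BA inversion.
    with rtnl:
        with sk_lock:
            order_log.append(("get", "rtnl", "sk_lock"))

threads = [threading.Thread(target=f)
           for f in (mcast_setsockopt, mcast_getsockopt) * 50]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because both paths acquire the locks in one global order, no thread can
hold the second lock while waiting for the first.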

Fixes: 54ff9ef36b ("ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop}")
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 21:29:59 -05:00
WANG Cong 4ee3bd4a8c ipv4: disable BH when changing ip local port range
This fixes the following lockdep warning:

 [ INFO: inconsistent lock state ]
 4.3.0-rc7+ #1197 Not tainted
 ---------------------------------
 inconsistent {IN-SOFTIRQ-R} -> {SOFTIRQ-ON-W} usage.
 sysctl/1019 [HC0[0]:SC0[0]:HE1:SE1] takes:
  (&(&net->ipv4.ip_local_ports.lock)->seqcount){+.+-..}, at: [<ffffffff81921de7>] ipv4_local_port_range+0xb4/0x12a
 {IN-SOFTIRQ-R} state was registered at:
   [<ffffffff810bd682>] __lock_acquire+0x2f6/0xdf0
   [<ffffffff810be6d5>] lock_acquire+0x11c/0x1a4
   [<ffffffff818e599c>] inet_get_local_port_range+0x4e/0xae
   [<ffffffff8166e8e3>] udp_flow_src_port.constprop.40+0x23/0x116
   [<ffffffff81671cb9>] vxlan_xmit_one+0x219/0xa6a
   [<ffffffff81672f75>] vxlan_xmit+0xa6b/0xaa5
   [<ffffffff817f2deb>] dev_hard_start_xmit+0x2ae/0x465
   [<ffffffff817f35ed>] __dev_queue_xmit+0x531/0x633
   [<ffffffff817f3702>] dev_queue_xmit_sk+0x13/0x15
   [<ffffffff818004a5>] neigh_resolve_output+0x12f/0x14d
   [<ffffffff81959cfa>] ip6_finish_output2+0x344/0x39f
   [<ffffffff8195bf58>] ip6_finish_output+0x88/0x8e
   [<ffffffff8195bfef>] ip6_output+0x91/0xe5
   [<ffffffff819792ae>] dst_output_sk+0x47/0x4c
   [<ffffffff81979392>] NF_HOOK_THRESH.constprop.30+0x38/0x82
   [<ffffffff8197981e>] mld_sendpack+0x189/0x266
   [<ffffffff8197b28b>] mld_ifc_timer_expire+0x1ef/0x223
   [<ffffffff810de581>] call_timer_fn+0xfb/0x28c
   [<ffffffff810ded1e>] run_timer_softirq+0x1c7/0x1f1

Fixes: b8f1a55639 ("udp: Add function to make source port for UDP tunnels")
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 21:29:06 -05:00
David S. Miller d9c7dbc11a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Conflicts:
	net/netfilter/xt_TEE.c

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are:

1) Fix crash when TEE target is used with no --oif, from Eric Dumazet.

2) Oneliner to fix a crash on the redirect traffic to localhost
   infrastructure when the interface does not yet have an address,
   from Munehisa Kamata.

3) Oneliner not to request module all the time from nfnetlink due to
   wrong type value, from Florian Westphal.

I'll make sure these patches 1 and 2 hit -stable.
====================

The conflict in net/netfilter/xt_TEE.c was minor, a change
to the 'oif' selection overlapping a function signature
change for the nf_dup_ipv{4,6}() routines.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-04 20:47:50 -05:00
Kinglong Mee 941c3ff310 Sunrpc: Supports hexadecimal number for sysctl files of sunrpc debug
The sunrpc debug sysctl files only accept decimal numbers right now.
But all the XXXDBUG_XXX macros are defined as hexadecimal.
It is not easy to set or check a separate flag.

This patch lets those files accept hexadecimal numbers
(decimal numbers are still supported). Also, display the value as hexadecimal.

v2,
Remove duplicate parsing of '0x...', just using simple_strtol(tmpbuf, &s, 0)
Fix a bug of isspace() checking after parsing
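
Base-0 parsing can be sketched in Python, where `int(text, 0)` picks the
radix from the prefix in a way loosely comparable to passing base 0 to
simple_strtol(). The helper names and the "0x%04x" display format are
assumptions for illustration:

```python
def parse_debug_flags(text):
    """Parse a flags value written as decimal or 0x-prefixed hex."""
    # Base 0 lets the prefix choose the radix.
    return int(text.strip(), 0)

def format_debug_flags(value):
    """Render the flags as hexadecimal (assumed display format)."""
    return "0x%04x" % value

from_hex = parse_debug_flags("0x0010\n")
from_dec = parse_debug_flags("16")
shown = format_debug_flags(from_hex)
```

One difference to keep in mind: Python rejects decimal strings with a
leading zero such as "010" in base 0, whereas C-style base-0 parsing
treats them as octal.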

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-03 15:56:49 -05:00
David S. Miller 73186df8d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor overlapping changes in net/ipv4/ipmr.c, in 'net' we were
fixing the "BH-ness" of the counter bumps whilst in 'net-next'
the functions were modified to take an explicit 'net' parameter.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 13:41:45 -05:00
Jiri Pirko 0c63d80c3f switchdev: respect SKIP_EOPNOTSUPP flag in case there is no recursion
Caller passing down the SKIP_EOPNOTSUPP switchdev flag expects that
-EOPNOTSUPP cannot be returned. But in case of direct op call without
recursion, this may happen. So fix this by always checking it at the
end of the __switchdev_port_attr_set function.
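
The idea of checking the flag on the common exit path can be sketched in
Python. The function `port_attr_set`, the callable `op`, and the flag
value are hypothetical stand-ins, not the switchdev API:

```python
import errno

SKIP_EOPNOTSUPP = 1 << 0  # hypothetical flag value

def port_attr_set(op, flags):
    """Run op() and filter -EOPNOTSUPP when the caller asked to skip it."""
    err = op()
    # Checked once at the end, so the direct (non-recursive) call is covered.
    if err == -errno.EOPNOTSUPP and flags & SKIP_EOPNOTSUPP:
        return 0
    return err

unsupported = lambda: -errno.EOPNOTSUPP
skipped = port_attr_set(unsupported, SKIP_EOPNOTSUPP)
reported = port_attr_set(unsupported, 0)
```

Other error codes pass through unchanged, so only the one error the caller
opted out of is suppressed.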

Fixes: 464314ea6c ("switchdev: skip over ports returning -EOPNOTSUPP when recursing ports")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 13:39:21 -05:00
Phil Sutter 2a4f417621 net: sched: kill dead code in sch_choke.c
It looks like this has never been used at all.

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 13:30:47 -05:00
Markus Elfring 37b8e1ca0f irda: Delete an unnecessary check before the function call "irlmp_unregister_service"
The irlmp_unregister_service() function tests whether its argument is NULL
and then returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 13:30:47 -05:00
Andrzej Hajda 7fc561362d SUNRPC: fix variable type
Due to an incorrect type for len, bc_send_request always returned zero.

The problem has been detected using proposed semantic patch
scripts/coccinelle/tests/assign_signed_to_unsigned.cocci [1].

[1]: http://permalink.gmane.org/gmane.linux.kernel/2046107

Signed-off-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-11-03 12:31:31 -05:00
David S. Miller b3047a77cb Another set of fixes:
* remove a warning on a check that can trigger without any
    errors having happened (Andrei)
  * correctly handle deauth request while in the process of
    associating (Andrei)
  * fix TDLS HT operation (Arik)
  * allow changing AID/listen interval during client setup (Ayala)
  * be more forgiving with WMM parameters to get HT/VHT in case of
    broken APs with bad WMM settings (Emmanuel, myself)
  * a number of other fixes (some in documentation)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJWOKsAAAoJEDBSmw7B7bqrNKMQAKFH81CscgJGOQb/zgGdmuF3
 kNrWrnH+3XqoqM2rHpIekQLxVkeUhM+hHCyaPCK7rVnCuu53pJ0u7P0rq922XAW4
 olFBGVdE1yG/69ndR9MYDjLWP+ikMmAiMLbM5qzPuDJ5XyBVACC1D82+qSRvByCK
 Z8PYJ+OsLk05eKa4ER7i8BVExRGM4vrce4Uh3K07yNKbfU81ztsltwflleRFGn3f
 OCydpuSId+C4TuSmkgBJF1718B9GazvAbZDw5t4jorIrbiZzQMZAtoi+YxwXVhev
 lvPCO8p1+lhWYUOK5LnO8mbdUFfe+kc3rrZjKWuXuDLp6mvPyP9FOaIFjFfnlJxT
 8QadG0QDzTlLHUj29gvrnww8aob9c7iHueXP9OlcBMp9uTyklgBJ+fMyvPfXpWXB
 Diy9n0VJfWzg8d74wWLLQy/N1qY6gwhXXwgW8TM/49O5BpbyvVsI6jFAR+8ZT9b9
 GLGEkN68RBuY03mejkf4PmhqgMVErA2JtabRI0Efm2Do85t9ZxgObF6INsrZ+o2M
 ffl7jhyHsFB+d38Ilwlb4cyWhxpIGrhTtt2h5zIsgNx3wmrXrarwMM3P4NGOOEbP
 Euqkk/LoMZdjjB/78JSi6hdQSYoQFaW85tHBzXhMXk0nYXHLWdVEJsLuAtATl8gM
 vzNkny8pcaLnRg/kXqgl
 =/d5+
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2015-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Another set of fixes:
 * remove a warning on a check that can trigger without any
   errors having happened (Andrei)
 * correctly handle deauth request while in the process of
   associating (Andrei)
 * fix TDLS HT operation (Arik)
 * allow changing AID/listen interval during client setup (Ayala)
 * be more forgiving with WMM parameters to get HT/VHT in case of
   broken APs with bad WMM settings (Emmanuel, myself)
 * a number of other fixes (some in documentation)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 11:30:57 -05:00
Jarod Wilson 5ba3f7d61a net/core: fix for_each_netdev_feature
As pointed out by Nikolay and further explained by Geert, the initial
for_each_netdev_feature macro was broken, as feature would get set outside
of the block of code it was intended to run in, thus only ever working for
the first feature bit in the mask. While less pretty this way, this is
tested and confirmed functional with multiple feature bits set in
NETIF_F_UPPER_DISABLES.

[root@dell-per730-01 ~]# ethtool -K bond0 lro off
...
[  242.761394] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[  243.552178] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[  244.353978] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[  245.147420] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71

[root@dell-per730-01 ~]# ethtool -K bond0 gro off
...
[  251.925645] bond0: Disabling feature 0x0000000000004000 on lower dev p5p2.
[  252.713693] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[  253.499085] bond0: Disabling feature 0x0000000000004000 on lower dev p5p1.
[  254.290922] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71
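
Visiting every set bit of a feature mask can be sketched in Python. The
generator name `each_feature_bit` and the low-to-high visiting order are
choices of this sketch; the two mask values are the ones shown in the log
above:

```python
def each_feature_bit(mask):
    """Yield the index of every set bit in mask, lowest first."""
    bit = 0
    while mask:
        if mask & 1:
            yield bit
        mask >>= 1
        bit += 1

# 0x8000 and 0x4000 are the feature values printed in the log above.
upper_disables = 0x8000 | 0x4000
bits = list(each_feature_bit(upper_disables))
```

A loop that only handled the first bit would stop after one feature; here
both bits 14 and 15 are visited.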

Fixes: fd867d51f ("net/core: generic support for disabling netdev features down stack")
CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 11:29:57 -05:00
Padmanabh Ratnakar 74f2d19caf vlan: Invoke driver vlan hooks only if device is present
NIC drivers mark the device as detached during error recovery.
They expect no management hooks to be invoked in this state.
Invoke driver vlan hooks only if device is present.

Signed-off-by: Padmanabh Ratnakar <padmanabh.ratnakar@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 11:29:56 -05:00
Stefan Sørensen 5f94c943d5 ptp: Change ptp_class to a proper bitmask
Change the definition of PTP_CLASS_L2 to not have any bits overlapping with
the other defined protocol values, allowing the PTP_CLASS_* definitions to
be used for simple filtering on packet type.
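
Why non-overlapping bits matter can be shown with a small Python sketch.
The constant names and numeric values below are illustrative assumptions,
not copied from the kernel header:

```python
# Illustrative values: each protocol gets its own bit.
CLASS_IPV4 = 0x10
CLASS_IPV6 = 0x20
CLASS_L2 = 0x40

# An overlapping encoding for comparison: L2 shares bits with both others.
OVERLAPPING_L2 = 0x30

def matches(ptp_class, wanted):
    """True if the packet class has any of the wanted protocol bits."""
    return bool(ptp_class & wanted)

clean = matches(CLASS_L2, CLASS_IPV4)
confused = matches(OVERLAPPING_L2, CLASS_IPV4)
```

With distinct bits a plain AND is enough to filter by packet type; with
the overlapping value an L2 packet would also match the IPv4 test.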

Signed-off-by: Stefan Sørensen <stefan.sorensen@spectralink.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 11:08:22 -05:00
Michal Kubeček ebac62fe3d ipv6: fix tunnel error handling
Both tunnel6_protocol and tunnel46_protocol share the same error
handler, tunnel6_err(), which traverses through tunnel6_handlers list.
For ipip6 tunnels, we need to traverse tunnel46_handlers as we do e.g.
in tunnel46_rcv(). Current code can generate an ICMPv6 error message
with an IPv4 packet embedded in it.

Fixes: 73d605d1ab ("[IPSEC]: changing API of xfrm6_tunnel_register")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-03 10:52:13 -05:00
Ayala Beker e420842724 cfg80211: allow AID/listen interval changes for unassociated station
Currently, cfg80211 rejects updates of AID and listen interval parameters
for existing entries. This information is known only at association stage
and as a result it's impossible to update entries that were added
unassociated.
Fix this by allowing updates of these properties for stations that the
driver (or mac80211) assigned unassociated state.

This then fixes mac80211's use of NL80211_FEATURE_FULL_AP_CLIENT_STATE.

Signed-off-by: Ayala Beker <ayala.beker@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 11:20:29 +01:00
Chaitanya T K dcae9e0203 mac80211: document sleep requirements for channel context ops
Channel context driver operations can sleep, so add might_sleep()
and document this.

Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 11:15:48 +01:00
Johannes Berg ef95d8ba38 mac80211: further improve "no supported rates" warning
Allow distinguishing the non-station case from the case of a
station without rates, by using -1 for the non-station case.
This value cannot be reached with a station since that many
legacy rates don't exist.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:56:42 +01:00
Johannes Berg 2ed77ea692 mac80211: treat bad WMM parameters more gracefully
As WMM is required for HT/VHT operation, treat bad WMM parameters
more gracefully by falling back to default parameters instead of
not using WMM association. This makes it possible to still use HT
or VHT, although potentially with reduced quality of service due
to unintended WMM parameters.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:56:26 +01:00
Emmanuel Grumbach 730a755017 mac80211: fixup AIFSN instead of disabling WMM
Disabling WMM has a huge impact these days. It implies that
HT and VHT will be disabled which means that the throughput
will be dramatically reduced.
Since the AIFSN is a transmission parameter, we can play a
bit and fix it up to make it compliant with the 802.11
specification which requires it to be at least 2.
Increasing it from 1 to 2 will slightly reduce the
likelihood of getting a transmission opportunity compared to
other clients that accept AIFSN=1, but at
least it will allow HT and VHT which is a huge gain.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:56:13 +01:00
Johannes Berg cec6628350 mac80211: make enable_qos parameter to ieee80211_set_wmm_default()
The function currently determines this value, for use in bss_info.qos,
based on the interface type itself. Make it a parameter instead and
set it with the same logic for now.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:55:55 +01:00
Matthias Schiffer 520c75dcae mac80211: fix crash on mesh local link ID generation with VIFs
llid_in_use needs to be limited to stations of the same VIF, otherwise it
will cause a NULL deref as the sta_info of non-mesh-VIFs don't have
sta->mesh set.

Steps to reproduce:

   modprobe mac80211_hwsim channels=2
   iw phy phy0 interface add ibss0 type ibss
   iw phy phy0 interface add mesh0 type mp
   iw phy phy1 interface add ibss1 type ibss
   iw phy phy1 interface add mesh1 type mp
   ip link set ibss0 up
   ip link set mesh0 up
   ip link set ibss1 up
   ip link set mesh1 up
   iw dev ibss0 ibss join foo 2412
   iw dev ibss1 ibss join foo 2412
   # Ensure that ibss0 and ibss1 are actually associated; I often need to
   # leave and join the cell on ibss1 a second time.
   iw dev mesh0 mesh join bar
   iw dev mesh1 mesh join bar # crash

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:43:40 +01:00
Arik Nemtsov 57f255f581 mac80211: TDLS: add proper HT-oper IE
When 11n peers perform a TDLS connection on a legacy BSS, the HT
operation IE must be specified according to IEEE802.11-2012 section
9.23.3.2. Otherwise HT-protection is compromised and the medium becomes
noisy for both the TDLS and the BSS links.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:42:47 +01:00
Eliad Peller 0d440ea294 mac80211: don't reconfigure sched scan in case of wowlan
Scheduled scan has to be reconfigured only if wowlan wasn't
configured, since otherwise it should continue to run (with
the 'any' trigger) or be aborted.

The current code will end up asking the driver to start a new
scheduled scan without stopping the previous one, and leaking
some memory (from the previous request.)

Fix this by doing the abort/restart under the proper conditions.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:42:05 +01:00
Eliad Peller 968a76cef3 mac80211: call drv_stop only if driver is started
If drv_start() fails during hw_restart, all the running
interfaces are being closed/stopped, which results in
drv_stop() being called, although the driver was never
started successfully.

This might cause drivers to perform operations on uninitialized
memory (as they assume it was initialized on drv_start)

Consider the local->started flag, and call the driver's stop()
op only if drv_start() succeeded before.

Move drv_start() and drv_stop() to driver-ops.c, as they are no
longer simple wrappers.

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:41:12 +01:00
Andrei Otcheretianski c189a685b8 mac80211: Remove WARN_ON_ONCE in ieee80211_recalc_smps
The recalc_smps work can run after the station disassociates.
At this stage we already released the channel, but the work
will be cancelled only when the interface stops.
In this scenario we can hit the warning in ieee80211_recalc_smps, so
just remove it.

Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:41:02 +01:00
Eliad Peller 43d6df007c mac80211: use freezable workqueue for restart work
Requesting hw restart during suspend might result
in the restart work being executed after mac80211
and the hw are suspended.

Solve the race by simply scheduling the restart
work on a freezable workqueue.

Note that there can be some cases of reconfiguration
on resume (besides the hardware restart):

* wowlan is not configured -
    All the interfaces were removed on suspend,
    and drv_stop() was called. At this point the driver
    shouldn't expect a hw_restart anyway, so we can
    simply cancel it (on resume).

* wowlan is configured, drv_resume() == 1
    There is no definitive expected behavior in this case,
    as each driver might have different expectations (e.g.
    setting some flags on suspend/restart vs. not handling
    spurious recovery).
    For now, simply let the hw_restart work run again after
    resume, and hope the driver will handle it well (or at
    least initiate another hw restart).

Signed-off-by: Eliad Peller <eliadx.peller@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:41:01 +01:00
Andrei Otcheretianski a64cba3c53 mac80211: Fix local deauth while associating
A local request to deauthenticate wasn't handled while associating, thus
the association could continue even when userspace had requested to
disconnect.

Cc: stable@vger.kernel.org
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:41:00 +01:00
Arik Nemtsov 254d3dfe44 mac80211: allow null chandef in tracing
In TDLS channel-switch operations the chandef can sometimes be NULL.
Avoid an oops in the trace code for these cases and just print a
chandef full of zeros.

Cc: stable@vger.kernel.org
Fixes: a7a6bdd067 ("mac80211: introduce TDLS channel switch ops")
Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:40:59 +01:00
Ola Olsson 4baf6bea37 nl80211: Fix potential memory leak from parse_acl_data
If parse_acl_data succeeds but the subsequent parsing of smps
attributes fails, there will be a memory leak due to early returns.
Fix that by moving the ACL parsing later.

Cc: stable@vger.kernel.org
Fixes: 18998c381b ("cfg80211: allow requesting SMPS mode on ap start")
Signed-off-by: Ola Olsson <ola.olsson@sonymobile.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:40:58 +01:00
Janusz.Dziedzic@tieto.com 519ee6918b mac80211: fix divide by zero when NOA update
In case of one shot NOA the interval can be 0, catch that
instead of potentially (depending on the driver) crashing
like this:

divide error: 0000 [#1] SMP
[...]
Call Trace:
<IRQ>
[<ffffffffc08e891c>] ieee80211_extend_absent_time+0x6c/0xb0 [mac80211]
[<ffffffffc08e8a17>] ieee80211_update_p2p_noa+0xb7/0xe0 [mac80211]
[<ffffffffc069cc30>] ath9k_p2p_ps_timer+0x170/0x190 [ath9k]
[<ffffffffc070adf8>] ath_gen_timer_isr+0xc8/0xf0 [ath9k_hw]
[<ffffffffc0691156>] ath9k_tasklet+0x296/0x2f0 [ath9k]
[<ffffffff8107ad65>] tasklet_action+0xe5/0xf0
[...]

Cc: stable@vger.kernel.org [3.16+, due to d463af4a1c using it]
Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2015-11-03 10:40:57 +01:00
Dan Streetman a8a572a6b5 xfrm: dst_entries_init() per-net dst_ops
Remove the dst_entries_init/destroy calls for xfrm4 and xfrm6 dst_ops
templates; their dst_entries counters will never be used.  Move the
xfrm dst_ops initialization from the common xfrm/xfrm_policy.c to
xfrm4/xfrm4_policy.c and xfrm6/xfrm6_policy.c, and call dst_entries_init
and dst_entries_destroy for each net namespace.

The ipv4 and ipv6 xfrms each create a dst_ops template, and perform
dst_entries_init on the templates.  The template values are copied to each
net namespace's xfrm.xfrm*_dst_ops.  The problem there is the dst_ops
pcpuc_entries field is a percpu counter and cannot be used correctly by
simply copying it to another object.

The result of this is a very subtle bug; changes to the dst entries
counter from one net namespace may sometimes get applied to a different
net namespace dst entries counter.  This is because of how the percpu
counter works; it has a main count field as well as a pointer to the
percpu variables.  Each net namespace maintains its own main count
variable, but all point to one set of percpu variables.  When any net
namespace happens to change one of the percpu variables to outside its
small batch range, its count is moved to the net namespace's main count
variable.  So with multiple net namespaces operating concurrently, the
dst_ops entries counter can stray from the actual value that it should
be; if counts are consistently moved from one net namespace to another
(which my testing showed is likely), then one net namespace winds up
with a negative dst_ops count while another winds up with a continually
increasing count, eventually reaching its gc_thresh limit, which causes
all new traffic on the net namespace to fail with -ENOBUFS.
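
The effect can be reproduced with a simplified userspace model. This is
a sketch, not the lib/percpu_counter code: struct pcpu_counter,
counter_add(), NR_CPUS and BATCH are illustrative assumptions. Two
counters are created by struct copy, so they share one set of per-CPU
slots while each keeps its own main count.

```c
#include <assert.h>

#define NR_CPUS 2
#define BATCH   4

/* Simplified model: a main count plus a pointer to per-CPU deltas. */
struct pcpu_counter {
	long count;	/* main count, folded from the deltas */
	long *deltas;	/* points at NR_CPUS per-CPU slots */
};

static void counter_add(struct pcpu_counter *c, int cpu, long amount)
{
	long d = c->deltas[cpu] + amount;

	if (d >= BATCH || d <= -BATCH) {
		c->count += d;		/* fold into *this* counter's main count */
		c->deltas[cpu] = 0;
	} else {
		c->deltas[cpu] = d;
	}
}

static long template_deltas[NR_CPUS];
static struct pcpu_counter ns_a, ns_b;

/* Returns ns_b's main count after ns_a did most of the accounting. */
static long demo_shared_deltas(void)
{
	struct pcpu_counter tmpl = { 0, template_deltas };
	int i;

	/* Buggy setup: a struct copy duplicates the pointer, not the slots. */
	ns_a = tmpl;
	ns_b = tmpl;

	for (i = 0; i < 3; i++)
		counter_add(&ns_a, 0, 1);	/* ns_a accounts three entries */
	counter_add(&ns_b, 0, 1);		/* ns_b accounts one entry */

	return ns_b.count;
}
```

In this model ns_b ends up with a main count of 4 and ns_a with 0, even
though ns_a accounted three of the four entries.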

Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2015-11-03 08:42:57 +01:00
Jarod Wilson fd867d51f8 net/core: generic support for disabling netdev features down stack
There are some netdev features, which when disabled on an upper device,
such as a bonding master or a bridge, must be disabled and cannot be
re-enabled on underlying devices.

This is a rework of an earlier, more heavy-handed approach, which simply
disables and prevents re-enabling of netdev features listed in a new
define in include/linux/netdev_features.h, NETIF_F_UPPER_DISABLES. When an
upper device disables a flag in that feature mask, the change propagates
down the stack, and any lower device that has an upper device with one of
those flags disabled should not be able to enable said flag.

Initially, only LRO is included for proof of concept, and because this
code effectively does the same thing as dev_disable_lro(), though it will
also activate from the ethtool path, which was one of the goals here.
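
The propagation rule can be sketched in userspace as follows. struct dev,
sync_upper_features() and set_features() are hypothetical names, not the
net/core implementation; the LRO bit value 0x8000 is taken from the
dmesg lines in this message.

```c
#include <assert.h>
#include <stdint.h>

#define F_LRO			(UINT64_C(1) << 15)	/* 0x8000 */
#define F_UPPER_DISABLES	F_LRO	/* flags that propagate down */

struct dev {
	uint64_t features;	/* currently enabled features */
	struct dev *upper;	/* device stacked on top, or NULL */
};

/* Strip from 'wanted' every propagating flag that an upper device has off. */
static uint64_t sync_upper_features(const struct dev *lower, uint64_t wanted)
{
	const struct dev *up;

	for (up = lower->upper; up; up = up->upper)
		wanted &= ~(F_UPPER_DISABLES & ~up->features);
	return wanted;
}

/* Request features on a lower device; returns what actually took effect. */
static uint64_t set_features(struct dev *lower, uint64_t wanted)
{
	lower->features = sync_upper_features(lower, wanted);
	return lower->features;
}
```

Once the upper device clears F_LRO, any later request to enable it on
the lower device is masked out again.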

[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: on
[root@dell-per730-01 ~]# ethtool -K bond0 lro off
[root@dell-per730-01 ~]# ethtool -k bond0 |grep large
large-receive-offload: off
[root@dell-per730-01 ~]# ethtool -k p5p1 |grep large
large-receive-offload: off

dmesg dump:

[ 1033.277986] bond0: Disabling feature 0x0000000000008000 on lower dev p5p2.
[ 1034.067949] bnx2x 0000:06:00.1 p5p2: using MSI-X  IRQs: sp 74  fp[0] 76 ... fp[7] 83
[ 1034.753612] bond0: Disabling feature 0x0000000000008000 on lower dev p5p1.
[ 1035.591019] bnx2x 0000:06:00.0 p5p1: using MSI-X  IRQs: sp 62  fp[0] 64 ... fp[7] 71

This has been successfully tested with bnx2x, qlcnic and netxen network
cards as slaves in a bond interface. Turning LRO on or off on the master
also turns it on or off on each of the slaves, new slaves are added with
LRO in the same state as the master, and LRO can't be toggled on the
slaves.

Also, this should largely remove the need for dev_disable_lro(), and most,
if not all, of its call sites can be replaced by simply making sure
NETIF_F_LRO isn't included in the relevant device's feature flags.

Note that this patch is driven by bug reports from users saying it was
confusing that bonds and slaves had different settings for the same
features, and while it won't be 100% in sync if a lower device doesn't
support a feature like LRO, I think this is a good step in the right
direction.

CC: "David S. Miller" <davem@davemloft.net>
CC: Eric Dumazet <edumazet@google.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <gospo@cumulusnetworks.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Nikolay Aleksandrov <razor@blackwall.org>
CC: Michal Kubecek <mkubecek@suse.cz>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 23:41:31 -05:00
Eric Dumazet 4ece900977 sit: fix sit0 percpu double allocations
The sit0 device allocates its percpu storage twice:
- One time in ipip6_tunnel_init()
- One time in ipip6_fb_tunnel_init()

Thus we leak 48 bytes per possible cpu per network namespace dismantle.

ipip6_fb_tunnel_init() can be much simpler and does not
return an error, and should be called after register_netdev()

Note that ipip6_tunnel_clone_6rd() also needs to be called
after register_netdev() (calling ipip6_tunnel_init())

Fixes: ebe084aafb ("sit: Use ipip6_tunnel_init as the ndo_init function.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:54:45 -05:00
Eric Dumazet 1d6119baf0 net: fix percpu memory leaks
This patch fixes the following problems:

1) percpu_counter_init() can return an error, therefore
   init_frag_mem_limit() must propagate this error so that
   inet_frags_init_net() can do the same up to its callers.

2) If ip[46]_frags_ns_ctl_register() fails, we must unwind
   properly and free the percpu_counter.

Without this fix, we leave a freed object in the percpu_counters
global list (if CONFIG_HOTPLUG_CPU) leading to crashes.
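
Both points follow the usual propagate-and-unwind pattern, sketched here
with hypothetical stand-ins (counter_init(), sysctl_register() and
frags_init_net() are illustrative, not the kernel functions):

```c
#include <assert.h>

static int counters_alive;	/* number of initialised counters */

/* Hypothetical stand-ins; each returns 0 or a negative error code. */
static int counter_init(int fail)
{
	if (fail)
		return -12;	/* -ENOMEM */
	counters_alive++;
	return 0;
}

static void counter_destroy(void)
{
	counters_alive--;
}

static int sysctl_register(int fail)
{
	return fail ? -12 : 0;
}

/* Propagate errors upward and undo earlier steps in reverse order. */
static int frags_init_net(int fail_counter, int fail_sysctl)
{
	int err;

	err = counter_init(fail_counter);
	if (err)
		return err;

	err = sysctl_register(fail_sysctl);
	if (err)
		goto err_destroy_counter;

	return 0;

err_destroy_counter:
	counter_destroy();
	return err;
}
```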

This bug was detected by KASAN and syzkaller tool
(http://github.com/google/syzkaller)

Fixes: 6d7b857d54 ("net: use lib/percpu_counter API for fragmentation mem accounting")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:47:14 -05:00
Ilya Dryomov 583d0fef75 libceph: clear msg->con in ceph_msg_release() only
The following bit in ceph_msg_revoke_incoming() is unsafe:

    struct ceph_connection *con = msg->con;
    if (!con)
            return;
    mutex_lock(&con->mutex);
    <more msg->con use>

There is nothing preventing con from getting destroyed right after
msg->con test.  One easy way to reproduce this is to disable message
signing only on the server side and try to map an image.  The system
will go into a

    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature
    libceph: read_partial_message ffff880073f0ab68 signature check failed
    libceph: osd0 192.168.255.155:6801 bad crc/signature

loop which has to be interrupted with Ctrl-C.  Hit Ctrl-C and you are
likely to end up with a random GP fault if the reset handler executes
"within" ceph_msg_revoke_incoming():

                     <yet another reply w/o a signature>
                                   ...
          <Ctrl-C>
    rbd_obj_request_end
      ceph_osdc_cancel_request
        __unregister_request
          ceph_osdc_put_request
            ceph_msg_revoke_incoming
                                   ...
                                osd_reset
                                  __kick_osd_requests
                                    __reset_osd
                                      remove_osd
                                        ceph_con_close
                                          reset_connection
                                            <clear con->in_msg->con>
                                            <put con ref>
                                              put_osd
                                                <free osd/con>
              <msg->con use> <-- !!!

If ceph_msg_revoke_incoming() executes "before" the reset handler,
osd/con will be leaked because ceph_msg_revoke_incoming() clears
con->in_msg but doesn't put con ref, while reset_connection() only puts
con ref if con->in_msg != NULL.

The current msg->con scheme was introduced by commits 38941f8031
("libceph: have messages point to their connection") and 92ce034b5a
("libceph: have messages take a connection reference"), which defined
when messages get associated with a connection and when that
association goes away.  Part of the problem is that this association is
supposed to go away in far too many places; closing this race entirely
requires either a rework of the existing or an addition of a new layer
of synchronization.

In lieu of that, we can make it *much* less likely to hit by
disassociating messages only on their destruction and resend through
a different connection.  This makes the code simpler and is probably
a good thing to do regardless - this patch adds a msg_con_set() helper
which is called from only three places: ceph_con_send() and
ceph_con_in_msg_alloc() to set msg->con and ceph_msg_release() to clear
it.
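
The idea of a single helper that owns every change of msg->con can be
sketched like this. It is a simplified userspace model with a bare
reference count and no freeing, not the libceph code; struct conn,
struct msg, conn_get() and conn_put() are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

struct conn { int refs; };
struct msg  { struct conn *con; };

static struct conn *conn_get(struct conn *c)
{
	c->refs++;
	return c;
}

static void conn_put(struct conn *c)
{
	c->refs--;
}

/*
 * The only place where m->con changes: drop the reference held on the
 * old connection (if any), then take one on the new connection (if any).
 */
static void msg_con_set(struct msg *m, struct conn *c)
{
	if (m->con)
		conn_put(m->con);
	m->con = c ? conn_get(c) : NULL;
}
```

Because the pointer and the reference always change together, a message
never points at a connection it does not hold a reference on.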

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:46 +01:00
Ilya Dryomov a51983e4dd libceph: add nocephx_sign_messages option
Support for message signing was merged into 3.19, along with
nocephx_require_signatures option.  But, all that option does is allow
the kernel client to talk to clusters that don't support MSG_AUTH
feature bit.  That's pretty useless, given that it's been supported
since bobtail.

Meanwhile, if one disables message signing on the server side with
"cephx sign messages = false", it becomes impossible to use the kernel
client since it expects messages to be signed if MSG_AUTH was
negotiated.  Add nocephx_sign_messages option to support this use case.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:46 +01:00
Ilya Dryomov 859bff51dc libceph: stop duplicating client fields in messenger
supported_features and required_features serve no purpose at all, while
nocrc and tcp_nodelay belong to ceph_options::flags.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:46 +01:00
Ilya Dryomov 4199b8eec3 libceph: drop authorizer check from cephx msg signing routines
I don't see a way for auth->authorizer to be NULL in
ceph_x_sign_message() or ceph_x_check_message_signature().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:46 +01:00
Ilya Dryomov 79dbd1baa6 libceph: msg signing callouts don't need con argument
We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set.  We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:45 +01:00
Ioana Ciornei 8a703a383d libceph: evaluate osd_req_op_data() arguments only once
This patch changes the osd_req_op_data() macro to not evaluate
arguments more than once in order to follow the kernel coding style.
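
As a generic illustration (this is not the osd_req_op_data() macro
itself), a GNU C statement expression can copy each argument into a
local once. SQUARE_BAD, SQUARE_ONCE and next_value() are hypothetical
names.

```c
#include <assert.h>

/* Expands 'x' twice, so any side effect in the argument happens twice. */
#define SQUARE_BAD(x)	((x) * (x))

/* GNU C statement expression: 'x' is evaluated once into a local. */
#define SQUARE_ONCE(x) ({			\
	__typeof__(x) x_ = (x);			\
	x_ * x_;				\
})

static int calls;

/* The side effect makes double evaluation visible. */
static int next_value(void)
{
	return ++calls;
}
```

SQUARE_BAD(next_value()) calls the function twice, while
SQUARE_ONCE(next_value()) calls it once. Statement expressions are a
GCC/Clang extension, which kernel code relies on widely.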

Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>
[idryomov@gmail.com: changelog, formatting]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:36:49 +01:00
Ilya Dryomov cbf99a11fb libceph: introduce ceph_x_authorizer_cleanup()
Commit ae385eaf24 ("libceph: store session key in cephx authorizer")
introduced ceph_x_authorizer::session_key, but didn't update all the
exit/error paths.  Introduce ceph_x_authorizer_cleanup() to encapsulate
ceph_x_authorizer cleanup and switch to it.  This fixes ceph_x_destroy(),
which currently always leaks the key, and the ceph_x_build_authorizer()
error paths.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Shraddha Barke 343128ce91 libceph: use local variable cursor instead of &msg->cursor
Use local variable cursor in place of &msg->cursor in
read_partial_msg_data() and write_partial_msg_data().

Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:36:47 +01:00
Shraddha Barke 70cf052d0c libceph: remove con argument in handle_reply()
Since handle_reply() does not use its con argument, remove it.

Signed-off-by: Shraddha Barke <shraddha.6596@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:36:47 +01:00
Trond Myklebust ac3c860c75 NFS: NFSoRDMA Client Side Changes
In addition to a variety of bugfixes, these patches are mostly geared at
enabling both swap and backchannel support to the NFS over RDMA client.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Merge tag 'nfs-rdma-4.4-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma
2015-11-02 17:09:24 -05:00
Matthias Schiffer ec13ad1d70 ipv6: fix crash on ICMPv6 redirects with prohibited/blackholed source
There are other error values besides ip6_null_entry that can be returned by
ip6_route_redirect(): fib6_rule_action() can also result in
ip6_blk_hole_entry and ip6_prohibit_entry if such ip rules are installed.

Only checking for ip6_null_entry in rt6_do_redirect() causes ip6_ins_rt()
to be called with rt->rt6i_table == NULL in these cases, making the kernel
crash.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 16:30:15 -05:00
Chuck Lever 76566773a1 NFS: Enable client side NFSv4.1 backchannel to use other transports
Forechannel transports get their own "bc_up" method to create an
endpoint for the backchannel service.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Anna Schumaker: Add forward declaration of struct net to xprt.h]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 16:29:13 -05:00
Eric Dumazet 9e17f8a475 net: make skb_set_owner_w() more robust
skb_set_owner_w() is called from various places that assume
skb->sk always points to a full-blown socket (as it changes
sk->sk_wmem_alloc).

We'd like to attach skb to request sockets, and in the future
to timewait sockets as well. For these kinds of pseudo sockets,
we need to take a traditional refcount and use sock_edemux()
as the destructor.

It is now time to un-inline skb_set_owner_w(), as it has become too big.

Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Bisected-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 16:28:49 -05:00
Ido Schimmel eca1e006cf bridge: vlan: Use rcu_dereference instead of rtnl_dereference
br_should_learn() is protected by RCU and not by RTNL, so use correct
flavor of nbp_vlan_group().

Fixes: 907b1e6e83 ("bridge: vlan: use proper rcu for the vlgrp member")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 16:27:39 -05:00
Ani Sinha 44f49dd8b5 ipmr: fix possible race resulting from improper usage of IP_INC_STATS_BH() in preemptible context.
Fixes the following kernel BUG:

BUG: using __this_cpu_add() in preemptible [00000000] code: bash/2758
caller is __this_cpu_preempt_check+0x13/0x15
CPU: 0 PID: 2758 Comm: bash Tainted: P           O   3.18.19 #2
 ffffffff8170eaca ffff880110d1b788 ffffffff81482b2a 0000000000000000
 0000000000000000 ffff880110d1b7b8 ffffffff812010ae ffff880007cab800
 ffff88001a060800 ffff88013a899108 ffff880108b84240 ffff880110d1b7c8
Call Trace:
[<ffffffff81482b2a>] dump_stack+0x52/0x80
[<ffffffff812010ae>] check_preemption_disabled+0xce/0xe1
[<ffffffff812010d4>] __this_cpu_preempt_check+0x13/0x15
[<ffffffff81419d60>] ipmr_queue_xmit+0x647/0x70c
[<ffffffff8141a154>] ip_mr_forward+0x32f/0x34e
[<ffffffff8141af76>] ip_mroute_setsockopt+0xe03/0x108c
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810e6974>] ? pollwake+0x4d/0x51
[<ffffffff81058ac0>] ? default_wake_function+0x0/0xf
[<ffffffff810553fc>] ? get_parent_ip+0x11/0x42
[<ffffffff810613d9>] ? __wake_up_common+0x45/0x77
[<ffffffff81486ea9>] ? _raw_spin_unlock_irqrestore+0x1d/0x32
[<ffffffff810618bc>] ? __wake_up_sync_key+0x4a/0x53
[<ffffffff8139a519>] ? sock_def_readable+0x71/0x75
[<ffffffff813dd226>] do_ip_setsockopt+0x9d/0xb55
[<ffffffff81429818>] ? unix_seqpacket_sendmsg+0x3f/0x41
[<ffffffff813963fe>] ? sock_sendmsg+0x6d/0x86
[<ffffffff813959d4>] ? sockfd_lookup_light+0x12/0x5d
[<ffffffff8139650a>] ? SyS_sendto+0xf3/0x11b
[<ffffffff810d5738>] ? new_sync_read+0x82/0xaa
[<ffffffff813ddd19>] compat_ip_setsockopt+0x3b/0x99
[<ffffffff813fb24a>] compat_raw_setsockopt+0x11/0x32
[<ffffffff81399052>] compat_sock_common_setsockopt+0x18/0x1f
[<ffffffff813c4d05>] compat_SyS_setsockopt+0x1a9/0x1cf
[<ffffffff813c4149>] compat_SyS_socketcall+0x180/0x1e3
[<ffffffff81488ea1>] cstar_dispatch+0x7/0x1e

Signed-off-by: Ani Sinha <ani@arista.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:57:12 -05:00
Ido Schimmel ddd611d3ff bridge: vlan: Use correct flag name in comment
The flag used to indicate if a VLAN should be used for filtering - as
opposed to context only - on the bridge itself (e.g. br0) is called
'brentry' and not 'brvlan'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:40:11 -05:00
Ido Schimmel 07bc588fc1 bridge: vlan: Prevent possible use-after-free
When adding a port to a bridge we initialize VLAN filtering on it. We do
not bail out in case an error occurred in nbp_vlan_init, as it can be
used as a non VLAN filtering bridge.

However, if VLAN filtering is required and an error occurred in
nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering
functions (e.g. br_vlan_find, br_get_pvid) will know the struct is
invalid and will not try to access it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:40:10 -05:00
Eric Dumazet ce1050089c tcp/dccp: fix ireq->pktopts race
IPv6 request sockets store a pointer to the skb containing the SYN packet
to be able to transfer it to the full-blown socket when 3WHS is done
(ireq->pktopts -> np->pktoptions)

As explained in commit 5e0724d027 ("tcp/dccp: fix hashdance race for
passive sessions"), we must transfer the skb only if we won the
hashdance race, if multiple cpus receive the 'ack' packet completing
3WHS at the same time.

Fixes: e994b2f0fb ("tcp: do not lock listener to process SYN packets")
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:38:26 -05:00
santosh.shilimkar@oracle.com 7b5654349e RDS: convert bind hash table to re-sizable hashtable
To further improve RDS connection scalability on massive systems
where the number of sockets grows into the tens of thousands, there
is a need for a larger bind hashtable. A pre-allocated 8K or 16K table is
not very flexible in terms of memory utilisation. The rhashtable
infrastructure gives us the flexibility to grow the hashtable based
on use and also comes with efficient built-in bucket (chain) handling.

Reviewed-by: David Miller <davem@davemloft.net>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:36:23 -05:00
Saurabh Sengar d3ffaefa1b net: rds: changing the return type from int to void
As the result of rds_iw_flush_mr_pool() is not checked anywhere,
change its return type from int to void.
Also remove the unused variable rc, as there is nothing to return.

Signed-off-by: Saurabh Sengar <saurabh.truth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 15:35:19 -05:00
Paolo Abeni 9920e48b83 ipv4: use l4 hash for locally generated multipath flows
This patch changes how the multipath hash is computed for locally
generated flows: now the hash comprises l4 information.

This allows better utilization of the available paths when the existing
flows have the same source IP and the same destination IP: with l3 hash,
even when multiple connections are in place simultaneously, a single path
will be used, while with l4 hash we can use all the available paths.
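
The difference can be sketched in userspace. FNV-1a and the modulo in
pick_path() are simple stand-ins chosen for illustration, not the
kernel's flow hash or its path selection; struct flow and the function
names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* FNV-1a over the low 'nbytes' bytes of v, least significant byte first. */
static uint32_t fnv1a(uint32_t h, uint32_t v, int nbytes)
{
	int i;

	for (i = 0; i < nbytes; i++) {
		h ^= (v >> (8 * i)) & 0xff;
		h *= 16777619u;
	}
	return h;
}

/* L3 hash: addresses only. */
static uint32_t l3_hash(const struct flow *f)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, f->saddr, 4);
	return fnv1a(h, f->daddr, 4);
}

/* L4 hash: addresses plus ports and protocol. */
static uint32_t l4_hash(const struct flow *f)
{
	uint32_t h = l3_hash(f);

	h = fnv1a(h, f->sport, 2);
	h = fnv1a(h, f->dport, 2);
	return fnv1a(h, f->proto, 1);
}

static unsigned int pick_path(uint32_t hash, unsigned int npaths)
{
	return hash % npaths;
}
```

For flows between the same two addresses, pick_path(l3_hash(f), n) is
identical for every flow, while the L4 variant can spread flows that
differ only in their source port.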

v2 changes:
- use get_hash_from_flowi4() instead of implementing just another l4 hash
  function

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 14:38:43 -05:00
Chuck Lever 0f2e3bdab6 SUNRPC: Remove the TCP-only restriction in bc_svc_process()
Allow the use of other transport classes when handling a backward
direction RPC call.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 9468431962 svcrdma: Add backward direction service for RPC/RDMA transport
On NFSv4.1 mount points, the Linux NFS client uses this transport
endpoint to receive backward direction calls and route replies back
to the NFSv4.1 server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 63cae47005 xprtrdma: Handle incoming backward direction RPC calls
Introduce a code path in the rpcrdma_reply_handler() to catch
incoming backward direction RPC calls and route them to the ULP's
backchannel server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 83128a60ca xprtrdma: Add support for sending backward direction RPC replies
Backward direction RPC replies are sent via the client transport's
send_request method, the same way forward direction RPC calls are
sent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 124fa17d3e xprtrdma: Pre-allocate Work Requests for backchannel
Pre-allocate extra send and receive Work Requests needed to handle
backchannel receives and sends.

The transport doesn't know how many extra WRs to pre-allocate until
the xprt_setup_backchannel() call, but that's long after the WRs are
allocated during forechannel setup.

So, use a fixed value for now.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever f531a5dbc4 xprtrdma: Pre-allocate backward rpc_rqst and send/receive buffers
xprtrdma's backward direction send and receive buffers are the same
size as the forechannel's inline threshold, and must be pre-
registered.

The consumer has no control over which receive buffer the adapter
chooses to catch an incoming backwards-direction call. Any receive
buffer can be used for either a forward reply or a backward call.
Thus both types of RPC message must be the same size.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 42e5c3e272 SUNRPC: Abstract backchannel operations
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RDMA
bi-direction. In particular, receive buffers have to be pre-
registered and posted in order to receive incoming backchannel
requests.

Add a virtual function call to allow the insertion of appropriate
backchannel setup and destruction methods for each transport.

In addition, freeing a backchannel request is a little different
for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever a5b027e189 xprtrdma: Saving IRQs no longer needed for rb_lock
Now that RPC replies are processed in a workqueue, there's no need
to disable IRQs when managing send and receive buffers. This saves
noticeable overhead per RPC.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 2da9ab3008 xprtrdma: Remove reply tasklet
Clean up: The reply tasklet is no longer used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever fe97b47cd6 xprtrdma: Use workqueue to process RPC/RDMA replies
The reply tasklet is fast, but it's single threaded. After reply
traffic saturates a single CPU, there's no more reply processing
capacity.

Replace the tasklet with a workqueue to spread reply handling across
all CPUs.  This also moves RPC/RDMA reply handling out of the soft
IRQ context and into a context that allows sleeps.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 1e465fd4ff xprtrdma: Replace send and receive arrays
The rb_send_bufs and rb_recv_bufs arrays are used to implement a
pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep
structs. Replace those arrays with free lists.

To allow more than 512 RPCs in-flight at once, each of these arrays
would be larger than a page (assuming 8-byte addresses and 4KB
pages). Allowing up to 64K in-flight RPCs (as TCP now does), each
buffer array would have to be 128 pages. That's an order-7
allocation. (Not that we're going there.)

A list is easier to expand dynamically. Instead of allocating a
larger array of pointers and copying the existing pointers to the
new array, simply append more buffers to each list.
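
A minimal free-list sketch, assuming a singly linked list and no locking
(the kernel code would use its own list primitives and a lock). struct
buf, struct buf_pool, pool_put() and pool_get() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct buf {
	struct buf *next;	/* link used only while the buffer is free */
	int id;
};

struct buf_pool {
	struct buf *free;	/* head of the free list */
};

/* Return a buffer to the pool; growing the pool is the same operation. */
static void pool_put(struct buf_pool *p, struct buf *b)
{
	b->next = p->free;
	p->free = b;
}

/* Take a free buffer, or NULL when the pool is exhausted. */
static struct buf *pool_get(struct buf_pool *p)
{
	struct buf *b = p->free;

	if (b) {
		p->free = b->next;
		b->next = NULL;
	}
	return b;
}
```

Adding capacity is just another pool_put() of a freshly allocated
buffer; no pointer array has to be reallocated or copied.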

This also makes it simpler to manage receive buffers that might
catch backwards-direction calls, or to post receive buffers in
bulk to amortize the overhead of ib_post_recv.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever b0e178a2d8 xprtrdma: Refactor reply handler error handling
Clean up: The error cases in rpcrdma_reply_handler() almost never
execute. Ensure the compiler places them out of the hot path.

No behavior change expected.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 4220a07264 xprtrdma: Prevent loss of completion signals
Commit 8301a2c047 ("xprtrdma: Limit work done by completion
handler") was supposed to prevent xprtrdma's upcall handlers from
starving other softIRQ work by letting them return to the provider
before all CQEs have been polled.

The logic assumes the provider will call the upcall handler again
immediately if the CQ is re-armed while there are still queued CQEs.

This assumption is invalid. The IBTA spec says that after a CQ is
armed, the hardware must interrupt only when a new CQE is inserted.
xprtrdma can't rely on the provider calling again, even though some
providers do.

Therefore, leaving CQEs on queue makes sense only when there is
another mechanism that ensures all remaining CQEs are consumed in a
timely fashion. xprtrdma does not have such a mechanism. If a CQE
remains queued, the transport can wait forever to send the next RPC.

Finally, move the wcs array back onto the stack to ensure that the
poll array is always local to the CPU where the completion upcall is
running.

Fixes: 8301a2c047 ("xprtrdma: Limit work done by completion ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever 7b3d770c67 xprtrdma: Re-arm after missed events
ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive
value if WCs were added to a CQ after the last completion upcall
but before the CQ has been re-armed.

Commit 7f23f6f6e3 ("xprtrmda: Reduce lock contention in
completion handlers") assumed that when ib_req_notify_cq() returned
a positive RC, the CQ had also been successfully re-armed, making
it safe to return control to the provider without losing any
completion signals. That is an invalid assumption.

Change both completion handlers to continue polling while
ib_req_notify_cq() returns a positive value.
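
The resulting loop shape can be sketched with a toy completion queue.
Everything here (struct toy_cq, toy_poll(), toy_rearm(), toy_upcall())
is a hypothetical model, not the verbs API; toy_rearm() only mimics the
"positive return means events were missed" convention described above.

```c
#include <assert.h>

struct toy_cq {
	int queued;	/* completions waiting to be polled */
	int late;	/* completions that arrive just before the re-arm */
	int armed;
};

/* Poll one completion; returns 1 if one was consumed, 0 if empty. */
static int toy_poll(struct toy_cq *cq)
{
	if (cq->queued == 0)
		return 0;
	cq->queued--;
	return 1;
}

/* Re-arm; a positive return means completions slipped in meanwhile. */
static int toy_rearm(struct toy_cq *cq)
{
	cq->queued += cq->late;	/* simulate the race window */
	cq->late = 0;
	cq->armed = 1;
	return cq->queued > 0;
}

/* Completion upcall: keep polling until a re-arm reports nothing missed. */
static int toy_upcall(struct toy_cq *cq)
{
	int consumed = 0;

	do {
		while (toy_poll(cq))
			consumed++;
	} while (toy_rearm(cq) > 0);

	return consumed;
}
```

Completions that arrive between the last poll and the re-arm are picked
up by the next pass of the outer loop instead of being left on the queue.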

Fixes: 7f23f6f6e3 ("xprtrmda: Reduce lock contention in ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Chuck Lever a045178887 xprtrdma: Enable swap-on-NFS/RDMA
After adding a swapfile on an NFS/RDMA mount and removing the
normal swap partition, I was able to push the NFS client well
into swap without any issue.

I forgot to swapoff the NFS file before rebooting. This pinned
the NFS mount and the IB core and provider, causing shutdown to
hang. I think this is expected and safe behavior. Probably
shutdown scripts should "swapoff -a" before unmounting any
filesystems.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Steve Wise 8610586d82 xprtrdma: don't log warnings for flushed completions
Unsignaled send WRs can get flushed as part of normal unmount, so don't
log them as warnings.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2015-11-02 13:45:15 -05:00
Tina Ruchandani 1032a66871 Use 64-bit timekeeping
This patch changes the use of struct timespec in
dccp_probe to use struct timespec64 instead. On 32-bit architectures,
timespec uses a 32-bit seconds field, which will overflow in the year
2038. timespec64 uses a 64-bit seconds field. Note that the correctness
of the code isn't
changed, since the original code only uses the timestamps to compute a
small elapsed interval. This patch is part of a larger attempt to remove
instances of 32-bit timekeeping structures (timespec, timeval, time_t)
from the kernel so it is easier to identify where the real 2038 issues
are.
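
As a rough illustration of the elapsed-interval computation mentioned
above, here is a user-space sketch with a 64-bit seconds field. The
names toy_timespec64 and toy_sub() are hypothetical and only mirror the
idea; they are not the kernel helpers.

```c
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a timestamp with a 64-bit seconds field. */
struct toy_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Return a - b, normalising the nanoseconds into [0, 1e9). */
static struct toy_timespec64 toy_sub(struct toy_timespec64 a,
				     struct toy_timespec64 b)
{
	struct toy_timespec64 d;

	d.tv_sec = a.tv_sec - b.tv_sec;
	d.tv_nsec = a.tv_nsec - b.tv_nsec;
	if (d.tv_nsec < 0) {
		d.tv_sec--;
		d.tv_nsec += TOY_NSEC_PER_SEC;
	}
	return d;
}
```

The interval stays small even when the two timestamps straddle the
point where a signed 32-bit seconds counter would wrap.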

Signed-off-by: Tina Ruchandani <ruchandani.tina@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 17:01:16 -05:00
Julian Anastasov c9b3292eeb ipv4: update RTNH_F_LINKDOWN flag on UP event
When nexthop is part of multipath route we should clear the
LINKDOWN flag when link goes UP or when first address is added.
This is needed because we always set LINKDOWN flag when DEAD flag
was set but now on UP the nexthop is not dead anymore. Examples when
LINKDOWN bit can be forgotten when no NETDEV_CHANGE is delivered:

- link goes down (LINKDOWN is set), then link goes UP and device
shows carrier OK but LINKDOWN remains set

- last address is deleted (LINKDOWN is set), then address is
added and device shows carrier OK but LINKDOWN remains set

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up

here add a multipath route where one nexthop is for dummy0:

ip route add 1.2.3.4 nexthop dummy0 nexthop SOME_OTHER_DEVICE
ifconfig dummy0 down
ifconfig dummy0 up

now ip route shows nexthop that is not dead. Now set the sysctl var:

echo 1 > /proc/sys/net/ipv4/conf/dummy0/ignore_routes_with_linkdown

now ip route will show a dead nexthop because the forgotten
RTNH_F_LINKDOWN is propagated as RTNH_F_DEAD.

Fixes: 8a3d03166f ("net: track link-status of ipv4 nexthops")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 16:57:39 -05:00
Julian Anastasov 4f823defdd ipv4: fix to not remove local route on link down
When fib_netdev_event calls fib_disable_ip on NETDEV_DOWN event
we should not delete the local routes if the local address
is still present. The confusion comes from the fact that both
fib_netdev_event and fib_inetaddr_event use the NETDEV_DOWN
constant. Fix it by restoring the 'force' variable.

Steps to reproduce:
modprobe dummy
ifconfig dummy0 192.168.168.1 up
ifconfig dummy0 down
ip route list table local | grep dummy | grep host
local 192.168.168.1 dev dummy0  proto kernel  scope host  src 192.168.168.1

Fixes: 8a3d03166f ("net: track link-status of ipv4 nexthops")
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 16:57:39 -05:00
Vivien Didelot 76e398a627 net: dsa: use switchdev obj for VLAN add/del ops
Simplify DSA by pushing the switchdev objects for VLAN add and delete
operations down to its drivers. Currently only mv88e6xxx is affected.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 15:56:11 -05:00
Stefan Hajnoczi ea3803c193 VSOCK: define VSOCK_SS_LISTEN once only
The SS_LISTEN socket state is defined by both af_vsock.c and
vmci_transport.c.  This is risky since the value could be changed in one
file and the other would be out of sync.

Rename from SS_LISTEN to VSOCK_SS_LISTEN since the constant is not part
of enum socket_state (SS_CONNECTED, ...).  This way it is clear that the
constant is vsock-specific.

The big text reflow in af_vsock.c was necessary to keep to the maximum
line length.  Text is unchanged except for s/SS_LISTEN/VSOCK_SS_LISTEN/.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:14:47 -05:00
Jon Paul Maloy 5cbb28a4bf tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.

We now linearize all such buffers before they are delivered up to the
generic reception layer.

In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().

Fixes: commit d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:04:29 -05:00
Hannes Frederic Sowa 405c92f7a5 ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those, warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:28 -05:00
Hannes Frederic Sowa 682b1a9d3f ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
We cannot reliably calculate the packet size on MSG_MORE corked sockets
and thus cannot decide whether they are going to be fragmented later on,
so it is better not to use CHECKSUM_PARTIAL in the first place.

The IPv6 code was also intended to avoid CHECKSUM_PARTIAL in the
presence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. The condition that checks whether the packet fits into
one fragment was also wrong and has been corrected.

Fixes: commit 32dce968dd ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa dbd3393c56 ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those, warn about them once and handle them gracefully by recalculating
the checksum.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa d749c9cbff ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
We cannot reliably calculate the packet size on MSG_MORE corked sockets
and thus cannot decide whether they are going to be fragmented later on,
so it is better not to use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
David S. Miller b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
David S. Miller e7b63ff115 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2015-10-30

1) The flow cache is limited by the flow cache limit, which
   depends on the number of cpus, and by the xfrm garbage collector
   threshold, which is independent of the number of cpus. As a
   result, on systems with more than 16 cpus we hit the xfrm
   garbage collector limit and refuse new allocations, so new
   flows are dropped. On systems with 16 or fewer cpus, we hit
   the flow cache limit. In this case, we shrink the flow cache
   instead of refusing new flows.

   We increase the xfrm garbage collector threshold to INT_MAX
   to get the same behaviour, independent of the number of cpus.

2) Fix some unaligned accesses on sparc systems.
   From Sowmini Varadhan.

3) Fix some header checks in _decode_session4. We may call
   pskb_may_pull with a negative value, which is converted to
   unsigned int by pskb_may_pull. This can lead to incorrect
   policy lookups. We fix this by checking the data pointer
   position before we call pskb_may_pull.

4) Reload skb header pointers after calling pskb_may_pull
   in _decode_session4 as this may change the pointers into
   the packet.

5) Add a missing statistic counter on inner mode errors.

Please pull or let me know if there are problems.
====================
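
Item 3 above can be illustrated with a small user-space sketch. It
assumes, for illustration only, that a negative offset means the data
pointer is already past the header, so nothing needs to be pulled.
toy_may_pull() and toy_can_read() are hypothetical names, not the
kernel functions.

```c
/* Hypothetical stand-in: succeeds if len bytes are available. */
static int toy_may_pull(unsigned int avail, unsigned int len)
{
	return len <= avail;
}

/*
 * hdr_end_off is the offset of the end of the header relative to the
 * current data pointer. Check the sign before the value is converted
 * to unsigned, so a negative offset never turns into a huge length.
 */
static int toy_can_read(int hdr_end_off, unsigned int avail)
{
	if (hdr_end_off < 0)
		return 1;
	return toy_may_pull(avail, (unsigned int)hdr_end_off);
}
```

Without the sign check, an offset of -4 converts to a length close to
UINT_MAX and the pull fails even though the header is readable.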

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:51:56 +09:00
Scott Feldman e258d919b1 switchdev: fix: pass correct obj size when deferring obj add
Fixes: 4d429c5dd ("switchdev: introduce possibility to defer obj_add/del")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:23:37 +09:00
Scott Feldman 3a7bde55a1 switchdev: fix: erasing too much of vlan obj when handling multiple vlan specs
When adding vlans with multiple IFLA_BRIDGE_VLAN_INFO attrs set in AFSPEC,
we would wipe the vlan obj struct after the first IFLA_BRIDGE_VLAN_INFO.
Fix this by only clearing what's necessary on each IFLA_BRIDGE_VLAN_INFO
iteration.

Fixes: 9e8f4a54 ("switchdev: push object ID back to object structure")
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:23:35 +09:00
David S. Miller 740215ddb5 NFC 4.4 pull request
Merge tag 'nfc-next-4.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next

Samuel Ortiz says:

====================
NFC 4.4 pull request

This is the NFC pull request for 4.4.

It's a bit bigger than usual, the 3 main culprits being:

- A new driver for Intel's Fields Peak NCI chipset. In order to
  support this chipset we had to export a few NCI routines and
  extend the driver NCI ops to not only support proprietary
  commands but also core ones.

- Support for vendor commands for both STM drivers, st-nci
  and st21nfca. Those vendor commands allow running factory tests
  through the NFC netlink interface.

- New i2c and SPI support for the Marvell driver, together with
  firmware download support for this driver's core.

Besides that we also have:

- A few file renames in the STM drivers, to keep the naming
  consistent between drivers.

- Some improvements and fixes on the NCI HCI layer, mostly to
  properly reach a secure element over a legacy HCI link.

- A few fixes for the s3fwrn5 and trf7970a drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 20:19:43 +09:00
David S. Miller 5bf8921116 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2015-10-28

Here are some more Bluetooth patches for 4.4 that were collected during
the past week. The most important ones are from Kuba Pawlak for fixing
locking issues with SCO sockets. There's also a fix from Alexander Aring
for 6lowpan, a memleak fix from Julia Lawall for the btmrvl driver and
some cleanup patches from Marcel.

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 19:41:10 +09:00
Alexander Duyck b7b0b1d290 ipv6: recreate ipv6 link-local addresses when increasing MTU over IPV6_MIN_MTU
This change makes it so that we reinitialize the interface if the MTU is
increased back above IPV6_MIN_MTU and the interface is up.

Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 18:11:07 +09:00
Ido Schimmel 741af0053b switchdev: Add support for flood control
Allow devices supporting this feature to control the flooding of unknown
unicast traffic, by making switchdev infrastructure propagate this setting
to the switch driver.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:26:38 +09:00
Roopa Prabhu b7af1472af bridge: set is_local and is_static before fdb entry is added to the fdb hashtable
Problem Description:
We can add fdbs pointing to the bridge with NULL ->dst, but that has a
few race conditions. br_fdb_insert() first creates the fdb and only
sets "is_local" to 1 after the fdb has been published/linked. If a
packet arrives for that fdb in this time frame, it may see the fdb as
non-local and either do a NULL ptr dereference in br_forward() or
attach the fdb to the port where it arrived. br_fdb_insert() will later
make it local, leaving a wrong fdb entry.

Call chain br_handle_frame_finish() -> br_forward():
In br_handle_frame_finish(), br_forward() is only called when the dst
is not local, i.e. skb != NULL. Whenever the dst is found to be local,
skb is set to NULL so we can't forward it. The problem is that packet
forwarding runs only under RCU, so it can see the entry before
"is_local" is set to 1 and actually try to dereference NULL.

The main issue is that if someone sends a packet to the switch while
it's adding the entry which points to the bridge device, it may
dereference a NULL ptr. This is needed now that we can add fdbs
pointing to the bridge. This poses a problem for br_fdb_update() as
well: while someone's adding a bridge fdb, but before it has
is_local == 1, it might get moved to a port if it comes as a source
mac, and then it may get its "is_local" set to 1.

This patch changes fdb_create to take is_local and is_static as
arguments to set these values in the fdb entry before it is added to the
hash. It also adds a NULL check for the port in br_forward().
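
The ordering described above (fill in the flags first, publish last)
can be sketched in user space with C11 atomics in place of RCU. This is
an assumption-laden model: toy_fdb, toy_slot, toy_fdb_create() and
toy_fdb_lookup() are hypothetical, and a single slot stands in for the
hash table.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an fdb entry. */
struct toy_fdb {
	int is_local;
	int is_static;
};

/* Single published slot standing in for the fdb hash table. */
static _Atomic(struct toy_fdb *) toy_slot;

/* Set the flags before the entry becomes visible to readers. */
static struct toy_fdb *toy_fdb_create(int is_local, int is_static)
{
	struct toy_fdb *f = malloc(sizeof(*f));

	if (!f)
		return NULL;
	f->is_local = is_local;
	f->is_static = is_static;
	/* Publish last: readers that see the pointer also see the flags. */
	atomic_store_explicit(&toy_slot, f, memory_order_release);
	return f;
}

/* Reader side: returns NULL or a fully initialised entry. */
static const struct toy_fdb *toy_fdb_lookup(void)
{
	return atomic_load_explicit(&toy_slot, memory_order_acquire);
}
```

A reader in this model can never observe an entry whose is_local flag
has not been written yet, which is the property the patch is after.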

Fixes: 3741873b4f ("bridge: allow adding of fdb entries pointing to the bridge device")
Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-30 12:13:05 +09:00
Hannes Frederic Sowa 89bc7848a9 ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues
Raw sockets with hdrincl enabled can insert ipv6 extension headers
right into the data stream. In case we need to fragment those packets,
we reparse the options header to find the place where we can insert
the fragment header. If the extension headers exceed the link's MTU we
actually cannot make progress in such a case.

Instead of ending up in broken arithmetic or rounding towards 0 and
entering an endless loop in ip6_fragment, just prevent those cases by
aborting early and signal -EMSGSIZE to user space.
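
A minimal sketch of the early abort, assuming an 8-byte fragment header
and fragment payloads rounded down to a multiple of 8 bytes. The helper
toy_frag_payload() and the constant TOY_FRAG_HDR_LEN are hypothetical
and only illustrate the check-before-subtract idea.

```c
#include <errno.h>

#define TOY_FRAG_HDR_LEN 8u

/*
 * Return the per-fragment payload length (a multiple of 8), or
 * -EMSGSIZE if the headers leave no room for at least 8 bytes.
 * The comparison happens before the subtraction, so the unsigned
 * arithmetic cannot wrap and the result cannot round down to 0.
 */
static int toy_frag_payload(unsigned int mtu, unsigned int hlen)
{
	if (mtu < hlen + TOY_FRAG_HDR_LEN + 8u)
		return -EMSGSIZE;
	return (int)((mtu - hlen - TOY_FRAG_HDR_LEN) & ~7u);
}
```

With an MTU of 1280, a header length of 1270 is rejected up front
instead of producing a zero-length fragment and an endless loop.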

This is the second version of the patch which doesn't use the
overflow_usub function, which got reverted for now.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29 07:01:50 -07:00
Hannes Frederic Sowa 1e0d69a9cc Revert "Merge branch 'ipv6-overflow-arith'"
Linus dislikes these changes. To not hold up the net-merge let's revert
it for now and fix the bug like Linus suggested.

This reverts commit ec3661b422, reversing
changes made to c80dbe0461.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29 07:01:48 -07:00
Sagi Grimberg 9ddc87374a RDS/IW: Convert to new memory registration API
Get rid of fast_reg page list and its construction.
Instead, just pass the RDS sg list to ib_map_mr_sg
and post the new ib_reg_wr.

This is done both for server IW RDMA_READ registration
and the client remote key registration.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 412a15c0fe svcrdma: Port to new memory registration API
Instead of maintaining a fastreg page list, keep an sg table
and convert an array of pages to a sg list. Then call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Selvin Xavier <selvin.xavier@avagotech.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Sagi Grimberg 4143f34e01 xprtrdma: Port to new memory registration API
Instead of maintaining a fastreg page list, keep an sg table
and convert an array of pages to a sg list. Then call ib_map_mr_sg
and construct ib_reg_wr.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Selvin Xavier <selvin.xavier@avagotech.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 22:27:18 -04:00
Doug Ledford 63e8790d39 Merge branch 'wr-cleanup' into k.o/for-4.4 2015-10-28 22:23:34 -04:00
Doug Ledford eb14ab3ba1 Merge branch 'wr-cleanup' of git://git.infradead.org/users/hch/rdma into wr-cleanup
Signed-off-by: Doug Ledford <dledford@redhat.com>

Conflicts:
	drivers/infiniband/ulp/isert/ib_isert.c - Commit 4366b19ca5
	(iser-target: Change the recv buffers posting logic) changed the
	logic in isert_put_datain() and had to be hand merged
2015-10-28 22:21:09 -04:00
Guy Shapiro fa20105e09 IB/cma: Add support for network namespaces
Add support for network namespaces in the ib_cma module. This is
accomplished by:

1. Adding network namespace parameter for rdma_create_id. This parameter is
   used to populate the network namespace field in rdma_id_private.
   rdma_create_id keeps a reference on the network namespace.
2. Using the network namespace from the rdma_id instead of init_net inside
   of ib_cma, when listening on an ID and when looking for an ID for an
   incoming request.
3. Decrementing the reference count for the appropriate network namespace
   when calling rdma_destroy_id.

In order to preserve the current behavior init_net is passed when calling
from other modules.

Signed-off-by: Guy Shapiro <guysh@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Yotam Kenneth <yotamke@mellanox.com>
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-28 12:32:48 -04:00
Robert Dolca f11631748e NFC: nci: non-static functions can not be inline
This fixes a build error that seems to be toolchain
dependent (Not seen with gcc v5.1):

In file included from net/nfc/nci/rsp.c:36:0:
net/nfc/nci/rsp.c: In function ‘nci_rsp_packet’:
include/net/nfc/nci_core.h:355:12: error: inlining failed in call to
always_inline ‘nci_prop_rsp_packet’: function body not available
 inline int nci_prop_rsp_packet(struct nci_dev *ndev, __u16 opcode,

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-28 06:44:45 +01:00
Robert Shearman cf4b24f002 mpls: reduce memory usage of routes
Nexthops for MPLS routes have a via address field sized for the
largest via address that is expected, which is 32 bytes. This means
that in the most common case of having ipv4 via addresses, 28 bytes of
memory more than required are used per nexthop. In the other common
case of an ipv6 nexthop, 16 bytes more than required are
used. With large numbers of MPLS routes this extra memory usage could
start to become significant.

To avoid allocating memory for a maximum length via address when not
all of it is required, and to allow for ease of iterating over
nexthops, the via addresses are now stored in the same
memory block as the route and nexthops, but in an array after the end
of the array of nexthops. New accessors are provided to retrieve a
pointer to the via address.

To allow for O(1) access without having to store a pointer or offset
per nh, the via address for each nexthop is sized according to the
maximum via address for any nexthop in the route, which is stored in a
new route field, rt_max_alen, but this is in an existing hole in
struct mpls_route so it doesn't increase the size of the
structure. Each via address is ensured to be aligned to VIA_ALEN_ALIGN
to account for architectures that don't allow unaligned accesses.
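
The O(1) lookup described above comes down to simple offset arithmetic.
The sketch below assumes an alignment of 8 bytes and uses hypothetical
names (TOY_VIA_ALIGN, toy_via_offset()); the real accessors and the
value of VIA_ALEN_ALIGN may differ.

```c
#include <stddef.h>

#define TOY_VIA_ALIGN 8u /* assumed alignment, for illustration only */
#define TOY_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/*
 * Byte offset of the via address for nexthop nh_index, given where the
 * nexthop array ends and the largest via address length in the route.
 * Every slot has the same aligned size, so no per-nexthop pointer or
 * offset needs to be stored.
 */
static size_t toy_via_offset(size_t nh_array_end, unsigned int max_alen,
			     unsigned int nh_index)
{
	return TOY_ALIGN(nh_array_end, TOY_VIA_ALIGN) +
	       nh_index * TOY_ALIGN(max_alen, TOY_VIA_ALIGN);
}
```

For a route whose longest via address is 4 bytes, each slot in this
model takes 8 bytes rather than a fixed maximum per nexthop.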

Signed-off-by: Robert Shearman <rshearma@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:52:59 -07:00
Robert Shearman b4e04fc735 mpls: fix forwarding using v4/v6 explicit null
Fill in the via address length for the predefined IPv4 and IPv6
explicit-null label routes.

Fixes: f8efb73c97 ("mpls: multipath route support")
Signed-off-by: Robert Shearman <rshearma@brocade.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:52:58 -07:00
Sowmini Varadhan 8ce675ff39 RDS-TCP: Recover correctly from pskb_pull()/pskb_trim() failure in rds_tcp_data_recv
Either of pskb_pull() or pskb_trim() may fail under low memory conditions.
If rds_tcp_data_recv() ignores such failures, the application will
receive corrupted data because the skb has not been correctly
carved to the RDS datagram size.

Avoid this by handling pskb_pull/pskb_trim failure in the same
manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and
retry via the deferred call to rds_send_worker() that gets set up on
ENOMEM from rds_tcp_read_sock()

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:46:34 -07:00
Florian Westphal dbc3617f4c netfilter: nfnetlink: don't probe module if it exists
nfnetlink_bind request_module()s all the time as nfnetlink_get_subsys()
shifts the argument by 8 to obtain the subsys id.

So using type instead of type << 8 always returns NULL.
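
The mismatch can be shown with a toy lookup that, like the description
above, expects the subsystem id in the high byte of its argument. The
table contents and the names toy_subsys and toy_get_subsys() are
hypothetical.

```c
#include <stddef.h>

#define TOY_SUBSYS_COUNT 4

/* Hypothetical registry indexed by subsystem id; id 0 is unused. */
static const char *toy_subsys[TOY_SUBSYS_COUNT] = {
	NULL, "subsys-one", "subsys-two", NULL
};

/* The lookup shifts its argument right by 8 to obtain the subsys id. */
static const char *toy_get_subsys(unsigned int type)
{
	unsigned int id = (type >> 8) & 0xff;

	if (id >= TOY_SUBSYS_COUNT)
		return NULL;
	return toy_subsys[id];
}
```

Passing a bare id such as 1 makes the lookup see id 0 and return NULL,
while 1 << 8 finds the registered entry.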

Fixes: 03292745b0 ("netlink: add nlk->netlink_bind hook for module auto-loading")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-28 03:40:50 +01:00
Hannes Frederic Sowa 080a270f5a sock: don't enable netstamp for af_unix sockets
netstamp_needed is toggled for all socket families if they request
timestamping. But some protocols don't need the lower-layer timestamping
code at all. This patch starts disabling it for af-unix.

E.g. systemd enables timestamping during boot-up on the journald af-unix
sockets, thus causing the system to globally enable timestamping in the
lower networking stack. Still, it is very probable that timestamping
gets activated, by e.g. dhclient or various NTP implementations.

Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:39:14 -07:00
Joe Stringer 6f5cadee44 openvswitch: Fix skb leak using IPv6 defrag
nf_ct_frag6_gather() makes a clone of each skb passed to it, and if the
reassembly is successful, expects the caller to free all of the original
skbs using nf_ct_frag6_consume_orig(). This call was previously missing,
meaning that the original fragments were never freed (with the exception
of the last fragment to arrive).

Fix this by ensuring that all original fragments except for the last
fragment are freed via nf_ct_frag6_consume_orig(). The last fragment
will be morphed into the head, so it must not be freed yet. Furthermore,
retain the ->next pointer for the head after skb_morph().

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:32:18 -07:00
Joe Stringer 190b8ffbb7 ipv6: Export nf_ct_frag6_consume_orig()
This is needed in openvswitch to fix an skb leak in the next patch.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:32:17 -07:00
Joe Stringer 74c1661813 openvswitch: Fix double-free on ip_defrag() errors
If ip_defrag() returns an error other than -EINPROGRESS, then the skb is
freed. When handle_fragments() passes this back up to
do_execute_actions(), it will be freed again. Prevent this double free
by never freeing the skb in do_execute_actions() for errors returned by
ovs_ct_execute. Always free it in ovs_ct_execute() error paths instead.

Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 19:32:14 -07:00
Alexander Duyck c2229fe143 fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key
We were computing the child index in cases where the key value we were
looking for was actually less than the base key of the tnode.  As a result
we were getting incorrect index values that would cause us to skip over
some children.

To fix this I have added a test that will force us to use child index 0 if
the key we are looking for is less than the key of the current tnode.
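
A simplified sketch of the added test follows. It assumes the child
index is derived from the XOR of the two keys shifted by the node
position; toy_tnode, toy_get_index() and toy_start_index() are
hypothetical names and leave out the rest of the trie walk.

```c
/* Hypothetical, trimmed-down tnode: base key, bit position, child bits. */
struct toy_tnode {
	unsigned long key;
	unsigned char pos;
	unsigned char bits;
};

/* Child index of key within tn; only meaningful when key >= tn->key. */
static unsigned long toy_get_index(unsigned long key,
				   const struct toy_tnode *tn)
{
	return (key ^ tn->key) >> tn->pos;
}

/* Start at child 0 when the key sorts before the node's base key. */
static unsigned long toy_start_index(unsigned long key,
				     const struct toy_tnode *tn)
{
	return (key > tn->key) ? toy_get_index(key, tn) : 0;
}
```

For a node with base key 0x100 and 16 children, key 0x0f0 would yield
the bogus index 31 without the test and index 0 with it.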

Fixes: 8be33e955c ("fib_trie: Fib walk rcu should take a tnode and key instead of a trie and a leaf")
Reported-by: Brian Rak <brak@gameservers.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27 18:14:51 -07:00
Alexander Aring 324e786ee3 bluetooth: 6lowpan: fix NOHZ: local_softirq_pending
Jukka reported the following warning:

"NOHZ: local_softirq_pending 08"

I remember this warning and we had a similar issue when using workqueues
and calling netif_rx. See commit 5ff3fec ("mac802154: fix NOHZ
local_softirq_pending 08 warning").

This warning occurs when calling "netif_rx" inside the wrong context
(non softirq context). The net core api offers "netif_rx_ni" to call
netif_rx inside the correct softirq context.

Reported-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-27 09:53:36 +01:00
Munehisa Kamata 94f9cd8143 netfilter: nf_nat_redirect: add missing NULL pointer check
Commit 8b13eddfdf ("netfilter: refactor NAT
redirect IPv4 to use it from nf_tables") has introduced a trivial logic
change which can result in the following crash.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
IP: [<ffffffffa033002d>] nf_nat_redirect_ipv4+0x2d/0xa0 [nf_nat_redirect]
PGD 3ba662067 PUD 3ba661067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: ipv6(E) xt_REDIRECT(E) nf_nat_redirect(E) xt_tcpudp(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E) ip_tables(E) x_tables(E) binfmt_misc(E) xfs(E) libcrc32c(E) evbug(E) evdev(E) psmouse(E) i2c_piix4(E) i2c_core(E) acpi_cpufreq(E) button(E) ext4(E) crc16(E) jbd2(E) mbcache(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 0 PID: 2536 Comm: ip Tainted: G            E   4.1.7-15.23.amzn1.x86_64 #1
Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/06/2015
task: ffff8800eb438000 ti: ffff8803ba664000 task.ti: ffff8803ba664000
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0334065>] redirect_tg4+0x15/0x20 [xt_REDIRECT]
 [<ffffffffa02e2e99>] ipt_do_table+0x2b9/0x5e1 [ip_tables]
 [<ffffffffa0328045>] iptable_nat_do_chain+0x25/0x30 [iptable_nat]
 [<ffffffffa031777d>] nf_nat_ipv4_fn+0x13d/0x1f0 [nf_nat_ipv4]
 [<ffffffffa0328020>] ? iptable_nat_ipv4_fn+0x20/0x20 [iptable_nat]
 [<ffffffffa031785e>] nf_nat_ipv4_in+0x2e/0x90 [nf_nat_ipv4]
 [<ffffffffa03280a5>] iptable_nat_ipv4_in+0x15/0x20 [iptable_nat]
 [<ffffffff81449137>] nf_iterate+0x57/0x80
 [<ffffffff814491f7>] nf_hook_slow+0x97/0x100
 [<ffffffff814504d4>] ip_rcv+0x314/0x400

unsigned int
nf_nat_redirect_ipv4(struct sk_buff *skb,
...
{
...
		rcu_read_lock();
		indev = __in_dev_get_rcu(skb->dev);
		if (indev != NULL) {
			ifa = indev->ifa_list;
			newdst = ifa->ifa_local; <---
		}
		rcu_read_unlock();
...
}

Before the commit, 'ifa' had always been checked before access. After the
commit, however, it could be accessed even if it was NULL. Interestingly,
this was once fixed in 2003.

http://marc.info/?l=netfilter-devel&m=106668497403047&w=2

In addition to the original case, we have seen the crash when packets that
need to be redirected somehow arrive on an interface which hasn't yet
been fully configured.

This change just reverts the logic to the old behavior to avoid the crash.
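
The restored check can be sketched in userspace C roughly as follows. The
structs are stripped-down stand-ins for the kernel's in_device and in_ifaddr,
pick_redirect_addr is a hypothetical helper name (not a kernel function), and
RCU locking is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (assumption: only the
 * fields needed for this illustration are modelled). */
struct in_ifaddr {
	uint32_t ifa_local;
};

struct in_device {
	struct in_ifaddr *ifa_list;	/* NULL until an address is configured */
};

/* Hypothetical helper: returns the first local address of the device,
 * or 0 if the device or its address list is missing. */
static uint32_t pick_redirect_addr(const struct in_device *indev)
{
	uint32_t newdst = 0;

	/* Check both pointers: an interface that is not fully configured
	 * can have an in_device whose ifa_list is still NULL. */
	if (indev != NULL && indev->ifa_list != NULL)
		newdst = indev->ifa_list->ifa_local;

	return newdst;
}
```

A caller can then treat a return value of 0 as "no usable address" instead
of dereferencing a NULL 'ifa'.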

Fixes: 8b13eddfdf ("netfilter: refactor NAT redirect IPv4 to use it from nf_tables")
Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2015-10-27 06:54:56 +01:00
emmanuel.grumbach@intel.com 8941faa161 net: tso: add support for IPv6
Adding IPv6 for the TSO helper API is trivial:
* Don't play with the id (which doesn't exist in IPv6)
* Correctly update the payload_len (don't include the
  length of the IP header itself)
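
The difference between the two length fields can be sketched as below. The
helper names and the option-less 20-byte IPv4 header are assumptions made
for illustration, and values are in host byte order (real code would convert
with htons()).

```c
#include <stdint.h>

#define IPV4_HDR_LEN 20u	/* assumption: IPv4 header without options */

/* IPv4 tot_len covers the IP header itself plus TCP header and payload. */
static uint16_t ipv4_tot_len(uint16_t tcp_hdr_len, uint16_t payload_len)
{
	return (uint16_t)(IPV4_HDR_LEN + tcp_hdr_len + payload_len);
}

/* IPv6 payload_len covers only what follows the fixed IPv6 header,
 * so the IP header length is not added. */
static uint16_t ipv6_payload_len(uint16_t tcp_hdr_len, uint16_t payload_len)
{
	return (uint16_t)(tcp_hdr_len + payload_len);
}
```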

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26 22:24:22 -07:00
Eric Dumazet 7e3b6e7423 ipv6: gre: support SIT encapsulation
gre_gso_segment() chokes if SIT frames were aggregated by GRO engine.

Fixes: 61c1db7fae ("ipv6: sit: add GSO/TSO support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26 22:01:18 -07:00
Kuba Pawlak 2c501cdd68 Bluetooth: Fix crash on fast disconnect of SCO
Fix a crash that may happen when a connection is closed before it is fully
established. The conn->hcon mapping is released by the shutdown function,
but it is still referenced in the (not yet finished) connection-established
handling function.

[ 4635.254073] BUG: unable to handle kernel NULL pointer dereference at 00000013
[ 4635.262058] IP: [<c11659f0>] memcmp+0xe/0x25
[ 4635.266835] *pdpt = 0000000024190001 *pde = 0000000000000000
[ 4635.273261] Oops: 0000 [#1] PREEMPT SMP
[ 4635.277652] Modules linked in: evdev ecb vfat fat libcomposite usb2380 isofs zlib_inflate rfcomm(O) udc_core bnep(O) btusb(O) btbcm(O) btintel(O) bluetooth(O) cdc_acm arc4 uinput hid_mule
[ 4635.321761] Pid: 363, comm: kworker/u:2H Tainted: G           O 3.8.0-119.1-plk-adaptation-byt-ivi-brd #1
[ 4635.332642] EIP: 0060:[<c11659f0>] EFLAGS: 00010206 CPU: 0
[ 4635.338767] EIP is at memcmp+0xe/0x25
[ 4635.342852] EAX: e4720678 EBX: 00000000 ECX: 00000006 EDX: 00000013
[ 4635.349849] ESI: 00000000 EDI: fb85366c EBP: e40c7dc0 ESP: e40c7db4
[ 4635.356846]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 4635.362873] CR0: 8005003b CR2: 00000013 CR3: 24191000 CR4: 001007f0
[ 4635.369869] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 4635.376865] DR6: ffff0ff0 DR7: 00000400
[ 4635.381143] Process kworker/u:2H (pid: 363, ti=e40c6000 task=e40c5510 task.ti=e40c6000)
[ 4635.390080] Stack:
[ 4635.392319]  e4720400 00000000 fb85366c e40c7df4 fb842285 e40c7de2 fb853200 00000013
[ 4635.401003]  e3f101c4 e4720678 e3f101c0 e403be0a e40c7dfc e416a000 e403be0a fb85366c
[ 4635.409692]  e40c7e1c fb820186 020f6c00 e47c49ac e47c4008 00000000 e416a000 e47c402c
[ 4635.418380] Call Trace:
[ 4635.421153]  [<fb842285>] sco_connect_cfm+0xff/0x236 [bluetooth]
[ 4635.427893]  [<fb820186>] hci_sync_conn_complete_evt.clone.101+0x227/0x268 [bluetooth]
[ 4635.436758]  [<fb82370f>] hci_event_packet+0x1caa/0x21d3 [bluetooth]
[ 4635.443859]  [<c106231f>] ? trace_hardirqs_on+0xb/0xd
[ 4635.449502]  [<c1375b8a>] ? _raw_spin_unlock_irqrestore+0x42/0x59
[ 4635.456340]  [<fb814b67>] hci_rx_work+0xb9/0x350 [bluetooth]
[ 4635.462663]  [<c1039f1e>] ? process_one_work+0x17b/0x2e6
[ 4635.468596]  [<c1039f77>] process_one_work+0x1d4/0x2e6
[ 4635.474333]  [<c1039f1e>] ? process_one_work+0x17b/0x2e6
[ 4635.480294]  [<fb814aae>] ? hci_cmd_work+0xda/0xda [bluetooth]
[ 4635.486810]  [<c103a3fa>] worker_thread+0x171/0x20f
[ 4635.492257]  [<c10456c5>] ? complete+0x34/0x3e
[ 4635.497219]  [<c103ea06>] kthread+0x90/0x95
[ 4635.501888]  [<c103a289>] ? manage_workers+0x1df/0x1df
[ 4635.507628]  [<c1376537>] ret_from_kernel_thread+0x1b/0x28
[ 4635.513755]  [<c103e976>] ? __init_kthread_worker+0x42/0x42
[ 4635.519975] Code: 74 0d 3c 79 74 04 3c 59 75 0c c6 02 01 eb 03 c6 02 00 31 c0 eb 05 b8 ea ff ff ff 5d c3 55 89 e5 57 56 53 31 db eb 0e 0f b6 34 18 <0f> b6 3c 1a 43 29 fe 75 07 49 85 c9 7f
[ 4635.541264] EIP: [<c11659f0>] memcmp+0xe/0x25 SS:ESP 0068:e40c7db4
[ 4635.548166] CR2: 0000000000000013
[ 4635.552177] ---[ end trace e05ce9b8ce6182f6 ]---

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-27 06:00:07 +01:00
Bjørn Mork 4b3418fba0 ipv6: icmp: include addresses in debug messages
Messages like "icmp6_send: no reply to icmp error" are close
to useless. Add source and destination addresses to provide
some more clues.

Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-26 21:59:42 -07:00
Vincent Cuissard 2bd832459a NFC: NCI: allow spi driver to choose transfer clock
In some cases low level drivers might want to update the
SPI transfer clock (e.g. during firmware download).

This patch adds this support. Without any modification the
driver will use the default SPI clock (from pdata or device tree).

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 04:23:34 +01:00
Vincent Cuissard fcd9d046fd NFC: NCI: move generic spi driver to a module
SPI driver should be a module.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 04:21:38 +01:00
Vincent Cuissard e5629d2947 NFC: NCI: export nci_send_frame and nci_send_cmd function
Export the nci_send_frame and nci_send_cmd symbols to allow drivers
to use them. This is needed, for example, if NCI is used during
the firmware download phase.

Signed-off-by: Vincent Cuissard <cuissard@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 04:16:14 +01:00
Christophe Ricard 15d17170b4 NFC: st21nfca: Add support for proprietary commands
Add support for proprietary commands, useful mainly
for factory testing.

Here is a list:

- FACTORY_MODE: Sets the driver into a mode where no
  secure elements are activated. It does not consider any
  NFC_ATTR_VENDOR_DATA.
- HCI_CLEAR_ALL_PIPES: Executes an HCI clear all pipes
  command. It does not consider any NFC_ATTR_VENDOR_DATA.
- HCI_DM_PUT_DATA: Configures specific CLF registry, for
  example RF trimmings or low level driver configurations
  (I2C, SPI, SWP).
- HCI_DM_UPDATE_AID: Configures an AID routing in the
  CLF routing table according to RF technology, CLF mode or protocol.
- HCI_DM_GET_INFO: Retrieves CLF information.
- HCI_DM_GET_DATA: Retrieves CLF configurable data such as
  low level driver configurations or RF trimmings.
- HCI_DM_LOAD: Loads a firmware into the CLF. A complete
  packet can be more than 8KB.
- HCI_DM_RESET: Runs a CLF reset in order to "commit" CLF
  configuration changes without a CLF power off.
- HCI_GET_PARAM: Retrieves an HCI CLF parameter (for example
  the white list).
- HCI_DM_FIELD_GENERATOR: Generates different kinds of RF
  technology. When using this command, no anti-collision is done.
- HCI_LOOPBACK: Echoes a command to test the DH to CLF
  connectivity.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 04:00:24 +01:00
Christophe Ricard 064d004796 NFC: st-nci: Add few code style fixes
Add a few code style fixes.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 03:55:13 +01:00
Christophe Ricard 96d4581f0b NFC: netlink: Add mode parameter to deactivate_target functions
In order to better manage the NCI poll mode state machine,
add a mode parameter to the deactivate_target functions.
This way we can manage different target states.
The mode parameter makes sense only in the NCI core.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-27 03:55:12 +01:00
Marcel Holtmann c4297e8f7f Bluetooth: Fix some obvious coding style issues in the SCO module
Let's fix these obvious coding style issues in the SCO module and bring it
in line with the rest of the Bluetooth subsystem.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-26 08:22:00 +02:00
Marcel Holtmann 05fcd4c4f1 Bluetooth: Replace hci_notify with hci_sock_dev_event
There is no point in wrapping hci_sock_dev_event in hci_notify. It
is an empty wrapper which adds no value. So remove it.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-26 08:21:47 +02:00
Marcel Holtmann 242c0ebd37 Bluetooth: Rename bt_cb()->req into bt_cb()->hci
The SKB context buffer for HCI requests is really not just for requests;
the information in there is preserved for the whole HCI layer. So it makes
more sense to rename it to bt_cb()->hci and also call the struct
hci_ctrl.

In addition, that allows moving the decoded opcode for outgoing packets
into that struct. So far it was just consuming valuable space from the
main shared items. And opcodes are not valid for L2CAP packets.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-26 08:21:03 +02:00
Marcel Holtmann d94a61040d Bluetooth: Remove unneeded parenthesis around MSG_OOB
There are two checks that are still using (MSG_OOB) instead of just
MSG_OOB, so let's just fix them.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-26 08:20:51 +02:00
Christophe Ricard a1b0b94158 NFC: nci: Create pipe on specific gate in nci_hci_connect_gate
Some gates might need to have their pipes explicitly created.
Add a call to nci_hci_create_pipe in nci_hci_connect_gate for
every gate that is different than NCI_HCI_LINK_MGMT_GATE or
NCI_HCI_ADMIN_GATE.

In case of an error when opening a pipe, like in hci layer,
delete the pipe if it was created.
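
The create/open/rollback flow described above might look roughly like the
sketch below. Everything here is a mock: the enum values, struct mock_hci and
the mock_* helpers are hypothetical names used only to show the control flow,
not the nci_hci API.

```c
#include <stdbool.h>

/* Hypothetical gate identifiers; the numeric values carry no meaning. */
enum mock_gate { MOCK_ADMIN_GATE, MOCK_LINK_MGMT_GATE, MOCK_OTHER_GATE };

struct mock_hci {
	bool fail_open;		/* test knob: make pipe opening fail */
	int pipes_created;
	int pipes_deleted;
};

static int mock_create_pipe(struct mock_hci *h) { h->pipes_created++; return 0; }
static void mock_delete_pipe(struct mock_hci *h) { h->pipes_deleted++; }
static int mock_open_pipe(struct mock_hci *h) { return h->fail_open ? -1 : 0; }

static int mock_connect_gate(struct mock_hci *h, enum mock_gate gate)
{
	bool created = false;
	int r;

	/* Only gates other than link management and admin get an
	 * explicitly created pipe. */
	if (gate != MOCK_LINK_MGMT_GATE && gate != MOCK_ADMIN_GATE) {
		r = mock_create_pipe(h);
		if (r < 0)
			return r;
		created = true;
	}

	r = mock_open_pipe(h);
	if (r < 0 && created)
		mock_delete_pipe(h);	/* roll back the pipe we created */

	return r;
}
```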

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:13 +01:00
Christophe Ricard 8a49943f5b NFC: nci: Call nci_hci_clear_all_pipes at HCI initial activation.
When session_id is filled to 0xff, the pipe configuration is
probably incorrect and needs to be cleared.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:13 +01:00
Christophe Ricard fa6fbadea5 NFC: nci: add nci_hci_clear_all_pipes functions
nci_hci_clear_all_pipes might be useful in some cases,
for example after a firmware update.

Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:12 +01:00
Christophe Ricard e65917b6d5 NFC: nci: extract pipe value using NCI_HCP_MSG_GET_PIPE
When receiving data in nci_hci_msg_rx_work, extract pipe
value using NCI_HCP_MSG_GET_PIPE macro.
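
As an illustration of what such an extraction does, a sketch under the
assumption that the pipe id sits in the low 7 bits of the HCP packet header
and the top bit is the chaining bit. The mock_* names are hypothetical and
are not the kernel macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks (assumption: 7-bit pipe id, 1-bit chaining flag). */
#define MOCK_HCP_PIPE_MASK 0x7fu
#define MOCK_HCP_CB_MASK   0x80u

/* Strip the chaining bit so only the pipe id remains. */
static uint8_t mock_hcp_get_pipe(uint8_t header)
{
	return (uint8_t)(header & MOCK_HCP_PIPE_MASK);
}

/* True when the chaining bit marks the last (or only) fragment. */
static bool mock_hcp_is_last(uint8_t header)
{
	return (header & MOCK_HCP_CB_MASK) != 0;
}
```

Using the raw header byte as the pipe would give 0x82 instead of 0x02 for a
final fragment on pipe 2, which is why the mask matters.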

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:12 +01:00
Christophe Ricard d8cd37ed2f NFC: nci: Fix improper management of HCI return code
When sending HCI data over NCI, the HCI return code is part
of the NCI data. In order to get the HCI return code
correctly, we assume the NCI communication is successful and
extract the return code to use as the nci_hci functions' return code.

This is done because nci_to_errno does not match HCI return
code values.

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:12 +01:00
Christophe Ricard 500c4ef022 NFC: nci: Fix incorrect data chaining when sending data
When sending HCI data over NCI, cmd information should be
present only on the first packet.
Each packet shall be specifically allocated and sent to the
NCI layer.
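
A minimal sketch of that chaining rule, assuming a one-byte packet header
(pipe id plus a chaining bit set on the last fragment) and a one-byte message
header carrying the cmd information. hcp_build_fragment and the constants are
hypothetical names, and each call fills a separate caller-provided buffer to
mirror "each packet is specifically allocated".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HCP_CB_LAST   0x80u	/* assumption: set on the last/only fragment */
#define HCP_PIPE_MASK 0x7fu

/*
 * Build one fragment into out (capacity max_len, at least 3 bytes).
 * *offset is the number of payload bytes already sent; 0 means this is
 * the first fragment. Returns the number of bytes written to out.
 */
static size_t hcp_build_fragment(uint8_t pipe, uint8_t msg_header,
				 const uint8_t *data, size_t data_len,
				 size_t *offset, uint8_t *out, size_t max_len)
{
	size_t pos = 0;
	size_t room, chunk;

	out[pos++] = pipe & HCP_PIPE_MASK;	/* packet header, every fragment */
	if (*offset == 0)
		out[pos++] = msg_header;	/* cmd info, first fragment only */

	room = max_len - pos;
	chunk = data_len - *offset;
	if (chunk > room)
		chunk = room;			/* more fragments will follow */
	else
		out[0] |= HCP_CB_LAST;		/* the rest fits: last fragment */

	if (chunk > 0)
		memcpy(out + pos, data + *offset, chunk);
	*offset += chunk;

	return pos + chunk;
}
```

A caller would loop, handing each filled buffer to the lower layer, until a
fragment with the chaining bit set has been produced.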

Cc: stable@vger.kernel.org
Signed-off-by: Christophe Ricard <christophe-h.ricard@st.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-26 06:53:11 +01:00
Kuba Pawlak 1da5537ecc Bluetooth: Fix locking issue during fast SCO reconnection.
When an SCO connection is requested and disconnected fast, there is a chance
that sco_sock_shutdown is going to preempt the thread started in sco_connect_cfm.
When this happens, struct sock sk may be removed but a pointer to it is still
held in sco_conn_ready, where the embedded spinlock is used. If it is used after
struct sock has been removed, it will crash.

Block the connection object, which will prevent struct sock from being removed
and give the connection process a chance to finish.

BUG: spinlock bad magic on CPU#0, kworker/u:2H/319
 lock: 0xe3e99434, .magic: f3000000, .owner: (���/0, .owner_cpu: -203804160
Pid: 319, comm: kworker/u:2H Tainted: G           O 3.8.0-115.1-plk-adaptation-byt-ivi-brd #1
Call Trace:
 [<c1155659>] ? do_raw_spin_lock+0x19/0xe9
 [<fb75354f>] ? sco_connect_cfm+0x92/0x236 [bluetooth]
 [<fb731dbc>] ? hci_sync_conn_complete_evt.clone.101+0x18b/0x1cb [bluetooth]
 [<fb734ee7>] ? hci_event_packet+0x1acd/0x21a6 [bluetooth]
 [<c1041095>] ? finish_task_switch+0x50/0x89
 [<c1349a2e>] ? __schedule+0x638/0x6b8
 [<fb727918>] ? hci_rx_work+0xb9/0x2b8 [bluetooth]
 [<c103760a>] ? queue_delayed_work_on+0x21/0x2a
 [<c1035df9>] ? process_one_work+0x157/0x21b
 [<fb72785f>] ? hci_cmd_work+0xef/0xef [bluetooth]
 [<c1036217>] ? worker_thread+0x16e/0x20a
 [<c10360a9>] ? manage_workers+0x1cf/0x1cf
 [<c103a0ef>] ? kthread+0x8d/0x92
 [<c134adf7>] ? ret_from_kernel_thread+0x1b/0x28
 [<c103a062>] ? __init_kthread_worker+0x24/0x24
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<  (null)>]   (null)
*pdpt = 00000000244e1001 *pde = 0000000000000000
Oops: 0010 [#1] PREEMPT SMP
Modules linked in: evdev ecb rfcomm(O) libcomposite usb2380 udc_core bnep(O) btusb(O) btbcm(O) cdc_acm btintel(O) bluetooth(O) arc4 uinput hid_multitouch usbhid hid iwlmvm(O)e
Pid: 319, comm: kworker/u:2H Tainted: G           O 3.8.0-115.1-plk-adaptation-byt-ivi-brd #1
EIP: 0060:[<00000000>] EFLAGS: 00010246 CPU: 0
EIP is at 0x0
EAX: e3e99400 EBX: e3e99400 ECX: 00000100 EDX: 00000000
ESI: e3e99434 EDI: fb763ce0 EBP: e49b9e44 ESP: e49b9e14
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 00000000 CR3: 24444000 CR4: 001007f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process kworker/u:2H (pid: 319, ti=e49b8000 task=e4ab9030 task.ti=e49b8000)
Stack:
 fb75355b 00000246 fb763900 22222222 22222222 22222222 e3f94460 e3ca7c0a
 e49b9e4c e3f34c00 e3ca7c0a fb763ce0 e49b9e6c fb731dbc 02000246 e4cec85c
 e4cec008 00000000 e3f34c00 e4cec000 e3c2ce00 0000002c e49b9ed0 fb734ee7
Call Trace:
 [<fb75355b>] ? sco_connect_cfm+0x9e/0x236 [bluetooth]
 [<fb731dbc>] ? hci_sync_conn_complete_evt.clone.101+0x18b/0x1cb [bluetooth]
 [<fb734ee7>] ? hci_event_packet+0x1acd/0x21a6 [bluetooth]
 [<c1041095>] ? finish_task_switch+0x50/0x89
 [<c1349a2e>] ? __schedule+0x638/0x6b8
 [<fb727918>] ? hci_rx_work+0xb9/0x2b8 [bluetooth]
 [<c103760a>] ? queue_delayed_work_on+0x21/0x2a
 [<c1035df9>] ? process_one_work+0x157/0x21b
 [<fb72785f>] ? hci_cmd_work+0xef/0xef [bluetooth]
 [<c1036217>] ? worker_thread+0x16e/0x20a
 [<c10360a9>] ? manage_workers+0x1cf/0x1cf
 [<c103a0ef>] ? kthread+0x8d/0x92
 [<c134adf7>] ? ret_from_kernel_thread+0x1b/0x28
 [<c103a062>] ? __init_kthread_worker+0x24/0x24
Code:  Bad EIP value.
EIP: [<00000000>] 0x0 SS:ESP 0068:e49b9e14
CR2: 0000000000000000
---[ end trace 942a6577c0abd725 ]---

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-25 21:06:39 +01:00
Kuba Pawlak 435c513369 Bluetooth: Fix locking issue on SCO disconnection
The thread handling SCO disconnection may get preempted in '__sco_sock_close'
after dropping a reference to hci_conn but before marking it as NULL
in the associated struct sco_conn. When execution returns to this thread,
the connection will possibly have been released, resulting in a kernel crash.

Lock the connection before this point.

BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [<fb770ab9>] __sco_sock_close+0x194/0x1ff [bluetooth]
*pdpt = 0000000023da6001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: evdev ecb rfcomm(O) libcomposite usb2380 udc_core bnep(O) btusb(O) btbcm(O) cdc_acm btintel(O) bluetooth(O) arc4 uinput hid_multitouch usbhid iwlmvm(O) hide
Pid: 984, comm: bluetooth Tainted: G           O 3.8.0-115.1-plk-adaptation-byt-ivi-brd #1
EIP: 0060:[<fb770ab9>] EFLAGS: 00010282 CPU: 2
EIP is at __sco_sock_close+0x194/0x1ff [bluetooth]
EAX: 00000000 EBX: e49d7600 ECX: ef1ec3c2 EDX: 000000c3
ESI: e4c12000 EDI: 00000000 EBP: ef1edf5c ESP: ef1edf4c
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 00000000 CR3: 23da7000 CR4: 001007f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process bluetooth (pid: 984, ti=ef1ec000 task=e47f2550 task.ti=ef1ec000)
Stack:
 e4c120d0 e49d7600 00000000 08421a40 ef1edf70 fb770b7a 00000002 e8a4cc80
 08421a40 ef1ec000 c12966b1 00000001 00000000 0000000b 084954c8 c1296b6c
 0000001b 00000002 0000001b 00000002 00000000 00000002 b2524880 00000046
Call Trace:
 [<fb770b7a>] ? sco_sock_shutdown+0x56/0x95 [bluetooth]
 [<c12966b1>] ? sys_shutdown+0x37/0x53
 [<c1296b6c>] ? sys_socketcall+0x12e/0x1be
 [<c134ae7e>] ? sysenter_do_call+0x12/0x26
 [<c1340000>] ? ip_vs_control_net_cleanup+0x46/0xb1
Code: e8 90 6b 8c c5 f6 05 72 5d 78 fb 04 74 17 8b 46 08 50 56 68 0a fd 77 fb 68 60 5d 78 fb e8 68 95 9e c5 83 c4 10 8b 83 fc 01 00 00 <c7> 00 00 00 00 00 eb 32 ba 68 00 00 0b
EIP: [<fb770ab9>] __sco_sock_close+0x194/0x1ff [bluetooth] SS:ESP 0068:ef1edf4c
CR2: 0000000000000000
---[ end trace 47fa2f55a9544e69 ]---

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-25 21:06:39 +01:00
Kuba Pawlak 75e34f5cf6 Bluetooth: Fix crash on SCO disconnect
When disconnecting audio from the phone's side, it may happen that
a thread handling the HCI message 'disconnection complete' gets preempted
in 'sco_conn_del' before calling 'sco_sock_kill', still holding a pointer
to struct sock sk. The interrupting thread, started in 'sco_sock_shutdown',
will carry on releasing resources and will eventually release struct sock.
When execution goes back to the first thread, it will call sco_sock_kill using
a now invalid pointer to the already destroyed socket.

The fix is to grab a reference to the socket and release it after calling
'sco_sock_kill'.
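
The hold/put pattern can be sketched in plain C as below. This is a
single-threaded simulation with mock types: struct mock_sock and the mock_*
helpers are hypothetical stand-ins, and the "preemption" is simply a direct
call to the shutdown path.

```c
#include <stdbool.h>

struct mock_sock {
	int refcnt;
	bool freed;	/* set when the last reference is dropped */
};

static void mock_sock_hold(struct mock_sock *sk)
{
	sk->refcnt++;
}

static void mock_sock_put(struct mock_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = true;	/* stand-in for freeing the socket */
}

/* The other thread's shutdown path drops its own reference. */
static void mock_shutdown_path(struct mock_sock *sk)
{
	mock_sock_put(sk);
}

/* Returns true if sk was still alive at the point where the real code
 * would call sco_sock_kill(). */
static bool mock_conn_del(struct mock_sock *sk)
{
	bool alive_at_kill;

	mock_sock_hold(sk);		/* pin the socket first */
	mock_shutdown_path(sk);		/* simulated preemption */
	alive_at_kill = !sk->freed;	/* stand-in for sco_sock_kill(sk) */
	mock_sock_put(sk);		/* drop our pin; may free now */

	return alive_at_kill;
}
```

Without the hold, the simulated shutdown path would drop the count to zero
before the kill step runs.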

[  166.358213] BUG: unable to handle kernel paging request at 7541203a
[  166.365228] IP: [<fb6e8bfb>] bt_sock_unlink+0x1a/0x38 [bluetooth]
[  166.372068] *pdpt = 0000000024b19001 *pde = 0000000000000000
[  166.378483] Oops: 0002 [#1] PREEMPT SMP
[  166.382871] Modules linked in: evdev ecb rfcomm(O) libcomposite usb2380 udc_core bnep(O) btusb(O) btbcm(O) btintel(O) cdc_acm bluetooth(O) arc4 uinput hid_multitouch iwlmvm(O) usbhid hide
[  166.424233] Pid: 338, comm: kworker/u:2H Tainted: G           O 3.8.0-115.1-plk-adaptation-byt-ivi-brd #1
[  166.435112] EIP: 0060:[<fb6e8bfb>] EFLAGS: 00010206 CPU: 0
[  166.441259] EIP is at bt_sock_unlink+0x1a/0x38 [bluetooth]
[  166.447382] EAX: 632e6563 EBX: e4bfc600 ECX: e466d4d3 EDX: 7541203a
[  166.454369] ESI: fb7278ac EDI: e4d52000 EBP: e4669e20 ESP: e4669e0c
[  166.461366]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[  166.467391] CR0: 8005003b CR2: 7541203a CR3: 24aba000 CR4: 001007f0
[  166.474387] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  166.481375] DR6: ffff0ff0 DR7: 00000400
[  166.485654] Process kworker/u:2H (pid: 338, ti=e4668000 task=e466e030 task.ti=e4668000)
[  166.494591] Stack:
[  166.496830]  e4bfc600 e4bfc600 fb715c28 e4717ee0 e4d52000 e4669e3c fb715cf3 e4bfc634
[  166.505518]  00000068 e4d52000 e4c32000 fb7277c0 e4669e6c fb6f2019 0000004a 00000216
[  166.514205]  e4660101 e4c32008 02000001 00000013 e4d52000 e4c32000 e3dc9240 00000005
[  166.522891] Call Trace:
[  166.525654]  [<fb715c28>] ? sco_sock_kill+0x73/0x9a [bluetooth]
[  166.532295]  [<fb715cf3>] ? sco_conn_del+0xa4/0xbf [bluetooth]
[  166.538836]  [<fb6f2019>] ? hci_disconn_complete_evt.clone.55+0x1bd/0x205 [bluetooth]
[  166.547609]  [<fb6f73d3>] ? hci_event_packet+0x297/0x223c [bluetooth]
[  166.554805]  [<c10416da>] ? dequeue_task+0xaf/0xb7
[  166.560154]  [<c1041095>] ? finish_task_switch+0x50/0x89
[  166.566086]  [<c1349a2e>] ? __schedule+0x638/0x6b8
[  166.571460]  [<fb6eb906>] ? hci_rx_work+0xb9/0x2b8 [bluetooth]
[  166.577975]  [<c1035df9>] ? process_one_work+0x157/0x21b
[  166.583933]  [<fb6eb84d>] ? hci_cmd_work+0xef/0xef [bluetooth]
[  166.590448]  [<c1036217>] ? worker_thread+0x16e/0x20a
[  166.596088]  [<c10360a9>] ? manage_workers+0x1cf/0x1cf
[  166.601826]  [<c103a0ef>] ? kthread+0x8d/0x92
[  166.606691]  [<c134adf7>] ? ret_from_kernel_thread+0x1b/0x28
[  166.613010]  [<c103a062>] ? __init_kthread_worker+0x24/0x24
[  166.619230] Code: 85 63 ff ff ff 31 db 8d 65 f4 89 d8 5b 5e 5f 5d c3 56 8d 70 04 53 89 f0 89 d3 e8 7e 17 c6 c5 8b 53 28 85 d2 74 1a 8b 43 24 85 c0 <89> 02 74 03 89 50 04 c7 43 28 00 00 00
[  166.640501] EIP: [<fb6e8bfb>] bt_sock_unlink+0x1a/0x38 [bluetooth] SS:ESP 0068:e4669e0c
[  166.649474] CR2: 000000007541203a
[  166.653420] ---[ end trace 0181ff2c9e42d51e ]---
[  166.658609] note: kworker/u:2H[338] exited with preempt_count 1

Signed-off-by: Kuba Pawlak <kubax.t.pawlak@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-25 21:06:39 +01:00
Robert Dolca 85b9ce9a21 NFC: nci: add nci_get_conn_info_by_id function
This function takes as parameters a pointer to the nci_dev
struct and the first byte from the values of the first domain
specific parameter that was used for the connection creation.

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-25 20:29:11 +01:00
Robert Dolca caa575a86e NFC: nci: fix possible crash in nci_core_conn_create
If the number of destination specific parameters supplied is 0,
the call will fail. If the first destination specific parameter
does not have a value, curr_id will be set to 0.

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-25 20:29:05 +01:00
Robert Dolca 22e4bd09c4 NFC: nci: rename nci_prop_ops to nci_driver_ops
Initially it was used to create hooks in the driver for
proprietary operations. Currently it is being used for hooks
for both proprietary and generic operations.

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-25 20:28:59 +01:00
Robert Dolca 0a97a3cba2 NFC: nci: Allow the driver to set handler for core nci ops
The driver may be required to act when some responses or
notifications arrive. For example the NCI core does not have a
handler for NCI_OP_CORE_GET_CONFIG_RSP. The NFCC can send a
config response that has to be read by the driver and the packet
may contain vendor specific data.

The Fields Peak driver needs to take certain actions when a reset
notification arrives (packet also not handled by the nfc core).

The driver handlers do not interfere with the core and they are
called after the core processes the packet.
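
The dispatch order can be illustrated with a small mock. The opcode value,
struct mock_driver_op and the mock_* functions are hypothetical and only show
"core first, then the optional driver handler"; they are not the NCI core's
data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_OP_GET_CONFIG_RSP 0x0003u	/* arbitrary placeholder value */

/* Records the order in which handlers ran: 'C' = core, 'D' = driver. */
struct mock_trace {
	char order[8];
	size_t n;
};

static void mock_trace_add(struct mock_trace *t, char c)
{
	if (t->n < sizeof(t->order) - 1)
		t->order[t->n++] = c;
}

struct mock_driver_op {
	uint16_t opcode;
	int (*rsp)(struct mock_trace *t, const uint8_t *payload, size_t len);
};

/* Stand-in for the core's own processing of a response packet. */
static void mock_core_rsp(struct mock_trace *t, uint16_t opcode)
{
	(void)opcode;
	mock_trace_add(t, 'C');
}

/* Example driver handler, e.g. for reading vendor specific config data. */
static int mock_driver_get_config_rsp(struct mock_trace *t,
				      const uint8_t *payload, size_t len)
{
	(void)payload;
	(void)len;
	mock_trace_add(t, 'D');
	return 0;
}

static int mock_rsp_packet(const struct mock_driver_op *ops, size_t n_ops,
			   uint16_t opcode, struct mock_trace *t,
			   const uint8_t *payload, size_t len)
{
	size_t i;

	mock_core_rsp(t, opcode);	/* the core always runs first */

	for (i = 0; i < n_ops; i++) {
		if (ops[i].opcode == opcode && ops[i].rsp != NULL)
			return ops[i].rsp(t, payload, len);
	}

	return 0;			/* no driver handler registered */
}
```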

Signed-off-by: Robert Dolca <robert.dolca@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2015-10-25 19:12:57 +01:00