Commit Graph

64419 Commits

Author SHA1 Message Date
Pablo Neira Ayuso 0768b3b3d2 netfilter: nf_tables: add optional user data area to rules
This allows us to store user comment strings, but it could be also
used to store any kind of information that the user application needs
to link to the rule.

Scratch 8 bits for the new ulen field that indicates the length the
user data area. 4 bits from the handle (so it's 42 bits long, according
to Patrick, it would last 139 years with 1000 new rules per second)
and 4 bits from dlen (so the expression data area is 4K, which seems
sufficient by now even considering the compatibility layer).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
2014-02-27 16:56:00 +01:00
Patrick McHardy 67a8fc27cc netfilter: nf_tables: add nft_dereference() macro
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-25 11:29:23 +01:00
Patrick McHardy 0eb5db7ad3 netfilter: nfnetlink: add rcu_dereference_protected() helpers
Add a lockdep_nfnl_is_held() function and a nfnl_dereference() macro for
RCU dereferences protected by a NFNL subsystem mutex.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-25 11:29:21 +01:00
Florian Westphal d2bf2f34cc netfilter: nft_ct: labels get support
This also adds NF_CT_LABELS_MAX_SIZE so it can be re-used
as BUILD_BUG_ON in nft_ct.

At this time, nft doesn't yet support writing to the label area;
when this changes the label->words handling needs to be moved
out of xt_connlabel.c into nf_conntrack_labels.c.

Also removes a useless run-time check: words cannot grow beyond
4 (32 bit) or 2 (64bit) since xt_connlabel enforces a maximum of
128 labels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-02-19 11:41:25 +01:00
David S. Miller 1e8d6421cf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_3ad.h
	drivers/net/bonding/bond_main.c

Two minor conflicts in bonding, both of which were overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-19 01:24:22 -05:00
Linus Torvalds 960dfc4eb2 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Lots of little small things, nothing too major: nouveau regression
  fixes, vmware fixes for the new hw support, memory leaks in error path
  fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits)
  drm/radeon/ni: fix typo in dpm sq ramping setup
  drm/radeon/si: fix typo in dpm sq ramping setup
  drm/radeon: fix CP semaphores on CIK
  drm/radeon: delete a stray tab
  drm/radeon: fix display tiling setup on SI
  drm/radeon/dpm: reduce r7xx vblank mclk threshold to 200
  drm/radeon: fill in DRM_CAPs for cursor size
  drm: add DRM_CAPs for cursor size
  drm/radeon: unify bpc handling
  drm/ttm: Fix memory leak in ttm_agp_backend.c
  drm/ttm: declare 'struct device' in ttm_page_alloc.h
  drm/nouveau: fix TTM_PL_TT memtype on pre-nv50
  drm/nv50/disp: use correct register to determine DP display bpp
  drm/nouveau/fb: use correct ram oclass for nv1a hardware
  drm/nv50/gr: add missing nv_error parameter priv
  drm/nouveau: fix ENG_RUNLIST register address
  drm/nv4c/bios: disallow retrieving from prom on nv4x igp's
  drm/nv4c/vga: decode register is in a different place on nv4x igp's
  drm/nv4c/mc: nv4x igp's have a different msi rearm register
  drm/nouveau: set irq_enabled manually
  ...
2014-02-18 16:36:07 -08:00
Linus Torvalds b0d3f6d47e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) kvaser CAN driver has fixed limits of some of it's table, validate
    that we won't exceed those limits at probe time.  Fix from Olivier
    Sobrie.

 2) Fix rtl8192ce disabling interrupts for too long, from Olivier
    Langlois.

 3) Fix botched shift in ath5k driver, from Dan Carpenter.

 4) Fix corruption of deferred packets in TIPC, from Erik Hugne.

 5) Fix newlink error path in macvlan driver, from Cong Wang.

 6) Fix netpoll deadlock in bonding, from Ding Tianhong.

 7) Handle GSO packets properly in forwarding path when fragmentation is
    necessary on egress, from Florian Westphal.

 8) Fix axienet build errors, from Michal Simek.

 9) Fix refcounting of ubufs on tx in vhost net driver, from Michael S
    Tsirkin.

10) Carrier status isn't set properly in hyperv driver, from Haiyang
    Zhang.

11) Missing pci_disable_device() in tulip_remove_one), from Ingo Molnar.

12) AF_PACKET qdisc bypass mode doesn't adhere to driver provided TX
    queue selection method.  Add a fallback method mechanism to fix this
    bug, from Daniel Borkmann.

13) Fix regression in link local route handling on GRE tunnels, from
    Nicolas Dichtel.

14) Bonding can assign dup aggregator IDs in some sequences of
    configuration, fix by making the allocation counter per-bond instead
    of global.  From Jiri Bohac.

15) sctp_connectx() needs compat translations, from Daniel Borkmann.

16) Fix of_mdio PHY interrupt parsing, from Ben Dooks

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
  MAINTAINERS: add entry for the PHY library
  of_mdio: fix phy interrupt passing
  net: ethernet: update dependency and help text of mvneta
  NET: fec: only enable napi if we are successful
  af_packet: remove a stray tab in packet_set_ring()
  net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode
  ipv4: fix counter in_slow_tot
  irtty-sir.c: Do not set_termios() on irtty_close()
  bonding: 802.3ad: make aggregator_identifier bond-private
  usbnet: remove generic hard_header_len check
  gre: add link local route when local addr is any
  batman-adv: fix potential kernel paging error for unicast transmissions
  batman-adv: avoid double free when orig_node initialization fails
  batman-adv: free skb on TVLV parsing success
  batman-adv: fix TT CRC computation by ensuring byte order
  batman-adv: fix potential orig_node reference leak
  batman-adv: avoid potential race condition when adding a new neighbour
  batman-adv: properly check pskb_may_pull return value
  batman-adv: release vlan object after checking the CRC
  batman-adv: fix TT-TVLV parsing on OGM reception
  ...
2014-02-18 15:52:43 -08:00
Jiri Pirko f7b12606b5 rtnl: make ifla_policy static
The only place this is used outside rtnetlink.c is veth. So provide
wrapper function for this usage.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 18:15:42 -05:00
Dave Airlie 75936c65dd Merge tag 'ttm-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Pull request of 2014-02-18

One compile fix and one memory leak.

* tag 'ttm-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux:
  drm/ttm: Fix memory leak in ttm_agp_backend.c
  drm/ttm: declare 'struct device' in ttm_page_alloc.h
2014-02-19 08:21:26 +10:00
Dave Airlie 9830e44f56 Merge tag 'vmwgfx-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux into drm-fixes
Pull request of 2014-02-18.

Nothing special. The biggest change is adding a couple of command defines and
packing the command data correctly.

* tag 'vmwgfx-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux:
  drm/vmwgfx: Fix command defines and checks
  drm/vmwgfx: Fix possible integer overflow
  drm/vmwgfx: Remove stray const
  drm/vmwgfx: unlock on error path in vmw_execbuf_process()
  drm/vmwgfx: Get maximum mob size from register SVGA_REG_MOB_MAX_SIZE
  drm/vmwgfx: Fix a couple of sparse warnings and errors
2014-02-19 08:21:02 +10:00
Veaceslav Falico f8ff080dac bonding: remove useless updating of slave->dev->last_rx
Now that all the logic is handled via last_arp_rx, we don't need to use
last_rx.

CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-18 16:47:15 -05:00
Alex Deucher 8716ed4e7b drm: add DRM_CAPs for cursor size
Some hardware may not support standard 64x64 cursors.  Add
a drm cap to query the cursor size from the kernel.  Some examples
include radeon CIK parts (128x128 cursors) and armada (32x64 or 64x32).
This allows things like device specific ddxes to remove asics specific
logic and also allows xf86-video-modesetting to work properly with hw
cursors on this hardware. Default to 64 if the driver doesn't specify
a size.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2014-02-18 13:41:01 -05:00
Alexandre Courbot 728a0cdf06 drm/ttm: declare 'struct device' in ttm_page_alloc.h
Declare 'struct device' explicitly in ttm_page_alloc.h as this file
does not include any file declaring it. This removes the following
warning:

	warning: 'struct device' declared inside parameter list

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Reviewed-by: Thierry Reding <treding@nvidia.com>
2014-02-18 14:01:48 +01:00
Linus Torvalds 87eeff7974 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "We have some patches fixing up ACL support issues from Zheng and
  Guangliang and a mount option to enable/disable this support.  (These
  fixes were somewhat delayed by the Chinese holiday.)

  There is also a small fix for cached readdir handling when directories
  are fragmented"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix __dcache_readdir()
  ceph: add acl, noacl options for cephfs mount
  ceph: make ceph_forget_all_cached_acls() static inline
  ceph: add missing init_acl() for mkdir() and atomic_open()
  ceph: fix ceph_set_acl()
  ceph: fix ceph_removexattr()
  ceph: remove xattr when null value is given to setxattr()
  ceph: properly handle XATTR_CREATE and XATTR_REPLACE
2014-02-17 13:51:00 -08:00
Phoebe Buckheister 4244db1b0b ieee802154: add netlink APIs for smartMAC configuration
Introduce new netlink attributes for SET_PHY_ATTRS:
 * CSMA minimal backoff exponent
 * CSMA maximal backoff exponent
 * CSMA retry limit
 * frame retransmission limit

The CSMA attributes shall correspond to minBE, maxBE and maxCSMABackoffs of
802.15.4, respectively. The frame retransmission shall correspond to
maxFrameRetries of 802.15.4, unless given as -1: then the old behaviour
of the stack shall apply. For RF2xy, the old behaviour is to not do
channel sensing at all and simply send *right now*, which is not
intended behaviour for most applications and actually prohibited for
some channel/page combinations.

For all values except frame retransmission limit, the defaults of
802.15.4 apply. Frame retransmission limits are set to -1 to indicate
backward-compatible behaviour.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:39 -05:00
Phoebe Buckheister 6ca001978d ieee802154: add support for setting CCA energy detection levels
Since three of the four clear channel assesment modes make use of energy
detection, provide an API to set the energy detection threshold.
Driver support for this is available in at86rf230 for the RF212 chips.
Since for these chips the minimal energy detection threshold depends on
page and channel used, add a field to struct at86rf230_local that stores
the minimal threshold. Actual ED thresholds are configured as offsets
from this value.

For RF212, setting the ED threshold will not work before a channel/page
has been set due to the dependency of energy detection in the chip and
the actual channel/page selected.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister ba08fea53a ieee802154: add support for CCA mode in wpan phys
The standard describes four modes of clear channel assesment: "energy
above threshold", "carrier found", and the logical and/or of these two.
Support for CCA mode setting is included in the at86rf230 driver,
predicated for RF212 chips.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister 84dda3c648 ieee802154: add support for listen-before-talk in wpan_phy
Listen-before-talk is an alternative to CSMA in uncoordinated networks
and prescribed by european regulations if one wants to have a device
with radio duty cycles above 10% (or less in some bands). Add a phy
property to enable/disable LBT in the phy, including support in the
at86rf230 driver for RF212 chips.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Phoebe Buckheister 9b2777d608 ieee802154: add TX power control to wpan_phy
Replace the current u8 transmit_power in wpan_phy with s8 transmit_power.
The u8 field contained the actual tx power and a tolerance field,
which no physical radio every used. Adjust sysfs entries to keep
compatibility with userspace, give tolerances of +-1dB statically there.

This patch only adds support for this in the at86rf230 driver and the
RF212 chip. Configuration calculation for RF212 is also somewhat basic,
but does the job - the RF212 datasheet gives a large table with
suggested values for combinations of TX power and page/channel, if this
does not work well, we might have to copy the whole table.

Signed-off-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:42:38 -05:00
Florian Fainelli 9df81dd758 net: phy: allow PHY drivers to implement their own software reset
As pointed out by Shaohui, most 10G PHYs out there have a non-standard
compliant software reset sequence, eventually something much more
complex than just toggling the BMCR_RESET bit. Allow PHY driver to
implement their own soft_reset() callback to deal with that. If no
callback is provided, call into genphy_soft_reset() which makes sure the
existing behavior is kept intact.

Reported-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:40:09 -05:00
Florian Fainelli 797ac07137 net: phy: move PHY software reset to genphy_soft_reset
As pointed out by Shaohui, this function is generic for 10/100/1000
PHYs, but 10G PHYs might have a slightly different reset sequence which
prevents most of them from using this function.

Move the BMCR_RESET based software resent sequence to
genphy_soft_reset() in preparation for allowing PHY drivers to implement
a soft_reset() callback.

Reported-by: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 16:40:08 -05:00
Linus Torvalds 60f76eab19 Small dma-buf pull request for 3.14
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJTAgaqAAoJEAG+/NWsLn5b+vAP/1te3UC4H6QvjLTfp+CLLPyM
 pSo0jDC/JP4MdkPajiI8ALGkCtO0TWoL+aoG+i+icUKoPx9Pjo+c8nmfgVzcpWeg
 FeVyq8zWL4akLQJIgypNOpX27sJ0jPHaCyct70+c8LdGHeJErIczrKWznh8MLHj4
 57Y2jm/3ge98Ib9rNy3+Bo2IrnyKjggKOq9/y0W0pYkACI+MuOioSsziw9fq0rKW
 yWe6XJg1L22Qd0tkm+tBfUQZxzJNX+rpGDVBqFib02jVOrcH6uGRamoxdPQSck51
 SUroKBYhLWYVYcTP/zu7JRMWk9Zcqwjv9CdzSi/GEA3775Py26SNoBcncTzKUj8L
 XsWFafFuA6RIOd7yArfq24vsv9x0YycK+uqlZi+AvjyZFxUqB2s9k5Y/u5LA8qeY
 XZj78zkPBGt/a6IjjwfBTo5xIQ15VoO5hvhUbKxuKvXZhMZKmtuE0oDzjPNe3Gc+
 C3CgJISAzCFZuw+THbvztCxCp5ydNkCn/SNCU3gNnH/nC0Lw2iK2+EASXJzb7Cc/
 10gazyXX9N3Ac8NXuBpHXbQaAPgll8/ArG2CvMkDqvjIMf04Gz5WWnsQ/RQjaI0y
 gRA4EcsMNTsQ5IH9f6kM2LAPbPeeyNbd13LGYL/yCZy/dn3C9rkPREno490e2XYA
 8ST+uKjzaIcFB+FHbHSS
 =TtdN
 -----END PGP SIGNATURE-----

Merge tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf

Pull dma-buf fix from Sumit Semwal:
 "Just some debugfs output updates.

  There's another patch related to dma-buf, but it'll get upstreamed via
  Greg KH's pull request"

* tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf:
  dma-buf: update debugfs output
2014-02-17 12:42:45 -08:00
Yan, Zheng bcdfeb2eb4 ceph: remove xattr when null value is given to setxattr()
For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE
to distinguish null value case from the zero-length value case.

Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
2014-02-17 12:37:09 -08:00
Linus Torvalds f2a77abdb8 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "Here are some more powerpc fixes for 3.14

  The main one is a nasty issue with the NUMA balancing support which
  requires a small generic change and the addition of a new accessor to
  set _PAGE_NUMA.  Both have been reviewed and acked by Mel and Rik.

  The changelog should have plenty of details but basically, without
  this fix, we get random user segfaults and/or corruptions due to
  missing TLB/hash flushes.  Aneesh series of 3 patches fixes it.

  We have some vDSO vs.  perf fixes from Anton, some small EEH fixes
  from Gavin, a ppc32 regression vs the stack overflow detector, and a
  fix for the way we handle PCIe host bridge speed settings on pseries
  (which is needed for proper operations of AMD graphics cards on
  Power8)"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/eeh: Disable EEH on reboot
  powerpc/eeh: Cleanup on eeh_subsystem_enabled
  powerpc/powernv: Rework EEH reset
  powerpc: Use unstripped VDSO image for more accurate profiling data
  powerpc: Link VDSOs at 0x0
  mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
  mm: Dirty accountable change only apply to non prot numa case
  powerpc/mm: Add new "set" flag argument to pte/pmd update function
  powerpc/pseries: Add Gen3 definitions for PCIE link speed
  powerpc/pseries: Fix regression on PCI link speed
  powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
2014-02-17 12:36:49 -08:00
Daniel Borkmann b9507bdaf4 netdevice: move netdev_cap_txqueue for shared usage to header
In order to allow users to invoke netdev_cap_txqueue, it needs to
be moved into netdevice.h header file. While at it, also add kernel
doc header to document the API.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Daniel Borkmann 99932d4fc0 netdevice: add queue selection fallback handler for ndo_select_queue
Add a new argument for ndo_select_queue() callback that passes a
fallback handler. This gets invoked through netdev_pick_tx();
fallback handler is currently __netdev_pick_tx() as most drivers
invoke this function within their customized implementation in
case for skbs that don't need any special handling. This fallback
handler can then be replaced on other call-sites with different
queue selection methods (e.g. in packet sockets, pktgen etc).

This also has the nice side-effect that __netdev_pick_tx() is
then only invoked from netdev_pick_tx() and export of that
function to modules can be undone.

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:36:34 -05:00
Matija Glavinic Pecotic ef2820a735 net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer
Implementation of (a)rwnd calculation might lead to severe performance issues
and associations completely stalling. These problems are described and solution
is proposed which improves lksctp's robustness in congestion state.

1) Sudden drop of a_rwnd and incomplete window recovery afterwards

Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data),
but size of sk_buff, which is blamed against receiver buffer, is not accounted
in rwnd. Theoretically, this should not be the problem as actual size of buffer
is double the amount requested on the socket (SO_RECVBUF). Problem here is
that this will have bad scaling for data which is less then sizeof sk_buff.
E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion
of traffic of this size (less then 100B).

An example of sudden drop and incomplete window recovery is given below. Node B
exhibits problematic behavior. Node A initiates association and B is configured
to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp
message in 4G (LTE) network). On B data is left in buffer by not reading socket
in userspace.

Lets examine when we will hit pressure state and declare rwnd to be 0 for
scenario with above stated parameters (rwnd == 10000, chunk size == 43, each
chunk is sent in separate sctp packet)

Logic is implemented in sctp_assoc_rwnd_decrease:

socket_buffer (see below) is maximum size which can be held in socket buffer
(sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count)

A simple expression is given for which it will be examined after how many
packets for above stated parameters we enter pressure state:

We start by condition which has to be met in order to enter pressure state:

	socket_buffer < currently_alloced;

currently_alloced is represented as size of sctp packets received so far and not
yet delivered to userspace. x is the number of chunks/packets (since there is no
bundling, and each chunk is delivered in separate packet, we can observe each
chunk also as sctp packet, and what is important here, having its own sk_buff):

	socket_buffer < x*each_sctp_packet;

each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is
twice the amount of initially requested size of socket buffer, which is in case
of sctp, twice the a_rwnd requested:

	2*rwnd < x*(payload+sizeof(struc sk_buff));

sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000
and each payload size is 43

	20000 < x(43+190);

	x > 20000/233;

	x ~> 84;

After ~84 messages, pressure state is entered and 0 rwnd is advertised while
received 84*43B ~= 3612B sctp data. This is why external observer notices sudden
drop from 6474 to 0, as it will be now shown in example:

IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017]
IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839]
IP A.34340 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.34340: sctp (1) [COOKIE ACK]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0]
<...>
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18]

--> Sudden drop

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

At this point, rwnd_press stores current rwnd value so it can be later restored
in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start
slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This
condition is not met since rwnd, after it hit 0, must first reach rwnd_press by
adding amount which is read from userspace. Let us observe values in above
example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the
amount of actual sctp data currently waiting to be delivered to userspace
is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed
only for sctp data, which is ~3500. Condition is never met, and when userspace
reads all data, rwnd stays on 3569.

IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0]
IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

--> At this point userspace read everything, rwnd recovered only to 3569

IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18]
IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0]

Reproduction is straight forward, it is enough for sender to send packets of
size less then sizeof(struct sk_buff) and receiver keeping them in its buffers.

2) Minute size window for associations sharing the same socket buffer

In case multiple associations share the same socket, and same socket buffer
(sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one
of the associations can permanently drop rwnd of other association(s).

Situation will be typically observed as one association suddenly having rwnd
dropped to size of last packet received and never recovering beyond that point.
Different scenarios will lead to it, but all have in common that one of the
associations (let it be association from 1)) nearly depleted socket buffer, and
the other association blames socket buffer just for the amount enough to start
the pressure. This association will enter pressure state, set rwnd_press and
announce 0 rwnd.
When data is read by userspace, similar situation as in 1) will occur, rwnd will
increase just for the size read by userspace but rwnd_press will be high enough
so that association doesn't have enough credit to reach rwnd_press and restore
to previous state. This case is special case of 1), being worse as there is, in
the worst case, only one packet in buffer for which size rwnd will be increased.
Consequence is association which has very low maximum rwnd ('minute size', in
our case down to 43B - size of packet which caused pressure) and as such
unusable.

Scenario happened in the field and labs frequently after congestion state (link
breaks, different probabilities of packet drop, packet reordering) and with
scenario 1) preceding. Here is given a deterministic scenario for reproduction:

>From node A establish two associations on the same socket, with rcvbuf_policy
being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1
repeat scenario from 1), that is, bring it down to 0 and restore up. Observe
scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered',
bring it down close to 0, as in just one more packet would close it. This has as
a consequence that association number 2 is able to receive (at least) one more
packet which will bring it in pressure state. E.g. if association 2 had rwnd of
10000, packet received was 43, and we enter at this point into pressure,
rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will
increase for 43, but conditions to restore rwnd to original state, just as in
1), will never be satisfied.

--> Association 1, between A.y and B.12345

IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569]
IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613]
IP A.55915 > B.12345: sctp (1) [COOKIE ECHO]
IP B.12345 > A.55915: sctp (1) [COOKIE ACK]

--> Association 2, between A.z and B.12346

IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173]
IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240]
IP A.55915 > B.12346: sctp (1) [COOKIE ECHO]
IP B.12346 > A.55915: sctp (1) [COOKIE ACK]

--> Deplete socket buffer by sending messages of size 43B over association 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0]

<...>

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0]

--> Sudden drop on 1

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Here userspace read, rwnd 'recovered' to 3698, now deplete again using
    association 1 so there is place in buffer for only one more packet

IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

--> Socket buffer is almost depleted, but there is space for one more packet,
    send them over association 2, size 43B

IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Immediate drop

IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0]

--> Read everything from the socket, both association recover up to maximum rwnd
    they are capable of reaching, note that association 1 recovered up to 3698,
    and association 2 recovered only to 43

IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0]
IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18]
IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0]
IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18]
IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0]

A careful reader might wonder why it is necessary to reproduce 1) prior
reproduction of 2). It is simply easier to observe when to send packet over
association 2 which will push association into the pressure state.

Proposed solution:

Both problems share the same root cause, and that is improper scaling of socket
buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while
calculating rwnd is not possible due to fact that there is no linear
relationship between amount of data blamed in increase/decrease with IP packet
in which payload arrived. Even in case such solution would be followed,
complexity of the code would increase. Due to nature of current rwnd handling,
slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is
entered is rationale, but it gives false representation to the sender of current
buffer space. Furthermore, it implements additional congestion control mechanism
which is defined on implementation, and not on standard basis.

Proposed solution simplifies whole algorithm having on mind definition from rfc:

o  Receiver Window (rwnd): This gives the sender an indication of the space
   available in the receiver's inbound buffer.

Core of the proposed solution is given with these lines:

sctp_assoc_rwnd_update:
	if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0)
		asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1;
	else
		asoc->rwnd = 0;

We advertise to sender (half of) actual space we have. Half is in the braces
depending whether you would like to observe size of socket buffer as SO_RECVBUF
or twice the amount, i.e. size is the one visible from userspace, that is,
from kernelspace.
In this way sender is given with good approximation of our buffer space,
regardless of the buffer policy - we always advertise what we have. Proposed
solution fixes described problems and removes necessity for rwnd restoration
algorithm. Finally, as proposed solution is simplification, some lines of code,
along with some bytes in struct sctp_association are saved.

Version 2 of the patch addressed comments from Vlad. Name of the function is set
to be more descriptive, and two parts of code are changed, in one removing the
superfluous call to sctp_assoc_rwnd_update since call would not result in update
of rwnd, and the other being reordering of the code in a way that call to
sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2
in a way that existing function is not reordered/copied in line, but it is
correctly called. Thanks Vlad for suggesting.

Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-17 00:16:56 -05:00
Aneesh Kumar K.V 56eecdb912 mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
a hash table MMU for various reasons (the flush is handled as part of
the PTE modification when necessary).

ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

Additionally ppc64 require the tlb flushing to be batched within ptl locks.

The reason to do that is to ensure that the hash page table is in sync with
linux page table.

We track the hpte index in linux pte and if we clear them without flushing
hash and drop the ptl lock, we can have another cpu update the pte and can
end up with duplicate entry in the hash table, which is fatal.

We also want to keep set_pte_at simpler by not requiring them to do hash
flush for performance reason. We do that by assuming that set_pte_at() is
never *ever* called on a PTE that is already valid.

This was the case until the NUMA code went in which broke that assumption.

Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
using set_pte_at() and a powerpc specific one using the appropriate
mechanism needed to keep the hash table in sync.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-02-17 11:19:36 +11:00
Linus Torvalds 3962dfbe22 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "We have a small collection of fixes in my for-linus branch.

  The big thing that stands out is a revert of a new ioctl.  Users
  haven't shipped yet in btrfs-progs, and Dave Sterba found a better way
  to export the information"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: use right clone root offset for compressed extents
  btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105
  Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol
  Btrfs: fix max_inline mount option
  Btrfs: fix a lockdep warning when cleaning up aborted transaction
  Revert "btrfs: add ioctl to export size of global metadata reservation"
2014-02-16 11:05:27 -08:00
Linus Torvalds 946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Linus Torvalds 5a667a0c02 Merge branches 'irq-urgent-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq update from Thomas Gleixner:
 "Fix from the urgent branch: a trivial oneliner adding the missing
  Kconfig dependency curing build failures which have been discovered by
  several build robots.

  The update in the irq-core branch provides a new function in the
  irq/devres code, which is a prerequisite for driver developers to get
  rid of boilerplate code all over the place.

  Not a bugfix, but it has zero impact on the current kernel due to the
  lack of users.  It's simpler to provide the infrastructure to
  interested parties via your tree than fulfilling the wishlist of
  driver maintainers on which particular commit or tag this should be
  based on"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Add missing irq_to_desc export for CONFIG_SPARSE_IRQ=n

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Add devm_request_any_context_irq()
2014-02-15 16:06:12 -08:00
Linus Torvalds 83660b734b A collection of ARM SoC fixes for v3.14-rc1.
Mostly a collection of Kconfig, device tree data and compilation fixes
 along with fix to drivers/phy that fixes a boot regression on some
 Marvell mvebu platforms.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJS/o3LAAoJEFk3GJrT+8ZlNUEP/1k2zAaEJSB4Lf47jWszKnQ/
 8tyY+2B5mcaziCIA+zVVGsjOpOEt7lBAVmDx2l1TDHe3tmnOAImqzOVseZZ4fEGo
 Yr3md2T4/9KNtBeQLbO2nylc//UCSJoqANikhO8maIn4YIIZJM8IdlTl/GPWyQ3z
 v1M+siIiPcr5jffdKhf201abl0nT7ZAhV0Jd0N3iZ8cvITupfen4VsB8gXGYnHP4
 dR7186+sxCPj3nhhPcVgdYR55MM6I7pASG5/Z0FsrEnH/xcvdcyzK5KoVD/A2eG6
 l6oltrP+WZ3/QGd/wW0nDbHfP5e4edCtzpxIoXq8r8kf8hdnUE9E40JM36NI5xz/
 QNvt/fRXB5eOXTuX3xm8eJ44oyDo6U361oMe0jARwg7ULHeDfbQQ7gfLFsE5ad4p
 Zeebhm5k//CeoeuPKFrrskAODQS4bjMuCvJX0QoHne1zaQ9K4TfGaivZJTUBSrGX
 WqDzpACNcuFDOQJcEZ9r9NvD1z0PY9wwTEYyH1vn5B+N7E0hZypz9gpBZgn4yNNF
 FWYRb467wkuWSWCfIHFjfSp9+SSw9NFx1DVR+SwOk4foMXcZNlQhfXghSZYJhV9t
 lDOva0gmrhYO8RKru81AweMN6ZfaiWdw6xt5UUw/WUjZn6sDnNJwdX7hPCg+4d0e
 NI2H44r2vOcS0gAL4IjV
 =DDj5
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Kevin Hilman:
 "A collection of ARM SoC fixes for v3.14-rc1.

  Mostly a collection of Kconfig, device tree data and compilation fixes
  along with fix to drivers/phy that fixes a boot regression on some
  Marvell mvebu platforms"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  dma: mv_xor: Silence a bunch of LPAE-related warnings
  ARM: ux500: disable msp2 device tree node
  ARM: zynq: Reserve not DMAable space in front of the kernel
  ARM: multi_v7_defconfig: Select CONFIG_SOC_DRA7XX
  ARM: imx6: Initialize low-power mode early again
  ARM: pxa: fix various compilation problems
  ARM: pxa: fix compilation problem on AM300EPD board
  ARM: at91: add Atmel's SAMA5D3 Xplained board
  spi/atmel: document clock properties
  mmc: atmel-mci: document clock properties
  ARM: at91: enable USB host on at91sam9n12ek board
  ARM: at91/dt: fix sama5d3 ohci hclk clock reference
  ARM: at91/dt: sam9263: fix compatibility string for the I2C
  ata: sata_mv: Fix probe failures with optional phys
  drivers: phy: Add support for optional phys
  drivers: phy: Make NULL a valid phy reference
  ARM: fix HAVE_ARM_TWD selection for OMAP and shmobile
  ARM: moxart: move DMA_OF selection to driver
  ARM: hisi: fix kconfig warning on HAVE_ARM_TWD
2014-02-15 15:01:33 -08:00
Linus Torvalds ca033390a5 USB fixes for 3.14-rc3
Here is a bunch of USB fixes for 3.14-rc3.  Most of these are xhci
 reverts, fixing a bunch of reported issues with USB 3 host controller
 issues that loads of people have been hitting (with the exception of
 kernel developers, all of our machines seem to be working fine, which is
 why these took so long to get resolved...)
 
 There are some other minor fixes and new device ids, as ususal.  All
 have been in linux-next successfully.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlL+iUYACgkQMUfUDdst+yl+cACeI7OnpRFgZEAS1yJOGQv0mi/m
 uAEAnAw6tNMgMt+W93eIIEtw9b/dOAp2
 =nQo7
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here is a bunch of USB fixes for 3.14-rc3.  Most of these are xhci
  reverts, fixing a bunch of reported issues with USB 3 host controller
  issues that loads of people have been hitting (with the exception of
  kernel developers, all of our machines seem to be working fine, which
  is why these took so long to get resolved...)

  There are some other minor fixes and new device ids, as ususal.  All
  have been in linux-next successfully"

* tag 'usb-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (22 commits)
  usb: option: blacklist ZTE MF667 net interface
  Revert "usb: xhci: Link TRB must not occur within a USB payload burst"
  Revert "xhci: Avoid infinite loop when sg urb requires too many trbs"
  Revert "xhci: Set scatter-gather limit to avoid failed block writes."
  xhci 1.0: Limit arbitrarily-aligned scatter gather.
  Modpost: fixed USB alias generation for ranges including 0x9 and 0xA
  usb: core: Fix potential memory leak adding dyn USBdevice IDs
  USB: ftdi_sio: add Tagsys RFID Reader IDs
  usb: qcserial: add Netgear Aircard 340U
  usb-storage: enable multi-LUN scanning when needed
  USB: simple: add Dynastream ANT USB-m Stick device support
  usb-storage: add unusual-devs entry for BlackBerry 9000
  usb-storage: restrict bcdDevice range for Super Top in Cypress ATACB
  usb: phy: move some error messages to debug
  usb: ftdi_sio: add Mindstorms EV3 console adapter
  usb: dwc2: fix memory corruption in dwc2 driver
  usb: dwc2: fix role switch breakage
  usb: dwc2: bail out early when booting with "nousb"
  Revert "xhci: replace xhci_read_64() with readq()"
  Revert "xhci: replace xhci_write_64() with writeq()"
  ...
2014-02-14 16:15:45 -08:00
Linus Torvalds bb0a05d756 Char/Misc fixes for 3.14-rc3
Here are some small char/misc driver fixes, along with some
 documentation updates, for 3.14-rc3.  Nothing major, just a number of
 fixes for reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iEYEABECAAYFAlL+hvIACgkQMUfUDdst+ykGRQCgpi/LgypVecwQlv73TsHLNZo/
 ybIAniXJlLgf4xu3q0J+8br3iQ7sg8pC
 =m+up
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char/misc fixes from Greg KH:
 "Here are some small char/misc driver fixes, along with some
  documentation updates, for 3.14-rc3.  Nothing major, just a number of
  fixes for reported issues"

* tag 'char-misc-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Revert "misc: eeprom: sunxi: Add new compatibles"
  Revert "ARM: sunxi: dt: Convert to the new SID compatibles"
  misc: mic: fix possible signed underflow (undefined behavior) in userspace API
  ARM: sunxi: dt: Convert to the new SID compatibles
  misc: eeprom: sunxi: Add new compatibles
  misc: genwqe: Fix potential memory leak when pinning memory
  Documentation:Update Documentation/zh_CN/arm64/memory.txt
  Documentation:Update Documentation/zh_CN/arm64/booting.txt
  Documentation:Chinese translation of Documentation/arm64/tagged-pointers.txt
  raw: set range for MAX_RAW_DEVS
  raw: test against runtime value of max_raw_minors
  Drivers: hv: vmbus: Don't timeout during the initial connection with host
  Drivers: hv: vmbus: Specify the target CPU that should receive notification
  VME: Correct read/write alignment algorithm
  mei: don't unset read cb ptr on reset
  mei: clear write cb from waiting list on reset
2014-02-14 16:13:00 -08:00
Chris Mason 11bcac89c0 Revert "btrfs: add ioctl to export size of global metadata reservation"
This reverts commit 01e219e806.

David Sterba found a different way to provide these features without adding a new
ioctl.  We haven't released any progs with this ioctl yet, so I'm taking this out
for now until we finalize things.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: David Sterba <dsterba@suse.cz>
CC: Jeff Mahoney <jeffm@suse.com>
2014-02-14 13:42:13 -08:00
Eric Dumazet 977cb0ecf8 tcp: add pacing_rate information into tcp_info
Add two new fields to struct tcp_info, to report sk_pacing_rate
and sk_max_pacing_rate to monitoring applications, as ss from iproute2.

User exported fields are 64bit, even if kernel is currently using 32bit
fields.

lpaa5:~# ss -i
..
	 skmem:(r0,rb357120,t0,tb2097152,f1584,w1980880,o0,bl0) ts sack cubic
wscale:6,6 rto:400 rtt:0.875/0.75 mss:1448 cwnd:1 ssthresh:12 send
13.2Mbps pacing_rate 3336.2Mbps unacked:15 retrans:1/5448 lost:15
rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 16:09:43 -05:00
WANG Cong 1c213bd24a net: introduce netdev_alloc_pcpu_stats() for drivers
There are many drivers calling alloc_percpu() to allocate pcpu stats
and then initializing ->syncp. So just introduce a helper function for them.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 15:49:55 -05:00
Linus Torvalds 161aa772f9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A collection of small fixes:

   - There still seem to be problems with asm goto which requires the
     empty asm hack.
   - If SMAP is disabled at compile time, don't enable it nor try to
     interpret a page fault as an SMAP violation.
   - Fix a case of unbounded recursion while tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off
  x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
  compiler/gcc4: Make quirk for asm_volatile_goto() unconditional
  x86: Use preempt_disable_notrace() in cycles_2_ns()
2014-02-14 11:09:11 -08:00
Linus Torvalds eef445eedc ACPI and power management fixes for 3.14-rc3
- Fix for a recent regression in the intel_pstate driver that
    introduced a race condition causing systems to crash during
    initialization in some situations.  This removes the affected
    code altogether.  From Dirk Brandewie.
 
  - ACPIPHP fix for a regression introduced during the 3.12 cycle
    causing devices to be dropped as a result of bus check notifications
    after system resume on some systems due to the way ACPIPHP interprets
    _STA return values (arguably incorrectly).  From Mika Westerberg.
 
  - ACPI dock driver fix for a problem causing docking to fail due to
    a check that always fails after recent ACPI core changes (found by
    code inspection).
 
  - ACPI container driver fix to prevent memory from being leaked in
    an error code path after device_register() failures.
 
  - Update of the arm_big_little cpufreq driver maintainer's e-mail
    address.
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJS/hR7AAoJEILEb/54YlRxmqQP/j1Ti0Gi1LU2PPBjDpA7jKlM
 icJjEdGO6tAZjJS7qaK6oiXfbjLzdeJSqHGDMmNJGZF4ZXk/DROLkuke6b7SERDf
 C3+KVfF8Z6KrvN/xI1BwvPnlbMsOzR4fWA94opz35YYX8/3VMU31iWNkaI58NY6H
 JjlsOJO6UnIi1iUCIisTs36DQq+htXok/fUs3LenzRnCEyQX7yVVR6lpO4GzqGio
 6o0gLelzSUmjSQ8PDwWw1BWDtek67vRwG9lK7aIVHgXyV7bmFvvCT9dJ68TcDdTo
 vZYsCrVHMq8hhFVjLyBQuYh8cn8SIsZoLQodnzLeStjIm8NvnROKm4wCYqWS4wwX
 TsvwURHhuR7qRNWQgX+dk4TS/+dx//n+CbtBFA6LK4cBWQcgBpXwn49aik4KPVZC
 89R6GVPgRMG+XDNBPSA9CXW0SRjb3W+0R4REQ/Za8B5N1wMIOYHvhOktgREiVtIt
 upyrwe4roeTGAxdhv3ROUcxDrzHaBN/Cu+Y4iSmB3psetJZGfLb+JCmbf+R+PI69
 klvuWhx+arKzFIvkzskR3O3QNpy8sipIl709YLQSJa2TQDoBoosPjhVwSJF6b7D1
 +jQj0+1hutXjLKY0++MBZK6mz5Bp+yRaNQRDrOW2QQaTK4b9v63JGMvPGgQN/jt5
 X8T7iomf3Q/CLzwgENJn
 =fdRt
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
 "These include a fix for a recent intel_pstate regression, a fix for a
  regression in the ACPI-based PCI hotplug (ACPIPHP) code introduced
  during the 3.12 cycle, fixes for two bugs in the ACPI core introduced
  recently and a MAINTAINERS update related to cpufreq.

  Specifics:

   - Fix for a recent regression in the intel_pstate driver that
     introduced a race condition causing systems to crash during
     initialization in some situations.  This removes the affected code
     altogether.  From Dirk Brandewie.

   - ACPIPHP fix for a regression introduced during the 3.12 cycle
     causing devices to be dropped as a result of bus check
     notifications after system resume on some systems due to the way
     ACPIPHP interprets _STA return values (arguably incorrectly).  From
     Mika Westerberg.

   - ACPI dock driver fix for a problem causing docking to fail due to a
     check that always fails after recent ACPI core changes (found by
     code inspection).

   - ACPI container driver fix to prevent memory from being leaked in an
     error code path after device_register() failures.

   - Update of the arm_big_little cpufreq driver maintainer's e-mail
     address"

* tag 'pm+acpi-3.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  MAINTAINERS / cpufreq: update Sudeep's email address
  intel_pstate: Remove energy reporting from pstate_sample tracepoint
  ACPI / container: Fix error code path in container_device_attach()
  ACPI / hotplug / PCI: Relax the checking of _STA return values
  ACPI / dock: Use acpi_device_enumerated() to check if dock is present
2014-02-14 11:07:29 -08:00
Linus Torvalds 5e57dc8110 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block IO fixes from Jens Axboe:
 "Second round of updates and fixes for 3.14-rc2.  Most of this stuff
  has been queued up for a while.  The notable exception is the blk-mq
  changes, which are naturally a bit more in flux still.

  The pull request contains:

   - Two bug fixes for the new immutable vecs, causing crashes with raid
     or swap.  From Kent.

   - Various blk-mq tweaks and fixes from Christoph.  A fix for
     integrity bio's from Nic.

   - A few bcache fixes from Kent and Darrick Wong.

   - xen-blk{front,back} fixes from David Vrabel, Matt Rushton, Nicolas
     Swenson, and Roger Pau Monne.

   - Fix for a vec miscount with integrity vectors from Martin.

   - Minor annotations or fixes from Masanari Iida and Rashika Kheria.

   - Tweak to null_blk to do more normal FIFO processing of requests
     from Shlomo Pongratz.

   - Elevator switching bypass fix from Tejun.

   - Softlockup in blkdev_issue_discard() fix when !CONFIG_PREEMPT from
     me"

* 'for-linus' of git://git.kernel.dk/linux-block: (31 commits)
  block: add cond_resched() to potentially long running ioctl discard loop
  xen-blkback: init persistent_purge_work work_struct
  blk-mq: pair blk_mq_start_request / blk_mq_requeue_request
  blk-mq: dont assume rq->errors is set when returning an error from ->queue_rq
  block: Fix cloning of discard/write same bios
  block: Fix type mismatch in ssize_t_blk_mq_tag_sysfs_show
  blk-mq: rework flush sequencing logic
  null_blk: use blk_complete_request and blk_mq_complete_request
  virtio_blk: use blk_mq_complete_request
  blk-mq: rework I/O completions
  fs: Add prototype declaration to appropriate header file include/linux/bio.h
  fs: Mark function as static in fs/bio-integrity.c
  block/null_blk: Fix completion processing from LIFO to FIFO
  block: Explicitly handle discard/write same segments
  block: Fix nr_vecs for inline integrity vectors
  blk-mq: Add bio_integrity setup to blk_mq_make_request
  blk-mq: initialize sg_reserved_size
  blk-mq: handle dma_drain_size
  blk-mq: divert __blk_put_request for MQ ops
  blk-mq: support at_head inserations for blk_execute_rq
  ...
2014-02-14 10:45:18 -08:00
Linus Torvalds e847882887 RDMA/InfiniBand fixes for 3.14-rc3:
- Fix some rough edges from the "IP addressing for IBoE" merge
  - Other misc fixes, mostly to hardware drivers
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJS/ld6AAoJEENa44ZhAt0hch4P/3UWikX1tgZF7k0a3TsMdRTZ
 IK2GuKj6RiCpbYRjhNgW8YHtPNlmOYa0t+4kwQiIjTwWo4SgxMt5HG/nx1Js1yuH
 R62YSdZRPkEs65AfHsBMUSZtEp3jNXNdMnbqeNwAb0lnvWRkxxgoIYYc+fQ5mc8K
 gZPHU1oLm7M4jiJhsrPDMxIW/I6sQMCqRCyXdaUmdhQWz62LyMjXfPeK/yfU69Pj
 ujMJtkV46pzY7anhzBuseBd95glcRFzz43KHq5Yl3HNlhaVFbnP0czlTue2Xy/la
 FMvFPBVP3Aha1N3tOtNQOHMeQ5HCY88fXMo6FnFYPKSQqmvwOjY8vsM6h3FoVymN
 oCiPno4hz/E4nsYtG9LsnMANXzvWyR1zTy0KGjysxSdMiat/v66VFvh8KUXmspme
 c+unvJK+pJnXV05ONwF2GVUo3RfhTeGVQoNnZMvq5S/QZd5fdjsEIiZqaXJV4LZU
 pxgBa/6c7tgTp2IJry8ghh49QBMVzyg4p9+/FBR4SDsMXEMc5evyGTSZmMNlXMdV
 XGR5v2iMUa+JtsTvAqsPZdx0blMrEm6GRznfv5tZmqnRtPp90o063GxBnngDIKPH
 5KXo8zqwHyeiEgXCUdxyHGWEOMRCvcqaBEiQ644z9fWPBWLUlLHAVMofoLhvYjdF
 6xseybHKYtxCluhh1Xf8
 =OBaI
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull RDMA/InfiniBand fixes from Roland Dreier:

 - Fix some rough edges from the "IP addressing for IBoE" merge

 - Other misc fixes, mostly to hardware drivers

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (21 commits)
  RDMA/ocrdma: Fix load time panic during GID table init
  RDMA/ocrdma: Fix traffic class shift
  IB/iser: Fix use after free in iser_snd_completion()
  IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
  IB/usnic: Fix smatch endianness error
  IB/mlx5: Remove dependency on X86
  mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use
  IB/qib: Add missing serdes init sequence
  RDMA/cxgb4: Add missing neigh_release in LE-Workaround path
  IB: Report using RoCE IP based gids in port caps
  IB/mlx4: Build the port IBoE GID table properly under bonding
  IB/mlx4: Do IBoE GID table resets per-port
  IB/mlx4: Do IBoE locking earlier when initializing the GID table
  IB/mlx4: Move rtnl locking to the right place
  IB/mlx4: Make sure GID index 0 is always occupied
  IB/mlx4: Don't allocate range of steerable UD QPs for Ethernet-only device
  RDMA/amso1100: Fix error return code
  RDMA/nes: Fix error return code
  IB/mlx5: Don't set "block multicast loopback" capability
  IB/mlx5: Fix binary compatibility with libmlx5
  ...
2014-02-14 10:33:45 -08:00
Roland Dreier c9459388d8 Merge branches 'cma', 'cxgb4', 'iser', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'usnic' into for-next 2014-02-14 09:49:12 -08:00
Florian Fainelli b560a58c45 net: phy: add Broadcom BCM7xxx internal PHY driver
This patch adds support for the Broadcom BCM7xxx Set Top Box SoCs
internal PHYs. This driver supports the following generation of SoCs:

- BCM7366, BCM7439, BCM7445 (28nm process)
- all 40nm and 65nm (older MIPS-based SoCs)

The PHYs on these SoCs require a bunch of workarounds to operate
correctly, both during configuration time and at suspend/resume time,
the driver handles that for us.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 00:27:58 -05:00
Florian Fainelli 439d39a9ac net: phy: broadcom: extract register definitions
The Broadcom BCM54xx register definitions are shared between BCM54xx and
BCM7xx internal PHYs for which we are adding support. Extract these
register definitions and put them in include/linux/brcmphy.h for use by
the BCM7xxx internal PHY driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 00:27:58 -05:00
Florian Fainelli fd70f72c66 net: phy: add MoCA PHY type
Some Ethernet MACs are connected to a MoCA PHY which will handle the
low-level job of sending Ethernet frames on the coaxial cable, these
Ethernet MACs need to know about it to be properly configured.
Add a new PHY mode "moca" and update the Device Tree parsing logic to
look for it.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-14 00:27:57 -05:00
Roland Dreier 6ecde51dd7 mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use
On some architectures (for example, arm), we don't end up indirectly
pulling in the declaration of kzalloc() and kfree(), and so building
anything that includes <linux/mlx5/driver.h> breaks.  Fix this by adding
an explicit include to get the declaration.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 20:48:02 -08:00
Stanislav Fomichev 45f7435968 tcp: remove unused min_cwnd member of tcp_congestion_ops
Commit 684bad1107 "tcp: use PRR to reduce cwin in CWR state" removed all
calls to min_cwnd, so we can safely remove it.
Also, remove tcp_reno_min_cwnd because it was only used for min_cwnd.

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:22:34 -05:00
David S. Miller 886ab57c84 linux-can-next-for-3.15-20140212
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlL7RxcACgkQjTAFq1RaXHNcKACgk4Vj+1KlmWVEin8VPoJBX86m
 RasAn36pSY1OszxypJOF0O1s0qO9ld5K
 =383M
 -----END PGP SIGNATURE-----

Merge tag 'linux-can-next-for-3.15-20140212' of git://gitorious.org/linux-can/linux-can-next

linux-can-next-for-3.15-20140212

Marc Kleine-Budde says:

====================
this is a pull request of eight patches for net-next/master.

Florian Vaussard contributed a series that merged the sja1000 of_platform
into the platform driver. The of_platform driver is finally removed.
Stephane Grosjean supplied a patch to allocate CANFD skbs. In a patch
by Uwe Kleine-König another missing copyright information was added to
a userspace header. And a patch by Yoann DI RUZZA that adds listen only
mode to the at91_can driver.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 18:16:00 -05:00
Moni Shoua b4a26a2728 IB: Report using RoCE IP based gids in port caps
For userspace RoCE UD QPs we need to know the GID format that the
kernel uses, e.g when working over older kernels. For that end, add a
new port capability IB_PORT_IP_BASED_GIDS and report it when query
port is issued.

Signed-off-by: Moni Shoua <monis@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-13 14:46:03 -08:00
Florian Westphal fe6cc55f3a net: ip, ipv6: handle gso skbs in forwarding path
Marcelo Ricardo Leitner reported problems when the forwarding link path
has a lower mtu than the incoming one if the inbound interface supports GRO.

Given:
Host <mtu1500> R1 <mtu1200> R2

Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.

In this case, the kernel will fail to send ICMP fragmentation needed
messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
checks in forward path. Instead, Linux tries to send out packets exceeding
the mtu.

When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
not fragment the packets when forwarding, and again tries to send out
packets exceeding R1-R2 link mtu.

This alters the forwarding dstmtu checks to take the individual gso
segment lengths into account.

For ipv6, we send out pkt too big error for gso if the individual
segments are too big.

For ipv4, we either send icmp fragmentation needed, or, if the DF bit
is not set, perform software segmentation and let the output path
create fragments when the packet is leaving the machine.
It is not 100% correct as the error message will contain the headers of
the GRO skb instead of the original/segmented one, but it seems to
work fine in my (limited) tests.

Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid
sofware segmentation.

However it turns out that skb_segment() assumes skb nr_frags is related
to mss size so we would BUG there.  I don't want to mess with it considering
Herbert and Eric disagree on what the correct behavior should be.

Hannes Frederic Sowa notes that when we would shrink gso_size
skb_segment would then also need to deal with the case where
SKB_MAX_FRAGS would be exceeded.

This uses sofware segmentation in the forward path when we hit ipv4
non-DF packets and the outgoing link mtu is too small.  Its not perfect,
but given the lack of bug reports wrt. GRO fwd being broken this is a
rare case anyway.  Also its not like this could not be improved later
once the dust settles.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-02-13 17:17:02 -05:00