Commit Graph

693356 Commits

Author SHA1 Message Date
Florian Westphal 3282e65558 tcp: remove unused mib counters
was used by tcp prequeue and header prediction.
TCPFORWARDRETRANS use was removed in January.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:50 -07:00
Florian Westphal 573aeb0492 tcp: remove CA_ACK_SLOWPATH
re-indent tcp_ack, and remove CA_ACK_SLOWPATH; it is always set now.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:50 -07:00
Florian Westphal 45f119bf93 tcp: remove header prediction
Like prequeue, I am not sure this is overly useful nowadays.

If we receive a train of packets, GRO will aggregate them if the
headers are the same (HP predates GRO by several years) so we don't
get a per-packet benefit, only a per-aggregated-packet one.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Florian Westphal b6690b1438 tcp: remove low_latency sysctl
Was only checked by the removed prequeue code.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Florian Westphal c13ee2a4f0 tcp: reindent two spots after prequeue removal
These two branches are now always true, remove the conditional.
objdiff shows no changes.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Florian Westphal e7942d0633 tcp: remove prequeue support
prequeue is a tcp receive optimization that moves part of rx processing
from bh to process context.

This only works if the socket being processed belongs to a process that
is blocked in recv on that socket.

In practice, this doesn't happen anymore that often because nowadays
servers tend to use an event driven (epoll) model.

Even normal client applications (web browsers) commonly use many tcp
connections in parallel.

This has measurable impact only in netperf (which uses plain recv and
thus allows prequeue use) from host to locally running vm (~4%). However,
there were no changes when using netperf between two physical hosts with
ixgbe interfaces.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-31 14:37:49 -07:00
Linus Torvalds 2e7ca2064c Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Several cgroup bug fixes.

   - cgroup core was calling a migration callback on empty migrations,
     which could make cpuset crash.

   - There was a very subtle bug where the controller interface files
     aren't created directly when cgroup2 is mounted. Because later
     operations create them, this bug didn't get noticed earlier.

   - Failed writes to cgroup.subtree_control were incorrectly returning
     zero"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix error return value from cgroup_subtree_control()
  cgroup: create dfl_root files on subsys registration
  cgroup: don't call migration methods if there are no tasks to migrate
2017-07-31 14:03:05 -07:00
Linus Torvalds ff2620f778 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fixes from Tejun Heo:
 "Two notable fixes.

   - While adding NUMA affinity support to unbound workqueues, the
     assumption that an unbound workqueue with max_active == 1 is
     ordered was broken.

     The plan was to use explicit alloc_ordered_workqueue() for those
     cases. Unfortunately, I forgot to update the documentation properly
     and we grew a handful of use cases which depend on that assumption.

     While we want to convert them to alloc_ordered_workqueue(), we
     don't really lose anything by enforcing ordered execution on
     unbound max_active == 1 workqueues and it doesn't make sense to
     risk subtle bugs. Restore the assumption.

   - Workqueue assumes that CPU <-> NUMA node mapping remains static.

     This is a general assumption - we don't have any synchronization
     mechanism around CPU <-> node mapping. Unfortunately, powerpc may
     change the mapping dynamically leading to crashes. Michael added a
     workaround so that we at least don't crash while powerpc hotplug
     code gets updated"

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Work around edge cases for calc of pool's cpumask
  workqueue: implicit ordered attribute should be overridable
  workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
2017-07-31 13:37:28 -07:00
Linus Torvalds 3dcc4c7d42 Merge branch 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
 "Dan found a really old bug where libata hotplug code wasn't sanitizing
  index value from userland and may end up indexing with a negative
  number. It is scary but fortunately can only be triggered by root.

  Other than that, minor fixes"
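
A minimal sketch of the kind of index sanitizing described above, not the actual libata code. The table, its size and the names `find_dev` and `MAX_DEVICES` are hypothetical; the point is only that a signed index coming from userland needs a lower-bound check as well as an upper-bound one.

```c
#include <stddef.h>

#define MAX_DEVICES 2	/* hypothetical table size for this sketch */

struct device {
	int id;
};

/*
 * Return the entry for devno, or NULL when devno is out of range.
 * A signed index that only gets an upper-bound test can still
 * underflow the array, so negative values are rejected explicitly.
 */
static struct device *find_dev(struct device *table, int devno)
{
	if (devno < 0 || devno >= MAX_DEVICES)
		return NULL;
	return &table[devno];
}
```

Callers are expected to treat a NULL return as "no such device" instead of dereferencing it.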

* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  libata: fix a couple of doc build warnings
  libata: array underflow in ata_find_dev()
  ata: sata_rcar: add gen[23] fallback compatibility strings
  libata: remove unused rc in ata_eh_handle_port_resume
  libata: Cleanup ata_read_log_page()
  ata: fix gemini Kconfig dependencies
2017-07-31 13:33:21 -07:00
Sylwester Nawrocki 5b30850bd6 clk: samsung: exynos5420: The EPLL rate table corrections
This patch fixes values of the EPLL K coefficient and changes
the EPLL output frequency values to match exactly what is
possible to achieve with given M, P, S, K coefficients.
This avoids rounding errors and an unexpected frequency
being set with clk_set_rate(), due to recalc_rate returning
different values than the PLL rate specified in the
exynos5420_epll_24mhz_tbl table. E.g. this prevents a case
where two consecutive clk_set_rate() calls with same argument
result in different PLL output frequency.

The PLL output frequencies have been calculated with formula:

f = fxtal * (M * 2^16 + K) / (P * 2^S) / 2^16

where fxtal = 24000000.
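
The formula above can be sketched in integer arithmetic as follows. This is an illustration only, not the driver's recalc_rate code: the function name `epll_rate_hz` is hypothetical, and treating K as a signed 16-bit value is an assumption of this sketch.

```c
#include <stdint.h>

/*
 * f = fxtal * (M * 2^16 + K) / (P * 2^S) / 2^16
 *
 * Assumptions of this sketch: K is a signed 16-bit coefficient and
 * M * 2^16 + K is positive. Returns 0 for an invalid divider.
 */
static uint64_t epll_rate_hz(uint64_t fxtal, uint32_t m, uint32_t p,
			     uint32_t s, int16_t k)
{
	uint64_t num;

	if (p == 0 || s > 31)
		return 0;

	num = (uint64_t)(((int64_t)m << 16) + k);
	return (fxtal * num / ((uint64_t)p << s)) >> 16;
}
```

With fxtal = 24000000, M = 64, P = 2, S = 2 and K = 0 this gives 192000000 Hz. Because every step truncates, a table entry that is not exactly reachable would read back as a different rate, which is the mismatch the patch removes.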

Fixes: 9842452acd ("clk: samsung: exynos542x: Add EPLL rate table")
Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2017-07-31 13:16:03 -07:00
Babu Moger 74ad3d28af parisc: Define CONFIG_CPU_BIG_ENDIAN
While working on enabling queued rwlock on SPARC, I found the following
code in include/asm-generic/qrwlock.h, which uses CONFIG_CPU_BIG_ENDIAN
to clear a byte.

static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
{
	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
}

The problem is that many of the fixed big endian architectures don't define
CPU_BIG_ENDIAN, so the wrong byte gets cleared.

Define CPU_BIG_ENDIAN for parisc architecture to fix it.
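
A user-space sketch of the effect, assuming the byte to clear is the least-significant byte of a 32-bit lock word. The helper names are hypothetical; `config_big_endian` stands in for IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN) and must be 0 or 1.

```c
#include <stdint.h>
#include <string.h>

/* Detect the byte order of the machine running this sketch. */
static int host_is_big_endian(void)
{
	uint32_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/*
 * Clear the byte at offset 3 * config_big_endian, mirroring the
 * pointer arithmetic in the snippet above. The low byte is only hit
 * when config_big_endian matches the real byte order.
 */
static uint32_t clear_low_byte(uint32_t word, int config_big_endian)
{
	uint8_t *bytes = (uint8_t *)&word;

	bytes[3 * config_big_endian] = 0;
	return word;
}
```

When the config value does not match the hardware, the store lands on the most-significant byte and the low byte is left set.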

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 17:51:27 +02:00
Jonathan Corbet 2f60e1ab2f libata: fix a couple of doc build warnings
The kerneldoc comments for a couple of functions in drivers/ata/libata-eh.c
had fallen behind the current implementation, resulting in these doc build
warnings:

  ./drivers/ata/libata-eh.c:1449: warning: No description found for parameter 'link'
  ./drivers/ata/libata-eh.c:1449: warning: Excess function parameter 'ap' description in 'ata_eh_done'
  ./drivers/ata/libata-eh.c:1590: warning: No description found for parameter 'qc'
  ./drivers/ata/libata-eh.c:1590: warning: Excess function parameter 'dev' description in 'ata_eh_request_sense'

Update the comments and make the warnings go away.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-31 08:03:06 -07:00
James Bottomley 93964fd4ea parisc: pdc_stable: Fix locking when creating sysfs links
There's no need to take the write lock when creating sysfs links.

This patch fixes the following BUG:
 BUG: sleeping function called from invalid context at mm/slab.h:416
 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc2-00110-g0b5477d9dabd #111
 Backtrace:
 [<0000000040217ac8>] show_stack+0x20/0x38
 [<00000000406fbbb0>] dump_stack+0xb0/0x128
 [<0000000040274090>] ___might_sleep+0x180/0x1b8
 [<0000000040274144>] __might_sleep+0x7c/0xe8
 [<0000000040373874>] kmem_cache_alloc+0x14c/0x1e0
 [<0000000040419514>] __kernfs_new_node+0x84/0x1b8
 [<000000004041b09c>] kernfs_new_node+0x3c/0x78
 [<000000004041e040>] kernfs_create_link+0x40/0xd8
 [<000000004041f320>] sysfs_do_create_link_sd.isra.0+0xb0/0x130
 [<000000004041f3d4>] sysfs_create_link+0x34/0x58
 [<000000004011b4a4>] pdc_stable_init+0x2c4/0x458
 [<0000000040200250>] do_one_initcall+0x70/0x1d8
 [<0000000040101644>] kernel_init_freeable+0x27c/0x390
 [<000000004020be44>] kernel_init+0x24/0x1c0

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 16:43:13 +02:00
Axel Lin fa8f6d0619 gpio: lp87565: Set proper output level and direction for direction_output
The value argument of lp87565_gpio_direction_output() means output level
rather than gpio direction.

Signed-off-by: Axel Lin <axel.lin@ingics.com>
Reviewed-by: Keerthy <j-keerthy@ti.com>
Tested-by: Keerthy <j-keerthy@ti.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-07-31 15:26:57 +02:00
Rafael J. Wysocki a684c5b188 thunderbolt: icm: Ignore mailbox errors in icm_suspend()
On one of my test machines nhi_mailbox_cmd() called from icm_suspend()
times out and returns an error, which is then propagated to the
caller and causes the entire system suspend to be aborted, which isn't
very useful.

Instead of aborting system suspend, print the error into the log
and continue.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Michael Jamet <michael.jamet@intel.com>
2017-07-31 13:24:29 +02:00
Loic Poulain fb776481c4 Bluetooth: hci_uart: Fix uninitialized alignment value
Force alignment value to the default one (1 byte) if uninitialized.
This fixes the hci_ll serdev driver (alignment = 0) and avoids any further
issues with upcoming drivers.
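
A small sketch of why an alignment of 0 is a problem and how falling back to 1 avoids it. This is not the hci_uart code; `rx_padding` and its parameters are hypothetical names used only for illustration.

```c
#include <stddef.h>

/*
 * Number of padding bytes needed to round payload_len up to a
 * multiple of alignment. An uninitialized alignment of 0 would make
 * the modulo below undefined, so it is forced to the default of 1
 * (byte alignment, no padding).
 */
static size_t rx_padding(size_t payload_len, size_t alignment)
{
	if (alignment == 0)
		alignment = 1;
	return (alignment - payload_len % alignment) % alignment;
}
```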

Signed-off-by: Loic Poulain <loic.poulain@gmail.com>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2017-07-31 13:27:37 +03:00
Nicholas Piggin cc491f1d35 powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
The watchdog soft-NMI exception stack setup loads a stack pointer
twice, which is an obvious error. It ends up using the system reset
interrupt (true-NMI) stack, which is also a bug because the watchdog
could be preempted by a system reset interrupt that overwrites the
NMI stack.

Change the soft-NMI to use the "emergency stack". The current kernel
stack is not used, because of the longer-term goal to prevent
asynchronous stack access using soft-disable.

Fixes: 2104180a53 ("powerpc/64s: implement arch-specific hardlockup watchdog")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-07-31 20:22:37 +10:00
Michael Ellerman bb272221e9 Linux v4.13-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZapWhAAoJEHm+PkMAQRiGKb0IAJM6b7SbWaw69Og7+qiFB+zZ
 xp29iXqbE9fPISC6a5BRQV1ONjeDM6opGixGHqGC8Hla6k2IYz25VDNoF8wd0MXN
 cz/Ih20vd3C5afxXGe5cTT8lsPAlV0mWXxForlu6j8jPeL62FPfq6RhEkw7AcrYL
 yfYy3k3qSdOrrvBdII0WAAUi46UfIs+we9BQgbsMbkHOiqV2K0MOrzKE84Xbgepq
 RAy2xg6P4b4+hTx8xTrYc1MXwpnqjRc0oJ08gdmiwW3AOOU7LxYFn7zDkLPWi9Rr
 g4x6r4YhBTGxT4wNvovLIiqd9QFs//dMCuPWYwEtTICG48umIqqq24beQ0mvCdg=
 =08Ic
 -----END PGP SIGNATURE-----

Merge tag 'v4.13-rc1' into fixes

The fixes branch is based off a random pre-rc1 commit, because we had
some fixes that needed to go in before rc1 was released.

However we now need to fix some code that went in after that point, but
before rc1, so merge rc1 to get that code into fixes so we can fix it!
2017-07-31 20:20:29 +10:00
Kuppuswamy Sathyanarayanan 727fd697da MAINTAINERS: Add entry for Whiskey Cove PMIC GPIO driver
Added maintainer info for Whiskey Cove PMIC GPIO driver.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-07-31 09:13:52 +02:00
Helge Deller 8f8201dfed parisc: Increase thread and stack size to 32kb
Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
the default size of 16k. The reason why stack usage suddenly grew is not
yet known.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 08:41:26 +02:00
John David Anglin 13d57093c1 parisc: Handle vma's whose context is not current in flush_cache_range
In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
statement in flush_cache_range() during a system shutdown:

kernel BUG at arch/parisc/kernel/cache.c:595!
CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx

 IAOQ[0]: flush_cache_range+0x144/0x148
 IAOQ[1]: flush_cache_page+0x0/0x1a8
 RP(r2): flush_cache_range+0xec/0x148
Backtrace:
 [<00000000402910ac>] unmap_page_range+0x84/0x880
 [<00000000402918f4>] unmap_single_vma+0x4c/0x60
 [<0000000040291a18>] zap_page_range_single+0x110/0x160
 [<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
 [<000000004026ccd8>] truncate_pagecache+0x50/0xa8
 [<000000004026cd84>] truncate_setsize+0x54/0x70
 [<000000004033d534>] put_aio_ring_file+0x44/0xb0
 [<000000004033d5d8>] aio_free_ring+0x38/0x140
 [<000000004033d714>] free_ioctx+0x34/0xa8
 [<00000000401b0028>] process_one_work+0x1b8/0x4d0
 [<00000000401b04f4>] worker_thread+0x1b4/0x648
 [<00000000401b9128>] kthread+0x1b0/0x208
 [<0000000040150020>] end_fault_vector+0x20/0x28
 [<0000000040639518>] nf_ip_reroute+0x50/0xa8
 [<0000000040638ed0>] nf_ip_route+0x10/0x78
 [<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8

CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
Workqueue: events free_ioctx
Backtrace:
 [<0000000040163bf0>] show_stack+0x20/0x38
 [<0000000040688480>] dump_stack+0xa8/0x120
 [<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
 [<0000000040164d0c>] handle_interruption+0xa24/0xa48

This patch modifies flush_cache_range() to handle non current contexts.
In as much as this occurs infrequently, the simplest approach is to
flush the entire cache when this happens.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Cc: stable@vger.kernel.org # 4.9+
Signed-off-by: Helge Deller <deller@gmx.de>
2017-07-31 08:22:33 +02:00
Jeff Layton 9c5d58fb9e ext4: convert swap_inode_data() over to use swap() on most of the fields
For some odd reason, it forces a byte-by-byte copy of each field. A
plain old swap() on most of these fields would be more efficient. We
do need to retain the memswap of i_data however as that field is an array.

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-07-31 00:55:34 -04:00
Emoly Liu 191eac3300 ext4: error should be cleared if ea_inode isn't added to the cache
For Lustre, if ea_inode fails in hash validation but passes parent
inode and generation checks, it won't be added to the cache, and
the error "-EFSCORRUPTED" should be cleared as well. Otherwise it will
cause "Structure needs cleaning" when running the getfattr command.

Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-9723

Cc: stable@vger.kernel.org
Fixes: dec214d00e
Signed-off-by: Emoly Liu <emoly.liu@intel.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: tahsin@google.com
2017-07-31 00:40:22 -04:00
Jan Kara a3bb2d5587 ext4: Don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0'). However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in the SGID bit on
'DIR1' getting cleared if the user is not a member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of
__ext4_set_acl() into ext4_set_acl(). That way the function will not be
called when inheriting ACLs which is what we want as it prevents SGID
bit clearing and the mode has been properly set by posix_acl_create()
anyway.

Fixes: 073931017b
CC: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
2017-07-30 23:33:01 -04:00
Ernesto A. Fernández 397e434176 ext4: preserve i_mode if __ext4_set_acl() fails
When changing a file's acl mask, __ext4_set_acl() will first set the group
bits of i_mode to the value of the mask, and only then set the actual
extended attribute representing the new acl.

If the second part fails (due to lack of space, for example) and the file
had no acl attribute to begin with, the system will from now on assume
that the mask permission bits are actual group permission bits, potentially
granting access to the wrong users.

Prevent this by only changing the inode mode after the acl has been set.
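
The ordering can be sketched in user space as follows. This is not the ext4 code: `struct fake_inode`, `store_acl` and `set_acl` are hypothetical stand-ins, and the failure is simulated with a `space_left` flag.

```c
#include <errno.h>

struct fake_inode {
	unsigned int mode;	/* permission bits */
	int acl_stored;		/* 1 once the ACL attribute has been written */
};

/* Stand-in for writing the extended attribute; fails when out of space. */
static int store_acl(struct fake_inode *inode, int space_left)
{
	if (!space_left)
		return -ENOSPC;
	inode->acl_stored = 1;
	return 0;
}

/* Write the ACL first and touch the mode only after that succeeded. */
static int set_acl(struct fake_inode *inode, unsigned int new_mode,
		   int space_left)
{
	int err = store_acl(inode, space_left);

	if (err)
		return err;	/* mode is left untouched on failure */
	inode->mode = new_mode;
	return 0;
}
```

With this order a failed attribute write leaves both the mode and the ACL state exactly as they were.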

Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-07-30 22:43:41 -04:00
Eric Whitney a627b0a7c1 ext4: remove unused metadata accounting variables
Two variables in ext4_inode_info, i_reserved_meta_blocks and
i_allocated_meta_blocks, are unused.  Removing them saves a little
memory per in-memory inode and cleans up clutter in several tracepoints.
Adjust tracepoint output from ext4_alloc_da_blocks() for consistency
and fix a typo and whitespace near these changes.

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-07-30 22:30:11 -04:00
David S. Miller 764646b08d Merge branch 'net-sched-actions-improve-dump-performance'
Jamal Hadi Salim says:

====================
net sched actions: improve dump performance

Changes since v11:
------------------
1) Jiri - renames: nla_value to value and nla_selector to selector
2) Jiri - rename: validate_nla_bitfield_32 to validate_nla_bitfield32
3) Jiri - rename: NLA_BITFIELD_32 to NLA_BITFIELD32
4) Jiri - remove unnecessary break when we return in case statement
5) Jiri - rename and move nla_get_bitfield_32 to an earlier patch
6) Jiri - xmas tree alignment of var declaration
7) Jiri - rename all declarations of bitfield 32 vars to be consistent ("bf")
8) Jiri - improve validate_nla_bitfield32() validation to disallow valid
          bit values that are not selected by the selector

Changes since v10:
-----------------
1) Jiri: move type->validate_content() to its own patch
Jamal: decided to remove it altogether so we can get this patch set in.

2) Change name of NLA_FLAG_BITS to NLA_BITFIELD_32 based on discussions
with D. Ahern and Jiri. D. Ahern suggests to make this a variable bitmap size.
My analysis at this point is that it is too complex and I only need a few bit
flags. If we run out of bits someone else can create a new NLA_BITFIELD_XXX
and start using that. So please let this go.

3) Jamal - Add Suggested-by: Jiri for type NLA_BITFIELD_32

4) Jiri: Change name allowed_flags to tcaa_root_flags_allowed

5) Jiri: Introduce nla_get_flag_bits_values() helper instead of using
memcpy for retrieving nla_bitfield_32 fields.

Changes since v9:
-----------------

1) General consensus:
- remove again the use of BIT() to maintain uapi consistency ;->

1) Jiri:
- Add a new netlink type NLA_FLAG_BITS to check for valid bits
  and use it instead of inline vetting (patch 4/4 now)

Changes since v8:
-----------------

1) Jiri:
- Add back the use of BIT(). Eventually fix iproute2 instead
- Rename VALID_TCA_FLAGS to VALID_TCA_ROOT_FLAGS

Changes since v7:
-----------------

Jamal:
No changes.
Patch 1 went out twice. Resend without two copies of patch 1

changes since v6:
-----------------

1) DaveM:
New rules for netlink messages. From now on we are going to start
checking for bits that are not used and rejecting anything we don't
understand. In the future this is going to require major changes
to user space code (tc etc). This is just a start.

To quote, David:
"
 Again, bits you aren't using now, make sure userspace doesn't
   set them.  And if it does, reject.
"
Added checks for ensuring things work as above.

2) Jiri:
a)Fix the commit message to properly use "Fixes" description
b)Align assignments for nla_policy

Changes since v5:
----------------

0)
Remove use of BIT() because it is kernel specific. Requires a separate
patch (Jiri can submit that in his cleanups)

1)To paraphrase Eric D.

"memcpy(nla_data(count_attr), &cb->args[1], sizeof(u32));
won't work on 64bit BE machines because cb->args[1]
(which is 64 bit) is larger in size than sizeof(u32)"

Fixed
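
A user-space sketch of the byte-order issue paraphrased above, with hypothetical helper names. Copying the first four bytes of a 64-bit variable yields the low half only on little-endian machines; narrowing by value first works on either byte order. How the series itself fixed it is not shown here.

```c
#include <stdint.h>
#include <string.h>

/* Copies the first four bytes in memory: the low half only on LE. */
static uint32_t first_four_bytes(uint64_t arg)
{
	uint32_t out;

	memcpy(&out, &arg, sizeof(out));
	return out;
}

/* Narrow by value first, then copy: correct on any byte order. */
static uint32_t narrowed_copy(uint64_t arg)
{
	uint32_t tmp = (uint32_t)arg;
	uint32_t out;

	memcpy(&out, &tmp, sizeof(out));
	return out;
}
```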

2) Jiri Pirko

i) Spotted a bug fix mixed in the patch for wrong TLV
fix. Add patch 1/3 to address this. Make part of this
series because of dependencies.

ii) Rename ACT_LARGE_DUMP_ON -> TCA_FLAG_LARGE_DUMP_ON

iii) Satisfy Jiri's obsession against the noun "tcaa"
a)Rename struct nlattr *tcaa --> struct nlattr *tb
b)Rename TCAA_ACT_XXX -> TCA_ROOT_XXX

Changes since v4:
-----------------

1) Eric D.

pointed out that when all skb space is used up by the dump
there will be no space to insert the TCAA_ACT_COUNT attribute.

2) Jiri:

i) Change:

enum {
        TCAA_UNSPEC,
        TCAA_ACT_TAB,
        TCAA_ACT_FLAGS,
        TCAA_ACT_COUNT,
        TCAA_ACT_TIME_FILTER,
        __TCAA_MAX
};

to:
enum {
       TCAA_UNSPEC,
       TCAA_ACT_TAB,
       TCAA_ACT_FLAGS,
       TCAA_ACT_COUNT,
       __TCAA_MAX,
};

Jiri plans to followup with the rest of the code to make the
style consistent.

ii) Rename attribute TCAA_ACT_TIME_FILTER --> TCAA_ACT_TIME_DELTA

iii) Rename variable jiffy_filter --> jiffy_since
iv) Rename msecs_filter --> msecs_since
v) get rid of unused cb->args[0] and rename cb->args[4] to cb->args[0]

Earlier Changes
----------------
- Jiri mostly on names of things.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:28:08 -07:00
Jamal Hadi Salim e62e484df0 net sched actions: add time filter for action dumping
This patch adds support for filtering based on time since last used.
When we are dumping a large number of actions it is useful to
have the option of filtering based on when the action was last
used to reduce the amount of data crossing to user space.

With this patch the user space app sets the TCA_ROOT_TIME_DELTA
attribute with the value in milliseconds with "time of interest
since now".  The kernel converts this to jiffies and does the
filtering comparison matching entries that have seen activity
since then and returns them to user space.
Old kernels and old tc continue to work in legacy mode since
they don't specify this attribute.
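
The conversion and comparison described above could look roughly like this in user space. It is a sketch under assumptions, not the kernel code: `ms_to_ticks`, `used_since` and the `hz` parameter (ticks per second) are hypothetical, and a 32-bit tick counter is assumed.

```c
#include <stdint.h>

/* Convert milliseconds to ticks, rounding up. */
static uint32_t ms_to_ticks(uint32_t ms, uint32_t hz)
{
	return (uint32_t)(((uint64_t)ms * hz + 999) / 1000);
}

/*
 * True when last_used lies within the last delta_ms before now.
 * The unsigned subtraction keeps the age correct across a wraparound
 * of the tick counter.
 */
static int used_since(uint32_t now, uint32_t last_used,
		      uint32_t delta_ms, uint32_t hz)
{
	return (uint32_t)(now - last_used) <= ms_to_ticks(delta_ms, hz);
}
```

Entries for which `used_since()` is false would simply be skipped during the dump.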

Some examples (we have 400 actions bound to 400 filters) at
installation time, using an updated tc and setting the time of
interest to 120 seconds earlier (we see 400 actions):
prompt$ hackedtc actions ls action gact since 120000| grep index | wc -l
400

go get some coffee and wait for > 120 seconds and try again:

prompt$ hackedtc actions ls action gact since 120000 | grep index | wc -l
0

Let's see a filter bound to one of these actions:
....
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 2 success 1)
  match 7f000002/ffffffff at 12 (success 1 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1145 sec used 802 sec
    Action statistics:
    Sent 84 bytes 1 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
....

that coffee took long, no? It was good.

Now let's ping -c 1 127.0.0.2, then run the actions again:
prompt$ hackedtc actions ls action gact since 120 | grep index | wc -l
1

More details please:
prompt$ hackedtc -s actions ls action gact since 120000

    action order 0: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1270 sec used 30 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

And the filter?

filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10  (rule hit 4 success 2)
  match 7f000002/ffffffff at 12 (success 2 )
    action order 1: gact action pass
     random type none pass val 0
     index 23 ref 2 bind 1 installed 1324 sec used 84 sec
    Action statistics:
    Sent 168 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:28:08 -07:00
Jamal Hadi Salim 90825b23a8 net sched actions: dump more than TCA_ACT_MAX_PRIO actions per batch
When you dump hundreds of thousands of actions, getting only 32 per
dump batch even when the socket buffer and memory allocations allow
more is inefficient.

With this change, the user will get as many as can fit
within the given constraints available to the kernel.

The top level action TLV space is extended. An attribute
TCA_ROOT_FLAGS is used to carry flags; flag TCA_FLAG_LARGE_DUMP_ON
is set by the user indicating the user is capable of processing
these large dumps. Older user space which doesn't set this flag
doesn't get the larger (than 32) batches.
The kernel uses the TCA_ROOT_COUNT attribute to tell the user how many
actions are put in a single batch. As such, the user space app knows how long
to iterate (independent of the type of action being dumped)
instead of relying on a hardcoded maximum of 32, thus maintaining backward compat.

Some results dumping 1.5M actions below:
first an unpatched tc which doesn't understand these features...

prompt$ time -p tc actions ls action gact | grep index | wc -l
1500000
real 1388.43
user 2.07
sys 1386.79

Now let's see a patched tc which sets the correct flags when requesting
a dump:

prompt$ time -p updatedtc actions ls action gact | grep index | wc -l
1500000
real 178.13
user 2.02
sys 176.96

That is about 8x performance improvement for tc app which sets its
receive buffer to about 32K.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:28:08 -07:00
Jamal Hadi Salim df823b0297 net sched actions: Use proper root attribute table for actions
Bug fix for an issue which has been around for about a decade.
We got away with it because the enumeration was larger than needed.

Fixes: 7ba699c604 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
Suggested-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:28:08 -07:00
Jamal Hadi Salim 64c83d8373 net netlink: Add new type NLA_BITFIELD32
Generic bitflags attribute content sent to the kernel by user.
With this netlink attr type the user can either set or unset a
flag in the kernel.

The value is a bitmap that defines the bit values being set.
The selector is a bitmask that defines which value bits are to be
considered.

A check is made to enforce the rule that a kernel subsystem only
accepts bitflags the kernel already knows about, i.e.
if the user tries to set a bit flag that is not understood then
_it will be rejected_.

In the most basic form, the user specifies the attribute policy as:
[ATTR_GOO] = { .type = NLA_BITFIELD32, .validation_data = &myvalidflags },

where myvalidflags is the bit mask of the flags the kernel understands.

If the user _does not_ provide myvalidflags then the attribute will
also be rejected.

Examples:
value = 0x0, and selector = 0x1
implies we are selecting bit 1 and we want to set its value to 0.

value = 0x2, and selector = 0x2
implies we are selecting bit 2 and we want to set its value to 1.
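
The value/selector semantics can be sketched in plain C as below. This is an illustration, not the kernel's validation code: `struct bitfield32` and both helpers are hypothetical names, and the rule that value bits outside the selector are rejected follows the v11 changelog of this series.

```c
#include <stddef.h>
#include <stdint.h>

struct bitfield32 {
	uint32_t value;		/* bit values to write */
	uint32_t selector;	/* which bits of value are meaningful */
};

/* Return 1 when the attribute is acceptable, 0 when it must be rejected. */
static int bitfield32_valid(struct bitfield32 bf, const uint32_t *valid_mask)
{
	if (valid_mask == NULL)
		return 0;	/* no mask of known flags provided */
	if (bf.selector & ~*valid_mask)
		return 0;	/* selects a flag that is not understood */
	if (bf.value & ~bf.selector)
		return 0;	/* sets a bit that was not selected */
	return 1;
}

/* Update only the selected bits of flags; leave all others alone. */
static uint32_t bitfield32_apply(uint32_t flags, struct bitfield32 bf)
{
	return (flags & ~bf.selector) | (bf.value & bf.selector);
}
```

For the two examples above, applying {value 0x0, selector 0x1} to flags 0x3 yields 0x2, and applying {value 0x2, selector 0x2} to flags 0x0 also yields 0x2.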

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:28:08 -07:00
Eric Whitney 1e21196c8e ext4: correct comment references to ext4_ext_direct_IO()
Commit 914f82a32d "ext4: refactor direct IO code" deleted
ext4_ext_direct_IO(), but references to that function remain in
comments.  Update them to refer to ext4_direct_IO_write().

Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
2017-07-30 22:26:40 -04:00
Andrew Lunn fbbeefdd21 net: fec: Allow reception of frames bigger than 1522 bytes
The FEC Receive Control Register has a 14 bit field indicating the
longest frame that may be received. It is being set to 1522. Frames
longer than this are discarded, but counted as being in error.

When using DSA, frames from the switch have an additional header,
either 4 or 8 bytes if a Marvell switch is used. Thus a full MTU frame
of 1522 bytes received by the switch on a port becomes 1530 bytes when
passed to the host via the FEC interface.

Change the maximum receive size to 2048 - 64, where 64 is the maximum
rx_alignment applied on the receive buffer for AVB capable FEC
cores. Use this value also for the maximum receive buffer size. The
driver is already allocating a receive SKB of 2048 bytes, so this
change should not have any significant effects.
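
The arithmetic behind the new limit, as a sketch. The macro and function names are hypothetical and not taken from the driver; the numbers are the ones given in this message.

```c
#include <stdint.h>

#define RX_BUF_SIZE	2048u	/* receive SKB size already allocated */
#define RX_ALIGN_MAX	64u	/* largest rx_alignment (AVB capable cores) */
#define MAX_FRAME_OLD	1522u
#define MAX_FRAME_NEW	(RX_BUF_SIZE - RX_ALIGN_MAX)	/* 1984 */
#define MAX_FL_LIMIT	((1u << 14) - 1)	/* largest 14-bit value */

/* True when a frame plus its switch tag fits under the receive limit. */
static int frame_accepted(uint32_t frame_len, uint32_t tag_len,
			  uint32_t limit)
{
	return frame_len + tag_len <= limit;
}
```

A 1522 byte frame with an 8 byte tag is 1530 bytes, which the old limit discards and the new one accepts, while 1984 still fits in the 14 bit register field.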

Tested on imx51, imx6, vf610.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:26:01 -07:00
Andrew Lunn 9558df3a82 net: fec: Issue error for missing but expected PHY
If the PHY is missing but expected, e.g. because of a typ0 in the dt
file, it is not possible to open the interface. ip link returns:

RTNETLINK answers: No such device

It is not very obvious what the problem is. Add a netdev_err() in this
case to make it easier to debug the issue.

[   21.409385] fec 2188000.ethernet eth0: Unable to connect to phy
RTNETLINK answers: No such device

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Fugang Duan <fugang.duan@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:25:22 -07:00
David S. Miller 509394e841 Merge branch 'dsa-lan9303-Fix-MDIO-issues'
Egil Hjelmeland says:

====================
net: dsa: lan9303: Fix MDIO issues.

This series fixes the MDIO interface for the lan9303 DSA driver.
Bugs found after testing on actual HW.

This series is extracted from the first patch of my first large
series. Significant changes from that version are:
 - use mdiobus_write_nested, mdiobus_read_nested.
 - EXPORT lan9303_indirect_phy_ops

Unfortunately I do not have access to an i2c based system for
testing.

Changes from first version:
 - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:23:29 -07:00
Egil Hjelmeland 2c3408986c net: dsa: lan9303: MDIO access phy registers directly
Indirect access (PMI) to phy registers only works in I2C mode. In
MDIO mode phy registers must be accessed directly. Introduced
struct lan9303_phy_ops to handle the two modes.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:23:29 -07:00
Egil Hjelmeland 9e866e5dab net: dsa: lan9303: Renamed indirect phy access functions
Preparing for the following fix of MDIO phy access:

Renamed functions that access PHY 1 and 2 indirectly through PMI
registers.

 lan9303_port_phy_reg_wait_for_completion() to
 lan9303_indirect_phy_wait_for_completion()

 lan9303_port_phy_reg_read() to
 lan9303_indirect_phy_read()

 lan9303_port_phy_reg_write() to
 lan9303_indirect_phy_write()

Also changed "val" parameter of lan9303_indirect_phy_write() to u16,
for clarity.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:23:29 -07:00
Egil Hjelmeland ab78acb152 net: dsa: lan9303: Multiply by 4 to get MDIO register
lan9303_mdio_write()/_read() must multiply register number by 4 to get
offset.

Added some comments to the register definitions.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:23:29 -07:00
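
The register-number-to-offset rule from the commit above can be sketched in a few lines of C. This is a minimal illustration rather than the driver's code: the helper name `mdio_reg_to_offset` is hypothetical, and the 4-byte spacing between registers is taken directly from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a register number into an offset,
 * using the "multiply by 4" rule stated in the commit message. */
static inline uint32_t mdio_reg_to_offset(uint32_t reg_num)
{
	/* Guard against wrap-around in the multiplication. */
	assert(reg_num <= UINT32_MAX / 4u);
	return reg_num * 4u;
}
```

Keeping the conversion in one helper means the read and write paths cannot disagree about the scaling.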
Egil Hjelmeland d329ac88eb net: dsa: lan9303: Fix lan9303_detect_phy_setup() for MDIO
Handle the case where an MDIO read with no response returns 0xffff.

Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-30 19:23:29 -07:00
Linus Torvalds 16f73eb02d Linux 4.13-rc3 2017-07-30 12:40:36 -07:00
Linus Torvalds f137e0b0c5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A small set of x86 fixes:

   - prevent the kernel from using the EFI reboot method when EFI is
     disabled.

   - two patches addressing clang issues"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Disable the address-of-packed-member compiler warning
  x86/efi: Fix reboot_mode when EFI runtime services are disabled
  x86/boot: #undef memcpy() et al in string.c
2017-07-30 12:19:35 -07:00
Linus Torvalds e4776b8ccb Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Thomas Gleixner:
 "Two patches addressing build warnings caused by inconsistent kernel
  doc comments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/wait: Clean up some documentation warnings
  sched/core: Fix some documentation build warnings
2017-07-30 11:54:08 -07:00
Linus Torvalds dbc52a8030 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "A couple of fixes for performance counters and kprobes:

   - a series of small patches which make the uncore performance
     counters on Skylake server systems work correctly

   - add a missing instruction slot release to the failure path of
     kprobes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kprobes/x86: Release insn_slot in failure path
  perf/x86/intel/uncore: Fix missing marker for skx_uncore_cha_extra_regs
  perf/x86/intel/uncore: Fix SKX CHA event extra regs
  perf/x86/intel/uncore: Remove invalid Skylake server CHA filter field
  perf/x86/intel/uncore: Fix Skylake server CHA LLC_LOOKUP event umask
  perf/x86/intel/uncore: Fix Skylake server PCU PMU event format
  perf/x86/intel/uncore: Fix Skylake UPI PMU event masks
2017-07-30 11:52:15 -07:00
Linus Torvalds 06efc7df37 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "Fix for a regression caused by the conversion of x86 to the generic
  hotplug code.

  Instead of doing a plain single line revert, this adds a pile of
  comments so the semantics of the force argument are clear"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/cpuhotplug: Revert "Set force affinity flag on hotplug migration"
2017-07-30 11:27:33 -07:00
Hanjun Guo f7f3dd5b4c ACPI: APD: Fix HID for Hisilicon Hip07/08
The ACPI HID for Hisilicon Hip07/08 should be HISI02A1/2,
not HISI0A21/2. HISI02A1/2 was tested OK, but a typo crept in
(my mistake) when the patches were upstreamed.
Correct them to the right IDs (matching the IDs in
drivers/i2c/busses/i2c-designware-platdrv.c).

Fixes: 6e14cf361a (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller)
Reported-by: Tao Tian <tiantao6@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-30 14:33:48 +02:00
Rafael J. Wysocki 4815d3c56d cpufreq: x86: Make scaling_cur_freq behave more as expected
After commit f8475cef90 "x86: use common aperfmperf_khz_on_cpu() to
calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
in sysfs only behaves as expected on x86 with APERF/MPERF registers
available when it is read from at least twice in a row.  The value
returned by the first read may not be meaningful, because the
computations in there use cached values from the previous iteration
of aperfmperf_snapshot_khz() which may be stale.

To prevent that from happening, modify arch_freq_get_on_cpu() to
call aperfmperf_snapshot_khz() twice, with a short delay between
these calls, if the previous invocation of aperfmperf_snapshot_khz()
was too far back in the past (specifically, more than 1s ago).

Also, as pointed out by Doug Smythies, aperf_delta is limited now
and the multiplication of it by cpu_khz won't overflow, so simplify
the s->khz computations too.

Fixes: f8475cef90 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
Reported-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-07-30 14:26:51 +02:00
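
The simplified frequency computation mentioned in the commit above can be sketched as follows. This is an illustration under assumptions, not the kernel's code: the function and parameter names are hypothetical, and it relies on the commit's statement that aperf_delta is bounded so the product with the base frequency fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: scale the base frequency (in kHz) by the
 * ratio of the APERF delta to the MPERF delta over one sample window.
 * Assumes aperf_delta is small enough that the product cannot overflow. */
static uint64_t estimate_khz(uint64_t base_khz, uint64_t aperf_delta,
			     uint64_t mperf_delta)
{
	assert(mperf_delta != 0);
	return base_khz * aperf_delta / mperf_delta;
}
```

Because both deltas come from the same window, a stale earlier snapshot stretches the window and skews the ratio, which is why the commit takes two fresh snapshots when the previous one is too old.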
Daniel Borkmann 9975a54b3c bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
bpf_prog_size(prog->len) is not the correct length we want to dump
back to user space. The code in bpf_prog_get_info_by_fd() uses this
to copy prog->insnsi to user space, but bpf_prog_size(prog->len) also
includes the size of struct bpf_prog itself in addition to the program
instructions. It is usually used either for accounting or for
bpf_prog_alloc() et al, so bpf_prog_get_info_by_fd() can potentially
copy out of bounds. Use the correct bpf_prog_insn_size() instead.

Fixes: 1e27097690 ("bpf: Add BPF_OBJ_GET_INFO_BY_FD")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:29:41 -07:00
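
The difference between the two sizes in the commit above can be shown with a small userspace sketch. The types and helpers here (`struct insn`, `struct prog`, `prog_total_size`, `prog_insn_size`) are hypothetical stand-ins with an illustrative layout, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one program instruction. */
struct insn {
	uint64_t raw;
};

/* Hypothetical stand-in for the program object: a header followed
 * by the instructions in a flexible array member. */
struct prog {
	uint32_t len;          /* number of instructions */
	uint32_t pad;
	struct insn insnsi[];  /* instructions follow the header */
};

/* Size of the whole object, header included (allocation/accounting). */
static size_t prog_total_size(uint32_t len)
{
	return sizeof(struct prog) + (size_t)len * sizeof(struct insn);
}

/* Size of the instruction array alone: the most that may be
 * copied when the source pointer is insnsi. */
static size_t prog_insn_size(const struct prog *p)
{
	assert(p != NULL);
	return (size_t)p->len * sizeof(struct insn);
}
```

Copying `prog_total_size()` bytes starting at `insnsi` would read past the end of the array by the size of the header, which is the out-of-bounds copy the commit describes.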
Arnd Bergmann efe967cdec tcp: avoid bogus gcc-7 array-bounds warning
When using CONFIG_UBSAN_SANITIZE_ALL, the TCP code produces a
false-positive warning:

net/ipv4/tcp_output.c: In function 'tcp_connect':
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
                                        ^~
net/ipv4/tcp_output.c:2207:40: error: array subscript is below array bounds [-Werror=array-bounds]
   tp->chrono_stat[tp->chrono_type - 1] += now - tp->chrono_start;
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~

I have opened a gcc bug for this, but distros have already shipped
compilers with this problem, and it's not clear yet whether there is
a way for gcc to avoid the warning. As the problem is related to the
bitfield access, this introduces a temporary variable to store the old
enum value.

I did not notice this warning earlier, since UBSAN is disabled when
building with COMPILE_TEST, and that was always turned on in both
allmodconfig and randconfig tests.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81601
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:26:29 -07:00
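
The workaround described in the commit above, reading the bitfield into a temporary before using it as an array index, can be sketched like this. The struct, enum, and function names are hypothetical simplifications for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum chrono { CHRONO_UNSPEC, CHRONO_BUSY, CHRONO_RWND_LIMITED,
	      CHRONO_SNDBUF_LIMITED };

/* Hypothetical, simplified connection state. */
struct conn {
	unsigned int chrono_type : 2;  /* enum value stored in a bitfield */
	uint32_t chrono_start;
	uint32_t chrono_stat[3];       /* one slot per non-UNSPEC type */
};

static void chrono_set(struct conn *c, enum chrono new_type, uint32_t now)
{
	assert(c != NULL);

	/* Copy the bitfield into a plain local first, so the index
	 * expression no longer reads the bitfield directly. */
	const enum chrono old = c->chrono_type;

	if (old > CHRONO_UNSPEC)
		c->chrono_stat[old - 1] += now - c->chrono_start;
	c->chrono_start = now;
	c->chrono_type = new_type;
}
```

The behaviour is unchanged; only the way the old value reaches the subscript differs, which is what the commit relies on to avoid the warning.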
David S. Miller 736b9b9c50 Merge branch 'ethtool-fec'
Roopa Prabhu says:

====================
ethtool: support for forward error correction mode setting on a link

Forward Error Correction (FEC) modes, i.e. Base-R and Reed-Solomon,
were introduced in the 25G/40G/100G standards to provide a good BER at
high speeds. Various networking devices that support 25G/40G/100G
provide the ability to manage supported FEC modes, and the lack of FEC
encoding control and reporting today is a source of interoperability
issues for many vendors. FEC capability, as well as a specific FEC mode
(i.e. Base-R or RS), can be requested or advertised through bits D44:47
of the base link codeword.

This patch set intends to provide an option under ethtool to manage and
report FEC encoding settings for networking devices as per the IEEE
802.3bj, 802.3bm and 802.3by specs.

v2 :
        - minor patch format fixes and typos pointed out by Andrew
        - there was a pending discussion on the use of 'auto' vs
          'automatic' for fec settings. I have left it as 'auto'
          because in most cases today auto is used in place of
          automatic to represent automatically generated values.
          We use it in other networking config too. I would prefer
          leaving it as auto.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:23:45 -07:00
Casey Leedom 7fece840e3 cxgb4: ethtool forward error correction management support
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-29 23:23:44 -07:00