Commit Graph

901967 Commits

Author SHA1 Message Date
Christian Brauner e6dee9f389 net-sysfs: add netdev_change_owner()
Add a function to change the owner of a network device when it is moved
between network namespaces.

Currently, when moving network devices between network namespaces the
ownership of the corresponding sysfs entries is not changed. This leads
to problems when tools try to operate on the corresponding sysfs files.
This leads to a bug whereby a network device that is created in a
network namespaces owned by a user namespace will have its corresponding
sysfs entry owned by the root user of the corresponding user namespace.
If such a network device has to be moved back to the host network
namespace the permissions will still be set to the user namespaces. This
means unprivileged users can e.g. trigger uevents for such incorrectly
owned devices. They can also modify the settings of the device itself.
Both of these things are unwanted.

For example, workloads will create network devices in the host network
namespace. Other tools will then proceed to move such devices between
network namespaces owner by other user namespaces. While the ownership
of the device itself is updated in
net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
entry for the device is not:

drwxr-xr-x 5 nobody nobody    0 Jan 25 18:08 .
drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 power
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
drwxr-xr-x 4 nobody nobody    0 Jan 25 18:09 queues
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
drwxr-xr-x 2 nobody nobody    0 Jan 25 18:09 statistics
lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:08 subsystem -> ../../../../class/net
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
-r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
-rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent

However, if a device is created directly in the network namespace then
the device's sysfs permissions will be correctly updated:

drwxr-xr-x 5 root   root      0 Jan 25 18:12 .
drwxr-xr-x 9 nobody nobody    0 Jan 25 18:08 ..
-r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 root   root   4096 Jan 25 18:12 addr_len
-r--r--r-- 1 root   root   4096 Jan 25 18:12 address
-r--r--r-- 1 root   root   4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 carrier
-r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 root   root   4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_id
-r--r--r-- 1 root   root   4096 Jan 25 18:12 dev_port
-r--r--r-- 1 root   root   4096 Jan 25 18:12 dormant
-r--r--r-- 1 root   root   4096 Jan 25 18:12 duplex
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 flags
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 ifalias
-r--r--r-- 1 root   root   4096 Jan 25 18:12 ifindex
-r--r--r-- 1 root   root   4096 Jan 25 18:12 iflink
-r--r--r-- 1 root   root   4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 mtu
-r--r--r-- 1 root   root   4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 root   root   4096 Jan 25 18:12 operstate
-r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 root   root   4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 root   root      0 Jan 25 18:12 power
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 root   root      0 Jan 25 18:12 queues
-r--r--r-- 1 root   root   4096 Jan 25 18:12 speed
drwxr-xr-x 2 root   root      0 Jan 25 18:12 statistics
lrwxrwxrwx 1 nobody nobody    0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 root   root   4096 Jan 25 18:12 type
-rw-r--r-- 1 root   root   4096 Jan 25 18:12 uevent

Now, when creating a network device in a network namespace owned by a
user namespace and moving it to the host the permissions will be set to
the id that the user namespace root user has been mapped to on the host
leading to all sorts of permission issues:

458752
drwxr-xr-x 5 458752 458752      0 Jan 25 18:12 .
drwxr-xr-x 9 root   root        0 Jan 25 18:08 ..
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_assign_type
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 addr_len
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 address
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 broadcast
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_changes
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_down_count
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 carrier_up_count
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_id
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dev_port
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 dormant
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 duplex
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 flags
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 gro_flush_timeout
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 ifalias
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 ifindex
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 iflink
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 link_mode
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 mtu
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 name_assign_type
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 netdev_group
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 operstate
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_id
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_port_name
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 phys_switch_id
drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 power
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 proto_down
drwxr-xr-x 4 458752 458752      0 Jan 25 18:12 queues
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 speed
drwxr-xr-x 2 458752 458752      0 Jan 25 18:12 statistics
lrwxrwxrwx 1 root   root        0 Jan 25 18:12 subsystem -> ../../../../class/net
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 tx_queue_len
-r--r--r-- 1 458752 458752   4096 Jan 25 18:12 type
-rw-r--r-- 1 458752 458752   4096 Jan 25 18:12 uevent

Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner 3b52fc5d78 drivers/base/power: add dpm_sysfs_change_owner()
Add a helper to change the owner of a device's power entries. This
needs to happen when the ownership of a device is changed, e.g. when
moving network devices between network namespaces.
This function will be used to correctly account for ownership changes,
e.g. when moving network devices between network namespaces.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner b8f33e5d76 device: add device_change_owner()
Add a helper to change the owner of a device's sysfs entries. This
needs to happen when the ownership of a device is changed, e.g. when
moving network devices between network namespaces.
This function will be used to correctly account for ownership changes,
e.g. when moving network devices between network namespaces.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner 2c4f9401ce sysfs: add sysfs_change_owner()
Add a helper to change the owner of sysfs objects.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.

This mirrors how a kobject is added through driver core which in its guts is
done via kobject_add_internal() which in summary creates the main directory via
create_dir(), populates that directory with the groups associated with the
ktype of the kobject (if any) and populates the directory with the basic
attributes associated with the ktype of the kobject (if any). These are the
basic steps that are associated with adding a kobject in sysfs.
Any additional properties are added by the specific subsystem itself (not by
driver core) after it has registered the device. So for the example of network
devices, a network device will e.g. register a queue subdirectory under the
basic sysfs directory for the network device and than further subdirectories
within that queues subdirectory.  But that is all specific to network devices
and they call the corresponding sysfs functions to do that directly when they
create those queue objects. So anything that a subsystem adds outside of what
driver core does must also be changed by it (That's already true for removal of
files it created outside of driver core.) and it's the same for ownership
changes.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner 303a42769c sysfs: add sysfs_group{s}_change_owner()
Add helpers to change the owner of sysfs groups.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner 0666a3aee7 sysfs: add sysfs_link_change_owner()
Add a helper to change the owner of a sysfs link.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Christian Brauner f70ce18568 sysfs: add sysfs_file_change_owner()
Add helpers to change the owner of a sysfs files.
This function will be used to correctly account for kobject ownership
changes, e.g. when moving network devices between network namespaces.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 20:07:25 -08:00
Nicolas Saenz Julienne 402482a6a7 net: bcmgenet: Clear ID_MODE_DIS in EXT_RGMII_OOB_CTRL when not needed
Outdated Raspberry Pi 4 firmware might configure the external PHY as
rgmii although the kernel currently sets it as rgmii-rxid. This makes
connections unreliable as ID_MODE_DIS is left enabled. To avoid this,
explicitly clear that bit whenever we don't need it.

Fixes: da38802211 ("net: bcmgenet: Add RGMII_RXID support")
Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 17:12:30 -08:00
Jiri Pirko 1521a67e60 sched: act: count in the size of action flags bitfield
The put of the flags was added by the commit referenced in fixes tag,
however the size of the message was not extended accordingly.

Fix this by adding size of the flags bitfield to the message size.

Fixes: e382267860 ("net: sched: update action implementations to support flags")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 17:10:44 -08:00
Masahiro Yamada eabc8bcb29 kbuild: get rid of trailing slash from subdir- example
obj-* needs a trailing slash for a directory, but subdir-* does not.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-02-27 10:03:27 +09:00
Madhuparna Bhowmik 2eb51c75dc net: core: devlink.c: Use built-in RCU list checking
list_for_each_entry_rcu() has built-in RCU and lock checking.

Pass cond argument to list_for_each_entry_rcu() to silence
false lockdep warning when CONFIG_PROVE_RCU_LIST is enabled.

The devlink->lock is held when devlink_dpipe_table_find()
is called in non RCU read side section. Therefore, pass struct devlink
to devlink_dpipe_table_find() for lockdep checking.

Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:59:18 -08:00
Gustavo A. R. Silva d1c73cbdf9 net: cisco: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

Lastly, fix the following checkpatch warning:
CHECK: Prefer kernel type 'u32' over 'u_int32_t'
#61: FILE: drivers/net/ethernet/cisco/enic/vnic_devcmd.h:653:
+	u_int32_t val[];

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva 274ac2831a net: marvell: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva c5d6cf903f net: hns: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva 62f1914251 sfc: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Acked-by: Martin Habets <mhabets@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:43:09 -08:00
Gustavo A. R. Silva 4a34d825b8 qlogic: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

This issue was detected with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 7649773293 ("cxgb3/l2t: Fix undefined behaviour")

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:43:09 -08:00
Florian Fainelli 98c5f7d44f net: dsa: bcm_sf2: Forcibly configure IMP port for 1Gb/sec
We are still experiencing some packet loss with the existing advanced
congestion buffering (ACB) settings with the IMP port configured for
2Gb/sec, so revert to conservative link speeds that do not produce
packet loss until this is resolved.

Fixes: 8f1880cbe8 ("net: dsa: bcm_sf2: Configure IMP port for 2Gb/sec")
Fixes: de34d7084e ("net: dsa: bcm_sf2: Only 7278 supports 2Gb/sec IMP port")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:38:23 -08:00
Florian Fainelli 3f02735e5d Revert "net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278"
This reverts commit 7458bd540f ("net: dsa:
bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278") as it causes
advanced congestion buffering issues with 7278 switch devices when using
their internal Giabit PHY. While this is being debugged, continue with
conservative defaults that work and do not cause packet loss.

Fixes: 7458bd540f ("net: dsa: bcm_sf2: Also configure Port 5 for 2Gb/sec on 7278")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:33:35 -08:00
David S. Miller 574b238f64 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes:

1) Perform garbage collection from workqueue to fix rcu detected
   stall in ipset hash set types, from Jozsef Kadlecsik.

2) Fix the forceadd evaluation path, also from Jozsef.

3) Fix nft_set_pipapo selftest, from Stefano Brivio.

4) Crash when add-flush-add element in pipapo set, also from Stefano.
   Add test to cover this crash.

5) Remove sysctl entry under mutex in hashlimit, from Cong Wang.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 16:30:17 -08:00
Linus Torvalds bfdc6d91a2 platform/chrome fixes for v5.6-rc4
Includes this commit:
 platform/chrome: wilco_ec: Include asm/unaligned instead of linux/ path
 
 Fixes a compilation warning.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQCtZK6p/AktxXfkOlzbaomhzOwwgUCXlblyAAKCRBzbaomhzOw
 wt4ZAP9/aN/gan3EUCrJvFW0z2uZbHrWfblFSUNKH2//4KPjjAEAmp6HKG4b/D73
 7eeJCq9cVzCJoYL2MVhwpIwB2oxoDg4=
 =f5X+
 -----END PGP SIGNATURE-----

Merge tag 'tag-chrome-platform-fixes-for-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

Pull chrome platform fix from Benson Leung:
 "Fix a build warning"

* tag 'tag-chrome-platform-fixes-for-v5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
  platform/chrome: wilco_ec: Include asm/unaligned instead of linux/ path
2020-02-26 15:54:52 -08:00
Jonathan Lemon 9a005c3898 bnxt_en: add newline to netdev_*() format strings
Add missing newlines to netdev_* format strings so the lines
aren't buffered by the printk subsystem.

Nitpicked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Acked-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 15:52:33 -08:00
Cong Wang 99b79c3900 netfilter: xt_hashlimit: unregister proc file before releasing mutex
Before releasing the global mutex, we only unlink the hashtable
from the hash list, its proc file is still not unregistered at
this point. So syzbot could trigger a race condition where a
parallel htable_create() could register the same file immediately
after the mutex is released.

Move htable_remove_proc_entry() back to mutex protection to
fix this. And, fold htable_destroy() into htable_put() to make
the code slightly easier to understand.

Reported-and-tested-by: syzbot+d195fd3b9a364ddd6731@syzkaller.appspotmail.com
Fixes: c4a3922d2d ("netfilter: xt_hashlimit: reduce hashlimit_mutex scope for htable_put()")
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26 23:25:07 +01:00
Michal Kubecek e34f1753ee ethtool: limit bitset size
Syzbot reported that ethnl_compact_sanity_checks() can be tricked into
reading past the end of ETHTOOL_A_BITSET_VALUE and ETHTOOL_A_BITSET_MASK
attributes and even the message by passing a value between (u32)(-31)
and (u32)(-1) as ETHTOOL_A_BITSET_SIZE.

The problem is that DIV_ROUND_UP(attr_nbits, 32) is 0 for such values so
that zero length ETHTOOL_A_BITSET_VALUE will pass the length check but
ethnl_bitmap32_not_zero() check would try to access up to 512 MB of
attribute "payload".

Prevent this overflow byt limiting the bitset size. Technically, compact
bitset format would allow bitset sizes up to almost 2^18 (so that the
nest size does not exceed U16_MAX) but bitsets used by ethtool are much
shorter. S16_MAX, the largest value which can be directly used as an
upper limit in policy, should be a reasonable compromise.

Fixes: 10b518d4e6 ("ethtool: netlink bitset handling")
Reported-by: syzbot+7fd4ed5b4234ab1fdccd@syzkaller.appspotmail.com
Reported-by: syzbot+709b7a64d57978247e44@syzkaller.appspotmail.com
Reported-by: syzbot+983cb8fb2d17a7af549d@syzkaller.appspotmail.com
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 11:27:31 -08:00
Amritha Nambiar 6e11d1578f net: Fix Tx hash bound checking
Fixes the lower and upper bounds when there are multiple TCs and
traffic is on the the same TC on the same device.

The lower bound is represented by 'qoffset' and the upper limit for
hash value is 'qcount + qoffset'. This gives a clean Rx to Tx queue
mapping when there are multiple TCs, as the queue indices for upper TCs
will be offset by 'qoffset'.

v2: Fixed commit description based on comments.

Fixes: 1b837d489e ("net: Revoke export for __skb_tx_hash, update it to just be static skb_tx_hash")
Fixes: eadec877ce ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 11:14:10 -08:00
Linus Torvalds 91ad64a84e Tracing updates:
Change in API of bootconfig (before it comes live in a release)
   - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists
   - Set CONFIG_BOOT_CONFIG to 'n' by default
   - Show error if "bootconfig" on cmdline but not compiled in
   - Prevent redefining the same value
   - Have a way to append values
   - Added a SELECT BLK_DEV_INITRD to fix a build failure
 
  Synthetic event fixes:
   - Switch to raw_smp_processor_id() for recording CPU value in preempt
     section. (No care for what the value actually is)
   - Fix samples always recording u64 values
   - Fix endianess
   - Check number of values matches number of fields
   - Fix a printing bug
 
  Fix of trace_printk() breaking postponed start up tests
 
  Make a function static that is only used in a single file.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXlW4vxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtioAP0WLEm3dWO0z3321h/a0DSshC+Bslu3
 HDPTsGVGrXmvggEA/lr1ikRHd8PsO7zW8BfaZMxoXaTqXiuSrzEWxnMlFw0=
 =O8PM
 -----END PGP SIGNATURE-----

Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing and bootconfig updates:
 "Fixes and changes to bootconfig before it goes live in a release.

  Change in API of bootconfig (before it comes live in a release):
  - Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
    exists
  - Set CONFIG_BOOT_CONFIG to 'n' by default
  - Show error if "bootconfig" on cmdline but not compiled in
  - Prevent redefining the same value
  - Have a way to append values
  - Added a SELECT BLK_DEV_INITRD to fix a build failure

  Synthetic event fixes:
  - Switch to raw_smp_processor_id() for recording CPU value in preempt
    section. (No care for what the value actually is)
  - Fix samples always recording u64 values
  - Fix endianess
  - Check number of values matches number of fields
  - Fix a printing bug

  Fix of trace_printk() breaking postponed start up tests

  Make a function static that is only used in a single file"

* tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
  bootconfig: Add append value operator support
  bootconfig: Prohibit re-defining value on same key
  bootconfig: Print array as multiple commands for legacy command line
  bootconfig: Reject subkey and value on same parent key
  tools/bootconfig: Remove unneeded error message silencer
  bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
  bootconfig: Set CONFIG_BOOT_CONFIG=n by default
  tracing: Clear trace_state when starting trace
  bootconfig: Mark boot_config_checksum() static
  tracing: Disable trace_printk() on post poned tests
  tracing: Have synthetic event test use raw_smp_processor_id()
  tracing: Fix number printing bug in print_synth_event()
  tracing: Check that number of vals matches number of synth event fields
  tracing: Make synth_event trace functions endian-correct
  tracing: Make sure synth_event_trace() example always uses u64
2020-02-26 10:34:42 -08:00
Linus Torvalds b98cce1ef5 linux-kselftest-kunit-5.6-rc4
This Kselftest kunit update consists of fixes to documentation and
 run-time tool from Brendan Higgins and Heidi Fahim.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl5VuwUACgkQCwJExA0N
 QxwJiw/+OgVUhIVw4GNvuyDfRruZBR77h41brG3yIlkiJeswxrJBvv6mgQWP69nu
 3V2MO7DrJ/Y4LINZ4ElGyiSMpoY+Tpex7GBX0WZy31FVrmOAd4AhZ/fHZar1k4ye
 7rnts9Py6PwIYVxO3hcuDAfpIhEa98qKTKhVrLfHxR2CxbcvKDXIWfvz1gcp5M3y
 n4D3KVXwmb6yy7q85l8VjwxXevdaFp/bGmRW5HwzpMPJkrtBJWQrFJBGxeX1LVTY
 IcNKGu61Efd2KP6K9WF6EyS/seD+GbyuFOMq9xOG3WM6f65EILq6K6A24EGZtUxV
 IpJySFvewf+in8lzQql6F0flCvThYXkf2Dofi3yoQAda0XrwcL+Z/rugeLMQoEHN
 bYgCKzwW/otwLpJHlWJLPxEnWfuY7A1025xG7Ly+k7qBVsKy2aMZk70gP9uPr6hh
 lCp+zRRrnMAwFgKNSD6hVC+yblw0ACXv0UmL+ccUtX5KtSa+yYJ3JFZhOFzhhHug
 vwXCF5eLYdGuBVNWAO39kyLyV02nUwXiNaoVW5NF9fNpq6HdA6XWcofcV70AM6WZ
 l3s2MDBq7hc7edYknnTHCgaFlHqIlWkFAm828HtJXBV3IpHAagPRFWUVWnkfPlU9
 FCQXfnbkteB2ZUlHQwjUGBZzh07ZV0iafzNZcYzgyFCjDlVeHDw=
 =Q7Zl
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-kunit-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kunit fixes from Shuah Khan:
 "This Kselftest kunit update consists of fixes to documentation and
  the run-time tool from Brendan Higgins and Heidi Fahim"

* tag 'linux-kselftest-kunit-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  kunit: run kunit_tool from any directory
  kunit: test: Improve error messages for kunit_tool when kunitconfig is invalid
  Documentation: kunit: fixed sphinx error in code block
2020-02-26 10:28:59 -08:00
Linus Torvalds 2fcc74178f linux-kselftest-5.6-rc4
This Kselftest update for Linux 5.6-rc4 consists of:
 
 -  fixes to TIMEOUT failures and out-of-tree compilation compilation
    errors from Michael Ellerman.
 - Declutter git status fix from Christophe Leroy
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl5VlisACgkQCwJExA0N
 QxzxAhAAol+8YeyQNqkesjUUPZR+hc7fM1G3TfHlwar5ljhlwbIOFCtjp66b9EKA
 4Cxy5s2/Vhkbs6CFJPa78UXRoH1enMejff6Dd5njwwNmS+cE1wAatM8RBSJeB4X3
 hMjfXCwvjJXqNhayD8n+sHmpEVtCL8SmiG5kKfQu6s+qXN/4EEUw1AaUfms4WO9t
 VDDC8Cc8RKhl9ZM1YxZTMoS7xISoWeZM94+aK12kXfL/rlt86k0FcN1FoApf/kIo
 15ILTo4cZvWMCLdDxbpw6RSGSdB9+siNFNnWnVp5ytTaD8nVRjLSf/sHlu5B9dvh
 VHPA56lofJmXjMxz/cNoHP2jgVsu+hNuG8J3h/GYkaCd6mEG8f5k7kAdqJjQ1D1/
 3cA54DtxCxfmDji24bTJaD5+uG60NAAh1EjeNKiWkMK07zsUxzXqDgJLLUM67EFk
 cYYwTcT9Yqc/GKVV7e2BkiwOiIYQih0NTg2ugV2HEdmm/1EqycoS0McwzIAIa5+2
 k6iUQ3nlpjLnP7vz4950aLVD9a5CsrRM9dY+ngYcbaAX00g9s0G0sLVfRXW6Ls2t
 9KMYoio1ERILqwvkHgdDyEXGUW/uMYhVMpbx647ZjtRAVNSVTvxZe4jIewZ3o6lx
 6vJ+sxYrrXoyZPPUrQGq3NiHg3Wh8BDw5EZaXuuo8JHbVCpvrMk=
 =QRUz
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull Kselftest fixes from Shuah Khan:

 - fixes to TIMEOUT failures and out-of-tree compilation compilation
   errors from Michael Ellerman.

 - declutter git status fix from Christophe Leroy

* tag 'linux-kselftest-5.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/rseq: Fix out-of-tree compilation
  selftests: Install settings files to fix TIMEOUT failures
  selftest/lkdtm: Don't pollute 'git status'
2020-02-26 10:06:56 -08:00
Christoph Hellwig cfe2ce49b9 Revert "KVM: x86: enable -Werror"
This reverts commit ead68df94d.

Using the -Werror flag breaks the build for me due to mostly harmless
KASAN or similar warnings:

  arch/x86/kvm/x86.c: In function ‘kvm_timer_init’:
  arch/x86/kvm/x86.c:7209:1: error: the frame size of 1112 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Feel free to add a CONFIG_WERROR if you care strong enough, but don't
break peoples builds for absolutely no good reason.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-26 09:59:58 -08:00
Linus Torvalds fda31c5029 signal: avoid double atomic counter increments for user accounting
When queueing a signal, we increment both the users count of pending
signals (for RLIMIT_SIGPENDING tracking) and we increment the refcount
of the user struct itself (because we keep a reference to the user in
the signal structure in order to correctly account for it when freeing).

That turns out to be fairly expensive, because both of them are atomic
updates, and particularly under extreme signal handling pressure on big
machines, you can get a lot of cache contention on the user struct.
That can then cause horrid cacheline ping-pong when you do these
multiple accesses.

So change the reference counting to only pin the user for the _first_
pending signal, and to unpin it when the last pending signal is
dequeued.  That means that when a user sees a lot of concurrent signal
queuing - which is the only situation when this matters - the only
atomic access needed is generally the 'sigpending' count update.

This was noticed because of a particularly odd timing artifact on a
dual-socket 96C/192T Cascade Lake platform: when you get into bad
contention, on that machine for some reason seems to be much worse when
the contention happens in the upper 32-byte half of the cacheline.

As a result, the kernel test robot will-it-scale 'signal1' benchmark had
an odd performance regression simply due to random alignment of the
'struct user_struct' (and pointed to a completely unrelated and
apparently nonsensical commit for the regression).

Avoiding the double increments (and decrements on the dequeueing side,
of course) makes for much less contention and hugely improved
performance on that will-it-scale microbenchmark.

Quoting Feng Tang:

 "It makes a big difference, that the performance score is tripled! bump
  from original 17000 to 54000. Also the gap between 5.0-rc6 and
  5.0-rc6+Jiri's patch is reduced to around 2%"

[ The "2% gap" is the odd cacheline placement difference on that
  platform: under the extreme contention case, the effect of which half
  of the cacheline was hot was 5%, so with the reduced contention the
  odd timing artifact is reduced too ]

It does help in the non-contended case too, but is not nearly as
noticeable.

Reported-and-tested-by: Feng Tang <feng.tang@intel.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Huang, Ying <ying.huang@intel.com>
Cc: Philip Li <philip.li@intel.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-02-26 09:54:03 -08:00
Daniel Borkmann 4bc988464b Merge branch 'bpf-bpftool-probes'
Michal Rostecki says:

====================
Feature probes in bpftool related to bpf_probe_write_user and
bpf_trace_printk helpers emit dmesg warnings which might be confusing
for people running bpftool on production environments. This patch series
addresses that by filtering them out by default and introducing the new
positional argument "full" which enables all available probes.

The main motivation behind those changes is ability the fact that some
probes (for example those related to "trace" or "write_user" helpers)
emit dmesg messages which might be confusing for people who are running
on production environments. For details see the Cilium issue[0].

v1 -> v2:
- Do not expose regex filters to users, keep filtering logic internal,
expose only the "full" option for including probes which emit dmesg
warnings.

v2 -> v3:
- Do not use regex for filtering out probes, use function IDs directly.
- Fix bash completion - in v2 only "prefix" was proposed after "macros",
  "dev" and "kernel" were not.
- Rephrase the man page paragraph, highlight helper function names.
- Remove tests which parse the plain output of bpftool (except the
  header/macros test), focus on testing JSON output instead.
- Add test which compares the output with and without "full" option.

v3 -> v4:
- Use enum to check for helper functions.
- Make selftests compatible with older versions of Python 3.x than 3.7.

  [0] https://github.com/cilium/cilium/issues/10048
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2020-02-26 18:34:39 +01:00
Michal Rostecki 736332740e selftests/bpf: Add test for "bpftool feature" command
Add Python module with tests for "bpftool feature" command, which mainly
checks whether the "full" option is working properly.

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200226165941.6379-6-mrostecki@opensuse.org
2020-02-26 18:34:34 +01:00
Michal Rostecki ad92b12a6e bpftool: Update bash completion for "bpftool feature" command
Update bash completion for "bpftool feature" command with the new
argument: "full".

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200226165941.6379-5-mrostecki@opensuse.org
2020-02-26 18:34:34 +01:00
Michal Rostecki bcdacab6e7 bpftool: Update documentation of "bpftool feature" command
Update documentation of "bpftool feature" command with information about
new arguments: "full".

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200226165941.6379-4-mrostecki@opensuse.org
2020-02-26 18:34:34 +01:00
Michal Rostecki 368cb0e7cd bpftool: Make probes which emit dmesg warnings optional
Probes related to bpf_probe_write_user and bpf_trace_printk helpers emit
dmesg warnings which might be confusing for people running bpftool on
production environments. This change filters them out by default and
introduces the new positional argument "full" which enables all
available probes.

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200226165941.6379-3-mrostecki@opensuse.org
2020-02-26 18:34:34 +01:00
Michal Rostecki 6b52ca44e8 bpftool: Move out sections to separate functions
Remove all calls of print_end_then_start_section function and for loops
out from the do_probe function. Instead, provide separate functions for
each section (like i.e. section_helpers) which are called in do_probe.
This change is motivated by better readability.

Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20200226165941.6379-2-mrostecki@opensuse.org
2020-02-26 18:34:34 +01:00
Masahiro Yamada c473a8d03e kbuild: add dt_binding_check to PHONY in a correct place
The dt_binding_check is added to PHONY, but it is invisible when
$(dtstree) is empty. So, it is not specified as phony for
ARCH=x86 etc.

Add it to PHONY outside the ifneq ... endif block.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
2020-02-27 02:28:15 +09:00
Masahiro Yamada 964a596db8 kbuild: add dtbs_check to PHONY
The dtbs_check should be a phony target, but currently it is not
specified so.

'make dtbs_check' works even if a file named 'dtbs_check' exists
because it depends on another phony target, scripts_dtc, but we
should not rely on it.

Add dtbs_check to PHONY.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
2020-02-27 02:26:01 +09:00
Masahiro Yamada fd63fab48f kbuild: remove unneeded semicolon at the end of cmd_dtb_check
This trailing semicolon is unneeded.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
2020-02-27 02:24:34 +09:00
Masahiro Yamada 7a04960560 kbuild: fix DT binding schema rule to detect command line changes
This if_change_rule is not working properly; it cannot detect any
command line change.

The reason is because cmd-check in scripts/Kbuild.include compares
$(cmd_$@) and $(cmd_$1), but cmd_dtc_dt_yaml does not exist here.

For if_change_rule to work properly, the stem part of cmd_* and rule_*
must match. Because this cmd_and_fixdep invokes cmd_dtc, this rule must
be named rule_dtc.

Fixes: 4f0e3a57d6 ("kbuild: Add support for DT binding schema checks")
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Rob Herring <robh@kernel.org>
2020-02-27 02:23:12 +09:00
Masahiro Yamada eccbde4f6c kbuild: remove wrong documentation about mandatory-y
This sentence does not make sense in the section about mandatory-y.

This seems to be a copy-paste mistake of commit fcc8487d47 ("uapi:
export all headers under uapi directories").

The correct description would be "The convention is to list one
mandatory-y per line ...".

I just removed it instead of fixing it. If such information is needed,
it could be commented in include/asm-generic/Kbuild and
include/uapi/asm-generic/Kbuild.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-02-27 02:19:30 +09:00
Randy Dunlap 505b12b386 kbuild: add comment for V=2 mode
Complete the comments for valid values of KBUILD_VERBOSE,
specifically for KBUILD_VERBOSE=2.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-02-27 02:19:07 +09:00
Jiri Pirko bb0858d8bc iavf: use tc_cls_can_offload_and_chain0() instead of chain check
Looks like the iavf code actually experienced a race condition, when a
developer took code before the check for chain 0 was put to helper.
So use tc_cls_can_offload_and_chain0() helper instead of direct check and
move the check to _cb() so this is similar to i40e code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-26 09:01:19 -08:00
Andrey Ignatov 3494bec0f6 bpftool: Support struct_ops, tracing, ext prog types
Add support for prog types that were added to kernel but not present in
bpftool yet: struct_ops, tracing, ext prog types and corresponding
section names.

Before:
  # bpftool p l
  ...
  184: type 26  name test_subprog3  tag dda135a7dc0daf54  gpl
          loaded_at 2020-02-25T13:28:33-0800  uid 0
          xlated 112B  jited 103B  memlock 4096B  map_ids 136
          btf_id 85
  185: type 28  name new_get_skb_len  tag d2de5b87d8e5dc49  gpl
          loaded_at 2020-02-25T13:28:33-0800  uid 0
          xlated 72B  jited 69B  memlock 4096B  map_ids 136
          btf_id 85

After:
  # bpftool p l
  ...
  184: tracing  name test_subprog3  tag dda135a7dc0daf54  gpl
          loaded_at 2020-02-25T13:28:33-0800  uid 0
          xlated 112B  jited 103B  memlock 4096B  map_ids 136
          btf_id 85
  185: ext  name new_get_skb_len  tag d2de5b87d8e5dc49  gpl
          loaded_at 2020-02-25T13:28:33-0800  uid 0
          xlated 72B  jited 69B  memlock 4096B  map_ids 136
          btf_id 85

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225223441.689109-1-rdna@fb.com
2020-02-26 16:40:53 +01:00
Scott Branden 3cd046f182 scripts/bpf: Switch to more portable python3 shebang
Change "/usr/bin/python3" to "/usr/bin/env python3" for
more portable solution in bpf_helpers_doc.py.

Signed-off-by: Scott Branden <scott.branden@broadcom.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200225205426.6975-1-scott.branden@broadcom.com
2020-02-26 15:42:21 +01:00
Stefano Brivio 0954df70fb selftests: nft_concat_range: Add test for reported add/flush/add issue
Add a specific test for the crash reported by Phil Sutter and addressed
in the previous patch. The test cases that, in my intention, should
have covered these cases, that is, the ones from the 'concurrency'
section, don't run these sequences tightly enough and spectacularly
failed to catch this.

While at it, define a convenient way to add these kind of tests, by
adding a "reported issues" test section.

It's more convenient, for this particular test, to execute the set
setup in its own function. However, future test cases like this one
might need to call setup functions, and will typically need no tools
other than nft, so allow for this in check_tools().

The original form of the reproducer used here was provided by Phil.

Reported-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26 14:33:09 +01:00
Stefano Brivio 212d58c106 nft_set_pipapo: Actually fetch key data in nft_pipapo_remove()
Phil reports that adding elements, flushing and re-adding them
right away:

  nft add table t '{ set s { type ipv4_addr . inet_service; flags interval; }; }'
  nft add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }'
  nft flush set t s
  nft add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }'

triggers, almost reliably, a crash like this one:

  [   71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI
  [   71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192
  [   71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014
  [   71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
  [   71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
  [   71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
  [   71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
  [   71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
  [   71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
  [   71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
  [   71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
  [   71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
  [   71.334596] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
  [   71.335780] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
  [   71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   71.339718] Call Trace:
  [   71.340093]  nft_pipapo_destroy+0x7a/0x170 [nf_tables_set]
  [   71.340973]  nft_set_destroy+0x20/0x50 [nf_tables]
  [   71.341879]  nf_tables_trans_destroy_work+0x246/0x260 [nf_tables]
  [   71.342916]  process_one_work+0x1d5/0x3c0
  [   71.343601]  worker_thread+0x4a/0x3c0
  [   71.344229]  kthread+0xfb/0x130
  [   71.344780]  ? process_one_work+0x3c0/0x3c0
  [   71.345477]  ? kthread_park+0x90/0x90
  [   71.346129]  ret_from_fork+0x35/0x40
  [   71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink]
  [   71.348153] ---[ end trace 2eaa8149ca759bcc ]---
  [   71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables]
  [   71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b
  [   71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282
  [   71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0
  [   71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a
  [   71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000
  [   71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2
  [   71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0
  [   71.350025] FS:  0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
  [   71.350026] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [   71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0
  [   71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [   71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [   71.350030] Kernel panic - not syncing: Fatal exception
  [   71.350412] Kernel Offset: disabled
  [   71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]---

which is caused by dangling elements that have been deactivated, but
never removed.

On a flush operation, nft_pipapo_walk() walks through all the elements
in the mapping table, which are then deactivated by nft_flush_set(),
one by one, and added to the commit list for removal. Element data is
then freed.

On transaction commit, nft_pipapo_remove() is called, and failed to
remove these elements, leading to the stale references in the mapping.
The first symptom of this, revealed by KASan, is a one-byte
use-after-free in subsequent calls to nft_pipapo_walk(), which is
usually not enough to trigger a panic. When stale elements are used
more heavily, though, such as double-free via nft_pipapo_destroy()
as in Phil's case, the problem becomes more noticeable.

The issue comes from that fact that, on a flush operation,
nft_pipapo_remove() won't get the actual key data via elem->key,
elements to be deleted upon commit won't be found by the lookup via
pipapo_get(), and removal will be skipped. Key data should be fetched
via nft_set_ext_key(), instead.

Reported-by: Phil Sutter <phil@nwl.cc>
Fixes: 3c4287f620 ("nf_tables: Add set type for arbitrary concatenation of ranges")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26 14:33:09 +01:00
Pablo Neira Ayuso 9ea4894ba4 Merge branch 'master' of git://blackhole.kfki.hu/nf
Jozsef Kadlecsik says:

====================
ipset patches for nf

The first one is larger than usual, but the issue could not be solved simpler.
Also, it's a resend of the patch I submitted a few days ago, with a one line
fix on top of that: the size of the comment extensions was not taken into
account at reporting the full size of the set.

- Fix "INFO: rcu detected stall in hash_xxx" reports of syzbot
  by introducing region locking and using workqueue instead of timer based
  gc of timed out entries in hash types of sets in ipset.
- Fix the forceadd evaluation path - the bug was also uncovered by the syzbot.
====================

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-02-26 13:55:15 +01:00
Saeed Mahameed 586ee9e8a3 net/mlx5: sparse: warning: Using plain integer as NULL pointer
Return NULL instead of 0.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2020-02-25 17:06:21 -08:00
Saeed Mahameed 5edc4c7275 net/mlx5: sparse: warning: incorrect type in assignment
drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c:191:13:
sparse: warning: incorrect type in assignment (different base types)

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
2020-02-25 17:06:19 -08:00
Nathan Chancellor fa2b491287 net/mlx5: Fix header guard in rsc_dump.h
Clang warns:

 In file included from
 ../drivers/net/ethernet/mellanox/mlx5/core/main.c:73:
 ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:4:9: warning:
 '__MLX5_RSC_DUMP_H' is used as a header guard here, followed by #define
 of a different macro [-Wheader-guard]
 #ifndef __MLX5_RSC_DUMP_H
         ^~~~~~~~~~~~~~~~~
 ../drivers/net/ethernet/mellanox/mlx5/core/diag/rsc_dump.h:5:9: note:
 '__MLX5_RSC_DUMP__H' is defined here; did you mean '__MLX5_RSC_DUMP_H'?
 #define __MLX5_RSC_DUMP__H
         ^~~~~~~~~~~~~~~~~~
         __MLX5_RSC_DUMP_H
 1 warning generated.

Make them match to get the intended behavior and remove the warning.

Fixes: 12206b1723 ("net/mlx5: Add support for resource dump")
Link: https://github.com/ClangBuiltLinux/linux/issues/897
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-02-25 17:06:16 -08:00