Go to file
Alexei Starovoitov 20a6d91518 Merge branch 'sockmap/sk_skb program memory acct fixes'
John Fastabend says:

====================

Users of sockmap and skmsg trying to build proxys and other tools
have pointed out to me the error handling can be problematic. If
the proxy is under-provisioned and/or the BPF admin does not have
the ability to update/modify memory provisions on the sockets
its possible data may be dropped. For some things we have retries
so everything works out OK, but for most things this is likely
not great. And things go bad.

The original design dropped memory accounting on the receive
socket as early as possible. We did this early in sk_skb
handling and then charged it to the redirect socket immediately
after running the BPF program.

But, this design caused a fundamental problem. Namely, what should we do
if we redirect to a socket that has already reached its socket memory
limits. For proxy use cases the network admin can tune memory limits.
But, in general we punted on this problem and told folks to simply make
your memory limits high enough to handle your workload. This is not a
really good answer. When deploying into environments where we expect this
to be transparent its no longer the case because we need to tune params.
In fact its really only viable in cases where we have fine grained
control over the application. For example a proxy redirecting from an
ingress socket to an egress socket. The result is I get bug
reports because its surprising for one, but more importantly also breaks
some use cases. So lets fix it.

This series cleans up the different cases so that in many common
modes, such as passing packet up to receive socket, we can simply
use the underlying assumption that the TCP stack already has done
memory accounting.

Next instead of trying to do memory accounting against the socket
we plan to redirect into we keep memory accounting on the receive
socket until the skb can be put on the redirect socket. This means
if we do an egress redirect to a socket and sock_writable() returns
EAGAIN we can requeue the skb on the workqueue and try again. The
same scenario plays out for ingress. If the skb can not be put on
the receive queue of the redirect socket than we simply requeue and
retry. In both cases memory is still accounted for against the
receiving socket.

This also handles head of line blocking. With the above scheme the
skb is on a queue associated with the socket it will be sent/recv'd
on, but the memory accounting is against the received socket. This
means the receive socket can advance to the next skb and avoid head
of line blocking. At least until its receive memory on the socket
runs out. This will put some maximum size on the amount of data any
socket can enqueue giving us bounds on the skb lists so they can't grow
indefinitely.

Overall I think this is a win. Tested with test_sockmap.

These are fixes, but I tagged it for bpf-next considering we are
at -rc8.

v1->v2: Fix uninitialized/unused variables (kernel test robot)
v2->v3: fix typo in patch2 err=0 needs to be <0 so use err=-EIO
---
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-10-11 18:00:57 -07:00
Documentation bpf: Migrate from patchwork.ozlabs.org to patchwork.kernel.org. 2020-10-11 22:05:47 +02:00
LICENSES LICENSES: Rename other to deprecated 2019-05-03 06:34:32 -06:00
arch Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-10-01 14:29:01 -07:00
block - Fix a regression in bdev partition locking (Christoph) 2020-09-11 11:55:28 -07:00
certs .gitignore: add SPDX License Identifier 2020-03-25 11:50:48 +01:00
crypto Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2020-08-30 15:53:44 -07:00
drivers bpf: Add redirect_peer helper 2020-10-11 10:21:04 -07:00
fs Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
include bpf: Allow for map-in-map with dynamic inner array map entries 2020-10-11 10:21:04 -07:00
init Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
ipc ipc: adjust proc_ipc_sem_dointvec definition to match prototype 2020-09-05 12:14:29 -07:00
kernel bpf: Allow for map-in-map with dynamic inner array map entries 2020-10-11 10:21:04 -07:00
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
mm Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
net bpf, sockmap: Add memory accounting so skbs on ingress lists are visible 2020-10-11 18:00:57 -07:00
samples samples: bpf: Refactor XDP kern program maps with BTF-defined map 2020-10-11 12:14:36 -07:00
scripts bpf: Add bpf_snprintf_btf helper 2020-09-28 18:26:58 -07:00
security Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
sound sound fixes for 5.9-rc6 2020-09-18 11:38:08 -07:00
tools bpf, selftests: Add redirect_peer selftest 2020-10-11 10:21:04 -07:00
usr Merge branch 'work.fdpic' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2020-08-07 13:29:39 -07:00
virt KVM: fix memory leak in kvm_io_bus_unregister_dev() 2020-09-11 13:15:11 -04:00
.clang-format clang-format: Update with the latest for_each macro list 2020-09-01 12:53:42 +02:00
.cocciconfig scripts: add Linux .cocciconfig for coccinelle 2016-07-22 12:13:39 +02:00
.get_maintainer.ignore Opt out of scripts/get_maintainer.pl 2019-05-16 10:53:40 -07:00
.gitattributes .gitattributes: use 'dts' diff driver for dts files 2019-12-04 19:44:11 -08:00
.gitignore .gitignore: Add ZSTD-compressed files 2020-07-31 11:50:49 +02:00
.mailmap mailmap: add older email addresses for Kees Cook 2020-09-19 13:13:38 -07:00
COPYING COPYING: state that all contributions really are covered by this file 2020-02-10 13:32:20 -08:00
CREDITS MAINTAINERS: Add Krzysztof Kozlowski to Samsung S3FWRN5 and remove Robert 2020-09-10 15:22:16 -07:00
Kbuild kbuild: rename hostprogs-y/always to hostprogs/always-y 2020-02-04 01:53:07 +09:00
Kconfig kbuild: ensure full rebuild when the compiler is updated 2020-05-12 13:28:33 +09:00
MAINTAINERS bpf, doc: Update Andrii's email in MAINTAINERS 2020-10-06 00:43:11 +02:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next 2020-09-23 13:11:11 -07:00
README Drop all 00-INDEX files from Documentation/ 2018-09-09 15:08:58 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.