linux/kernel/cgroup
Andrey Ignatov 7dd68b3279 bpf: Support replacing cgroup-bpf program in MULTI mode
The common use-case in production is to have multiple cgroup-bpf
programs per attach type that cover multiple use-cases. Such programs
are attached with BPF_F_ALLOW_MULTI and can be maintained by different
people.

Order of programs usually matters, for example imagine two egress
programs: the first one drops packets and the second one counts packets.
If they're swapped the result of counting program will be different.

It brings operational challenges with updating cgroup-bpf program(s)
attached with BPF_F_ALLOW_MULTI since there is no way to replace a
program:

* One way to update is to detach all programs first and then attach the
  new version(s) again in the right order. This introduces an
  interruption in the work a program is doing and may not be acceptable
  (e.g. if it's egress firewall);

* Another way is attach the new version of a program first and only then
  detach the old version. This introduces the time interval when two
  versions of same program are working, what may not be acceptable if a
  program is not idempotent. It also imposes additional burden on
  program developers to make sure that two versions of their program can
  co-exist.

Solve the problem by introducing a "replace" mode in BPF_PROG_ATTACH
command for cgroup-bpf programs being attached with BPF_F_ALLOW_MULTI
flag. This mode is enabled by newly introduced BPF_F_REPLACE attach flag
and bpf_attr.replace_bpf_fd attribute to pass fd of the old program to
replace

That way user can replace any program among those attached with
BPF_F_ALLOW_MULTI flag without the problems described above.

Details of the new API:

* If BPF_F_REPLACE is set but replace_bpf_fd doesn't have valid
  descriptor of BPF program, BPF_PROG_ATTACH will return corresponding
  error (EINVAL or EBADF).

* If replace_bpf_fd has valid descriptor of BPF program but such a
  program is not attached to specified cgroup, BPF_PROG_ATTACH will
  return ENOENT.

BPF_F_REPLACE is introduced to make the user intent clear, since
replace_bpf_fd alone can't be used for this (its default value, 0, is a
valid fd). BPF_F_REPLACE also makes it possible to extend the API in the
future (e.g. add BPF_F_BEFORE and BPF_F_AFTER if needed).

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrii Narkyiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/30cd850044a0057bdfcaaf154b7d2f39850ba813.1576741281.git.rdna@fb.com
2019-12-19 21:22:25 -08:00
..
Makefile cgroup: cgroup v2 freezer 2019-04-19 11:26:48 -07:00
cgroup-internal.h cgroup: Optimize single thread migration 2019-10-07 07:11:53 -07:00
cgroup-v1.c cgroup: Optimize single thread migration 2019-10-07 07:11:53 -07:00
cgroup.c bpf: Support replacing cgroup-bpf program in MULTI mode 2019-12-19 21:22:25 -08:00
cpuset.c Merge branch 'for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2019-11-25 19:23:46 -08:00
debug.c kernel: cgroup: fix misuse of %x 2019-05-06 08:47:48 -07:00
freezer.c cgroup: freezer: don't change task and cgroups status unnecessarily 2019-11-07 07:38:41 -08:00
legacy_freezer.c cgroup: rename freezer.c into legacy_freezer.c 2019-04-19 11:26:48 -07:00
namespace.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
pids.c cgroup: pids: use atomic64_t for pids->limit 2019-10-24 12:07:10 -07:00
rdma.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 451 2019-06-19 17:09:08 +02:00
rstat.c cgroup: use cgroup->last_bstat instead of cgroup->bstat_pending for consistency 2019-11-06 12:50:15 -08:00