.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)

===========================
BPF_PROG_TYPE_CGROUP_SYSCTL
===========================

This document describes the ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type, which
provides a cgroup-bpf hook for sysctl.

The hook has to be attached to a cgroup and is called every time a process
inside that cgroup tries to read from or write to a sysctl knob in proc.

1. Attach type
**************

The ``BPF_CGROUP_SYSCTL`` attach type has to be used to attach a
``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
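
As an illustration, a user space loader could perform the attachment roughly
as follows. This is a minimal sketch that assumes libbpf is available and that
the program has already been loaded; ``attach_sysctl_prog()``, ``prog_fd`` and
``cgroup_path`` are hypothetical names chosen for this example::

    #include <fcntl.h>
    #include <unistd.h>

    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    /*
     * Attach an already loaded BPF_PROG_TYPE_CGROUP_SYSCTL program
     * (prog_fd) to the cgroup directory at cgroup_path.
     * Returns 0 on success, -1 on failure.
     */
    static int attach_sysctl_prog(int prog_fd, const char *cgroup_path)
    {
        int cgroup_fd, err;

        /* A cgroup is referred to by an fd of its directory. */
        cgroup_fd = open(cgroup_path, O_RDONLY | O_DIRECTORY);
        if (cgroup_fd < 0)
            return -1;

        /* flags == 0: no special attach flags are requested here. */
        err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SYSCTL, 0);

        close(cgroup_fd);
        return err ? -1 : 0;
    }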

2. Context
**********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
a BPF program::

    struct bpf_sysctl {
        __u32 write;
        __u32 file_pos;
    };

* ``write`` indicates whether the sysctl value is being read (``0``) or
  written (``1``). This field is read-only.

* ``file_pos`` indicates the file position at which the sysctl is being
  accessed, for either a read or a write. This field is read-write. Writing to
  the field sets the starting position in the sysctl proc file that
  ``read(2)`` will read from or ``write(2)`` will write to. Writing zero to
  the field can be used, e.g., to override the whole sysctl value with
  ``bpf_sysctl_set_new_value()`` on ``write(2)`` even when user space called
  it with ``file_pos > 0``. Writing a non-zero value to the field can be used
  to access part of the sysctl value starting from the specified
  ``file_pos``. Not all sysctls support access with ``file_pos != 0``, e.g.
  writes to numeric sysctl entries must always be at file position ``0``. See
  also the ``kernel.sysctl_writes_strict`` sysctl.

See `linux/bpf.h`_ for more details on how context fields can be accessed.
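
The two context fields could be used as in the following minimal sketch. The
program name and the policy (forcing every write to start at offset zero) are
hypothetical and only serve to show a read of ``write`` and a write to
``file_pos``; the sketch assumes the libbpf ``bpf_helpers.h`` header::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("cgroup/sysctl")
    int sysctl_ctx_sketch(struct bpf_sysctl *ctx)
    {
        /* Reads are passed through untouched. */
        if (!ctx->write)
            return 1;

        /*
         * Hypothetical policy: make the write start at offset 0 even
         * if user space called write(2) at a later file position.
         */
        if (ctx->file_pos != 0)
            ctx->file_pos = 0;

        return 1;
    }

    char _license[] SEC("license") = "GPL";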

3. Return code
**************

A ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
return codes:

* ``0`` means "reject access to sysctl";
* ``1`` means "proceed with access".

If the program returns ``0``, user space will get ``-1`` from ``read(2)`` or
``write(2)`` and ``errno`` will be set to ``EPERM``.
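
From the point of view of the calling process, a rejection could be observed
roughly as in the sketch below. ``try_sysctl_write()`` is a hypothetical
helper written for this example, and ``EPERM`` can also come from other kernel
checks, so it does not prove that a BPF program was involved::

    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /*
     * Try to write value to the sysctl file at path.
     * Returns 0 on success, or the errno of the call that failed.
     */
    static int try_sysctl_write(const char *path, const char *value)
    {
        int fd, err = 0;

        fd = open(path, O_WRONLY);
        if (fd < 0)
            return errno;

        /* The hook runs here, at write(2) time, not at open(2). */
        if (write(fd, value, strlen(value)) < 0)
            err = errno;

        close(fd);
        return err;
    }

A caller could then compare the result against ``EPERM`` to tell a rejected
access apart from, say, a missing file (``ENOENT``).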

4. Helpers
**********

Since a sysctl knob is represented by a name and a value, sysctl-specific BPF
helpers focus on providing access to these properties:

* ``bpf_sysctl_get_name()`` to get the sysctl name, as it is visible in
  ``/proc/sys``, into a buffer provided by the BPF program;

* ``bpf_sysctl_get_current_value()`` to get the string value currently held by
  the sysctl into a buffer provided by the BPF program. This helper is
  available on both ``read(2)`` from and ``write(2)`` to the sysctl;

* ``bpf_sysctl_get_new_value()`` to get the new string value being written to
  the sysctl before the actual write happens. This helper can be used only
  when ``ctx->write == 1``;

* ``bpf_sysctl_set_new_value()`` to override the new string value being
  written to the sysctl before the actual write happens. The sysctl value is
  overridden starting from the current ``ctx->file_pos``. If the whole value
  has to be overridden, the BPF program can set ``file_pos`` to zero before
  calling the helper. This helper can be used only when ``ctx->write == 1``.
  The new string value set by the helper is treated and verified by the kernel
  in the same way as an equivalent string passed by user space.
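
The name and new-value helpers could be combined as in the sketch below. The
policy is hypothetical (any write to ``net/ipv4/ip_forward`` other than ``0``
is replaced with ``0``), as are the function names and the buffer size; the
name matching follows the approach used in `test_sysctl_prog.c`_::

    #include <string.h>

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Return 1 if the accessed sysctl is net/ipv4/ip_forward. */
    static __always_inline int is_ip_forward(struct bpf_sysctl *ctx)
    {
        char expected[] = "net/ipv4/ip_forward";
        char name[sizeof(expected)];
        unsigned char i;
        int ret;

        /* Zero the buffer before handing it to the helper. */
        memset(name, 0, sizeof(name));
        /* flags == 0: full name relative to /proc/sys. */
        ret = bpf_sysctl_get_name(ctx, name, sizeof(name), 0);
        if (ret < 0 || ret != sizeof(expected) - 1)
            return 0;

    #pragma clang loop unroll(full)
        for (i = 0; i < sizeof(expected); ++i)
            if (name[i] != expected[i])
                return 0;

        return 1;
    }

    SEC("cgroup/sysctl")
    int sysctl_pin_ip_forward(struct bpf_sysctl *ctx)
    {
        char pinned[] = "0";
        char new_value[16];
        int ret;

        /* Reads and all other knobs are left alone. */
        if (!ctx->write || !is_ip_forward(ctx))
            return 1;

        /* Fetch the string user space is trying to write. */
        ret = bpf_sysctl_get_new_value(ctx, new_value, sizeof(new_value));

        /* "0" or "0\n" already matches the hypothetical policy. */
        if (new_value[0] == '0' &&
            (ret == 1 || (ret == 2 && new_value[1] == '\n')))
            return 1;

        /* Override the whole value: rewind, then set the new string. */
        ctx->file_pos = 0;
        ret = bpf_sysctl_set_new_value(ctx, pinned, sizeof(pinned) - 1);

        /* Proceed with the overridden value, reject if override failed. */
        return ret == 0;
    }

    char _license[] SEC("license") = "GPL";

The length passed to ``bpf_sysctl_set_new_value()`` excludes the terminating
NUL so that the kernel sees the same bytes user space would have written.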

A BPF program sees the sysctl value the same way user space does in the proc
filesystem, i.e. as a string. Since many sysctl values represent an integer or
a vector of integers, the following helpers can be used to get a numeric value
from the string:

* ``bpf_strtol()`` to convert the initial part of the string to a long integer,
  similar to user space `strtol(3)`_;
* ``bpf_strtoul()`` to convert the initial part of the string to an unsigned
  long integer, similar to user space `strtoul(3)`_.

See `linux/bpf.h`_ for more details on the helpers described here.
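
A numeric check on a value being written could look like the sketch below.
The limit, the buffer size and the program name are hypothetical, and the
program applies to every knob written from the cgroup, so a real program would
normally match the sysctl name first::

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define MAX_VALUE_STR_LEN   32          /* hypothetical buffer size */
    #define HYPOTHETICAL_LIMIT  65536UL     /* hypothetical upper bound */

    SEC("cgroup/sysctl")
    int sysctl_limit_sketch(struct bpf_sysctl *ctx)
    {
        char value[MAX_VALUE_STR_LEN];
        unsigned long num = 0;
        int ret;

        /* Only writes carry a new value to inspect. */
        if (!ctx->write)
            return 1;

        ret = bpf_sysctl_get_new_value(ctx, value, sizeof(value));
        if (ret < 0)
            return 0;   /* e.g. the value does not fit in the buffer */

        /* flags == 0: detect the base automatically, as strtoul(3) does. */
        ret = bpf_strtoul(value, sizeof(value), 0, &num);
        if (ret <= 0)
            return 0;   /* no leading integer could be parsed */

        return num <= HYPOTHETICAL_LIMIT;
    }

    char _license[] SEC("license") = "GPL";

On success ``bpf_strtoul()`` returns the number of characters consumed, which
can be used as an offset to parse the next integer of a vector.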

5. Examples
***********

See `test_sysctl_prog.c`_ for an example of a BPF program in C that accesses
the sysctl name and value, parses the string value to get a vector of integers
and uses the result to decide whether to allow or deny access to the sysctl.

6. Notes
********

``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in a **trusted** root
environment, for example to monitor sysctl usage or catch unreasonable values
that an application, running as root in a separate cgroup, is trying to set.

Since ``task_dfl_cgroup(current)`` is called at ``sys_read`` / ``sys_write``
time, it may return results different from those at ``sys_open`` time, i.e. the
process that opened the sysctl file in the proc filesystem may differ from the
process that is trying to read from / write to it, and two such processes may
run in different cgroups. This means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not
be used as a security mechanism to limit sysctl usage.

As with any cgroup-bpf program, additional care should be taken if an
application running as root in a cgroup should not be allowed to
detach/replace a BPF program attached by the administrator.

.. Links
.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
.. _test_sysctl_prog.c:
   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c

.. Commit message of the change that added this file (2019-04-18):
   bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL

   Add documentation for BPF_PROG_TYPE_CGROUP_SYSCTL, including general info,
   attach type, context, return code, helpers, example and usage
   considerations.

   A separate file prog_cgroup_sysctl.rst is added to Documentation/bpf/. In
   the future more program types can be documented in their own
   prog_<name>.rst files. Another way to place program type specific
   documentation would be to group program types somehow (e.g. cgroup.rst for
   all cgroup-bpf programs), but it may not scale well since some program
   types may belong to different groups, e.g. BPF_PROG_TYPE_CGROUP_SKB can be
   documented together with either cgroup-bpf programs or programs that
   access skb.

   The new file is added to the index and verified by "make htmldocs" /
   sanity-check by lynx.

   Signed-off-by: Andrey Ignatov <rdna@fb.com>
   Acked-by: Yonghong Song <yhs@fb.com>
   Signed-off-by: Alexei Starovoitov <ast@kernel.org>
