Magnus Karlsson says:
====================
This patch set cleans up the ring access functions of AF_XDP in the hope
that it will now be easier to understand and maintain. I used to get a
headache every time I looked at this code in order to really understand it,
but now I do think it is a lot less painful.
The code has been simplified a lot and as a bonus we get better
performance in nearly all cases. On my new 2.1 GHz Cascade Lake
machine with a standard default config plus AF_XDP support and
CONFIG_PREEMPT on, I get the following percentage performance
increases with this patch set compared to without it:
Zero-copy (-N):
          rxdrop  txpush  l2fwd
1 core:     -2%      0%     3%
2 cores:     4%      0%     3%
Zero-copy with poll() (-N -p):
          rxdrop  txpush  l2fwd
1 core:      3%      0%     1%
2 cores:    21%      0%     9%
Skb mode (-S):
Shows a 0% to 5% performance improvement over the same benchmarks as
above.
Here 1 core means that we are running the driver processing and the
application on the same core, while 2 cores means that they execute on
separate cores. The applications are from the xdpsock sample app.
On my older 2.0 GHz Broadwell machine that I used for the v1, I get
the following results:
Zero-copy (-N):
          rxdrop  txpush  l2fwd
1 core:      4%      5%     4%
2 cores:     1%      0%     2%
Zero-copy with poll() (-N -p):
          rxdrop  txpush  l2fwd
1 core:      1%      3%     3%
2 cores:    22%      0%     5%
Skb mode (-S):
Shows a 0% to 1% performance improvement over the same benchmarks as
above.
When a result says 21 or 22% better, as in the case of poll mode with
2 cores and rxdrop, my first reaction is that it must be a
bug. Everything else shows between 0% and 5% performance
improvement. What is giving rise to 22%? A quick bisect indicates that
it is patches 2, 3, 4, 5, and 6 that are giving rise to most of this
improvement. So not one patch in particular, but something around 4%
improvement from each one of them. Note that exactly this benchmark
has previously had an extraordinarily large slowdown compared to running
without poll syscalls. For all the other poll tests above, the
slowdown has always been around 4% for using poll syscalls. But with
the badly performing test in question, it was above 25%. Interestingly,
after this cleanup, the slowdown is 4%, just like all the other poll
tests. Please take an extra look at this in case I have messed
something up.
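The bisect result is consistent with simple compounding: five patches
that each add roughly 4% multiply out to a little under 22%, close to
the 21% and 22% measured. A minimal sketch of that arithmetic, assuming
the per-patch gains are independent and multiplicative (the helper name
compound_gain_percent is made up for illustration and is not part of
the patch set):

```c
/* Hypothetical helper: total percent gain from applying `patches`
 * changes that each improve throughput by `per_patch_percent`,
 * assuming the gains compound multiplicatively. */
double compound_gain_percent(double per_patch_percent, int patches)
{
	double factor = 1.0;

	for (int i = 0; i < patches; i++)
		factor *= 1.0 + per_patch_percent / 100.0;

	return (factor - 1.0) * 100.0;
}
```

Calling compound_gain_percent(4.0, 5) gives about 21.7, so no single
patch needs to account for the jump under this assumption.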
The 0% for several txpush results is due to the test bottlenecking on
a non-CPU HW resource. If I eliminated that bottleneck on my system, I
would expect to see an increase there too.
Changes v1 -> v2:
* Corrected textual errors in the commit logs (Sergei and Martin)
* Fixed the functions that detect empty and full rings so that they
now operate on the global ring state (Maxim)
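To illustrate the second v2 change, here is a minimal sketch of empty
and full checks that read the global producer and consumer indices
rather than locally cached copies. It is not the kernel code: the
struct layout and all names (struct ring, ring_init, ring_is_empty,
ring_is_full) are hypothetical, and the atomic accesses and memory
barriers a real shared ring needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative single-producer/single-consumer ring, not the kernel's
 * actual structures. All names here are hypothetical. */
struct ring {
	uint32_t producer;    /* global: total entries ever produced */
	uint32_t consumer;    /* global: total entries ever consumed */
	uint32_t cached_prod; /* consumer's local snapshot of producer */
	uint32_t cached_cons; /* producer's local snapshot of consumer */
	uint32_t nentries;    /* ring size, must be a power of two */
};

void ring_init(struct ring *r, uint32_t nentries)
{
	assert(nentries != 0 && (nentries & (nentries - 1)) == 0);
	r->producer = r->consumer = 0;
	r->cached_prod = r->cached_cons = 0;
	r->nentries = nentries;
}

/* Both checks read only the global indices, so they stay correct even
 * when the cached snapshots are stale. Unsigned subtraction handles
 * index wrap-around. */
bool ring_is_empty(const struct ring *r)
{
	return r->producer == r->consumer;
}

bool ring_is_full(const struct ring *r)
{
	return r->producer - r->consumer == r->nentries;
}
```

A check based on cached_prod or cached_cons could report a ring as
empty or full after the other side has already moved on, which is the
kind of discrepancy reading the global state avoids.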
This patch has been applied against commit