bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4
After running map_perf_test with a much bigger BPF_F_NO_COMMON_LRU map, the perf report shows a lot of time spent rotating the inactive list (i.e. __bpf_lru_list_rotate_inactive):

> map_perf_test 32 8 10000 1000000 | awk '{sum += $3}END{print sum}'
19644783 (19M/s)

> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
6283930 (6.28M/s)

An element on the inactive list is usually not in cache, so every extra element scanned during a rotation is costly. Hence, there is a need to tune the PERCPU_NR_SCANS value.

This patch picks a better number of elements to scan during each list rotation. PERCPU_NR_SCANS (which is defined the same as PERCPU_FREE_TARGET) decreases from 16 elements to 4 elements. This change only affects the BPF_F_NO_COMMON_LRU map.

test_lru_dist does not show a meaningful difference between 16 and 4. Our production L4 load balancer, which uses the LRU map for conntrack-ing, also shows little change in cache hit rate. Since both benchmark and production data show no cache-hit difference, PERCPU_NR_SCANS is lowered from 16 to 4. We can consider making it configurable, and/or using a different rotation strategy, if a use case later shows that another value works better.

After this change:

> map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
9240324 (9.2M/s)

i.e. 6.28M/s -> 9.2M/s

test_lru_dist has not shown a meaningful difference:

> test_lru_dist zipf.100k.a1_01.out 4000 1:
nr_misses: 31575 (Before) vs 31566 (After)

> test_lru_dist zipf.100k.a0_01.out 40000 1
nr_misses: 67036 (Before) vs 67031 (After)

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
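To make the cost argument concrete, here is a minimal sketch of a bounded inactive-list rotation. It is a simplified model, not the kernel's __bpf_lru_list_rotate_inactive: the struct lru_node type, the rotate_inactive() helper, the array standing in for the inactive list and the resumable cursor are all hypothetical. Each call inspects at most nr_scans inactive elements and promotes referenced ones to the active list, so the work done per rotation (and the number of likely-uncached elements touched) scales with the scan count that this patch lowers from 16 to 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified node: not the kernel's struct bpf_lru_node. */
struct lru_node {
	bool ref;    /* set by a lookup hit, cleared when a rotation promotes the node */
	bool active; /* true once the node sits on the active list */
};

/*
 * Inspect at most nr_scans nodes that are still on the inactive list,
 * resuming at *cursor and wrapping around. A referenced node has its ref
 * bit cleared and is promoted to the active list. Returns the number of
 * inactive nodes inspected, i.e. the work done by this one rotation.
 */
static size_t rotate_inactive(struct lru_node *nodes, size_t n,
			      size_t *cursor, size_t nr_scans)
{
	size_t scanned = 0;
	size_t visited = 0;

	assert(n > 0 && *cursor < n);
	/* visited < n stops the walk after one full pass over the array. */
	while (scanned < nr_scans && visited < n) {
		struct lru_node *node = &nodes[*cursor];

		*cursor = (*cursor + 1) % n;
		visited++;
		if (node->active)
			continue; /* already on the active list, not counted */
		scanned++;
		if (node->ref) {
			node->ref = false;
			node->active = true; /* promote to the active list */
		}
	}
	return scanned;
}
```

For example, with 100 inactive nodes where every even-indexed node is referenced, a call starting at cursor 0 with nr_scans = 4 inspects four nodes and promotes two; the same call with nr_scans = 16 inspects four times as many. Bounding each rotation this way keeps the per-operation cost fixed regardless of map size, which is why the constant matters more as the map grows.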
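The before/after totals in the commit message are easier to compare as relative changes. The pct_change() helper below is hypothetical and only illustrates the arithmetic on the reported numbers; it does not come from the commit or its test programs.

```c
#include <assert.h>

/* Relative change, in percent, going from before to after. */
static double pct_change(double before, double after)
{
	assert(before != 0.0); /* a zero baseline has no relative change */
	return (after - before) / before * 100.0;
}
```

Applied to the reported values, pct_change(6283930, 9240324) is about +47.05% for map_perf_test throughput, while the test_lru_dist miss counts move by about -0.0285% (31575 to 31566) and -0.0075% (67036 to 67031), which is consistent with the claim that the hit rate is essentially unchanged.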
parent 9fd63d05f3
commit 695ba2651a
@@ -13,7 +13,7 @@
 #define LOCAL_FREE_TARGET (128)
 #define LOCAL_NR_SCANS LOCAL_FREE_TARGET
 
-#define PERCPU_FREE_TARGET (16)
+#define PERCPU_FREE_TARGET (4)
 #define PERCPU_NR_SCANS PERCPU_FREE_TARGET
 
 /* Helpers to get the local list index */
@@ -22,7 +22,7 @@
 #include "bpf_util.h"
 
 #define LOCAL_FREE_TARGET (128)
-#define PERCPU_FREE_TARGET (16)
+#define PERCPU_FREE_TARGET (4)
 
 static int nr_cpus;
 
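Both hunks edit only PERCPU_FREE_TARGET. In the first one, PERCPU_NR_SCANS is defined as an alias of PERCPU_FREE_TARGET, so that single-line change lowers the per-CPU scan count as well. The compile-time checks below are an illustration, not part of the commit: they restate the post-change definitions and assert the relationship using C11 static_assert from <assert.h>.

```c
#include <assert.h>

/* Post-change definitions, restated from the diff for illustration. */
#define PERCPU_FREE_TARGET (4)
#define PERCPU_NR_SCANS PERCPU_FREE_TARGET

/* The scan count is an alias, so it always equals the free target. */
static_assert(PERCPU_NR_SCANS == PERCPU_FREE_TARGET,
	      "per-CPU scan count tracks the per-CPU free target");
/* This commit lowers the value from 16 to 4. */
static_assert(PERCPU_NR_SCANS == 4, "per-CPU scan count is 4");
```

Keeping the scan count as an alias means there is a single knob to tune; if a later use case needs the two values to differ, the alias would have to be split into an independent definition.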