linux/net/sched
Daniel Borkmann 7d1d65cb84 net: sched: cls_bpf: add BPF-based classifier
This work contains a lightweight BPF-based traffic classifier that can
serve as a flexible alternative to ematch-based tree classification, i.e.
now that BPF filter engine can also be JITed in the kernel. Naturally, tc
actions and policies are supported as well with cls_bpf. Multiple BPF
programs/filter can be attached for a class, or they can just as well be
written within a single BPF program, that's really up to the user how he
wishes to run/optimize the code, e.g. also for inversion of verdicts etc.
The notion of a BPF program's return/exit codes is being kept as follows:

     0: No match
    -1: Select classid given in "tc filter ..." command
  else: flowid, overwrite the default one

As a minimal usage example with iproute2, we use a 3 band prio root qdisc
on a router with sfq each as leave, and assign ssh and icmp bpf-based
filters to band 1, http traffic to band 2 and the rest to band 3. For the
first two bands we load the bytecode from a file, in the 2nd we load it
inline as an example:

echo 1 > /proc/sys/net/core/bpf_jit_enable

tc qdisc del dev em1 root
tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

tc qdisc add dev em1 parent 1:1 sfq perturb 16
tc qdisc add dev em1 parent 1:2 sfq perturb 16
tc qdisc add dev em1 parent 1:3 sfq perturb 16

tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3

BPF programs can be easily created and passed to tc, either as inline
'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
compile opcodes, for example:

1) People familiar with tcpdump-like filters:

   tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf

2) People that want to low-level program their filters or use BPF
   extensions that lack support by libpcap's compiler:

   bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf

   ssh.ops example code:
   ldh [12]
   jne #0x800, drop
   ldb [23]
   jneq #6, drop
   ldh [20]
   jset #0x1fff, drop
   ldxb 4 * ([14] & 0xf)
   ldh [%x + 14]
   jeq #0x16, pass
   ldh [%x + 16]
   jne #0x16, drop
   pass: ret #-1
   drop: ret #0

It was chosen to load bytecode into tc, since the reverse operation,
tc filter list dev em1, is then able to show the exact commands again.
Possible follow-up work could also include a small expression compiler
for iproute2. Tested with the help of bmon. This idea came up during
the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
Eric Dumazet!

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-10-29 17:33:17 -04:00
..
Kconfig net: sched: cls_bpf: add BPF-based classifier 2013-10-29 17:33:17 -04:00
Makefile net: sched: cls_bpf: add BPF-based classifier 2013-10-29 17:33:17 -04:00
act_api.c rtnetlink: Remove passing of attributes into rtnl_doit functions 2013-03-22 10:31:16 -04:00
act_csum.c act_csum: fix possible use after free 2013-04-12 15:25:41 -04:00
act_gact.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
act_ipt.c net_sched: act_ipt forward compat with xtables 2013-05-01 13:19:19 -04:00
act_mirred.c net: pass info struct via netdevice notifier 2013-05-28 13:11:01 -07:00
act_nat.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
act_pedit.c net: Add skb_unclone() helper function. 2013-02-15 15:10:37 -05:00
act_police.c net_sched: add u64 rate to psched_ratecfg_precompute() 2013-09-20 14:41:02 -04:00
act_simple.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
act_skbedit.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
cls_api.c net_cls: remove duplicated include from cls_api.c 2013-04-07 17:12:01 -04:00
cls_basic.c qdisc: basic classifier - remove unnecessary initialization 2013-09-30 15:47:43 -04:00
cls_bpf.c net: sched: cls_bpf: add BPF-based classifier 2013-10-29 17:33:17 -04:00
cls_cgroup.c cgroup: cls: remove unnecessary task_cls_classid 2013-10-08 16:27:34 -04:00
cls_flow.c netlink: rename ssk to sk in struct netlink_skb_params 2013-04-19 14:57:56 -04:00
cls_fw.c pkt_sched: fix error return code in fw_change_attrs() 2013-04-19 17:34:53 -04:00
cls_route.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
cls_rsvp.c
cls_rsvp.h pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
cls_rsvp6.c
cls_tcindex.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
cls_u32.c pkt_sched: namespace aware act_mirred 2013-01-14 15:09:36 -05:00
em_canid.c net: em_canid: Ematch rule to match CAN frames according to their identifiers 2012-07-04 13:07:05 +02:00
em_cmp.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_ipset.c em_ipset: use dev_net() accessor 2013-10-18 16:23:06 -04:00
em_meta.c qdisc: meta return ENOMEM on alloc failure 2013-09-30 15:47:43 -04:00
em_nbyte.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_text.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
em_u32.c net_sched: cleanups 2011-01-19 23:31:12 -08:00
ematch.c net: Convert net_ratelimit uses to net_<level>_ratelimited 2012-05-15 13:45:03 -04:00
sch_api.c net_sched: increment drop counters in qdisc_tree_decrease_qlen() 2013-10-08 16:27:33 -04:00
sch_atm.c net_sched: info leak in atm_tc_dump_class() 2013-07-31 15:04:19 -07:00
sch_blackhole.c
sch_cbq.c net_sched: Fix stack info leak in cbq_dump_wrr(). 2013-07-30 00:16:21 -07:00
sch_choke.c treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks 2013-08-20 13:06:40 +02:00
sch_codel.c fq_codel: should use qdisc backlog as threshold 2012-05-16 15:30:26 -04:00
sch_drr.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
sch_dsmark.c net: sched: factorize code (qdisc_drop()) 2012-05-04 11:50:05 -04:00
sch_fifo.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_fq.c pkt_sched: fq: fix non TCP flows pacing 2013-10-08 21:54:01 -04:00
sch_fq_codel.c net: fq_codel: Fix off-by-one error 2013-03-29 15:32:23 -04:00
sch_generic.c net: Separate the close_list and the unreg_list v2 2013-10-07 15:23:14 -04:00
sch_gred.c net_sched: gred: actually perform idling in WRED mode 2012-09-13 16:10:13 -04:00
sch_hfsc.c net_sched: add 64bit rate estimators 2013-06-11 02:51:03 -07:00
sch_htb.c net_sched: htb: support of 64bit rates 2013-09-20 14:41:03 -04:00
sch_ingress.c net_sched: factorize qdisc stats handling 2011-01-10 16:07:54 -08:00
sch_mq.c qdisc: allow setting default queuing discipline 2013-08-31 00:32:32 -04:00
sch_mqprio.c qdisc: allow setting default queuing discipline 2013-08-31 00:32:32 -04:00
sch_multiq.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_netem.c netem: markov loss model transition fix 2013-10-25 19:03:39 -04:00
sch_plug.c net_sched: sch_plug: plug_qdisc_ops is static 2012-02-13 16:04:40 -05:00
sch_prio.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_qfq.c pkt_sched: sch_qfq: remove a source of high packet delay/jitter 2013-07-18 13:02:00 -07:00
sch_red.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_sfb.c sch_sfb: Fix missing NULL check 2012-07-12 08:33:18 -07:00
sch_sfq.c pkt_sched: Stop using NLA_PUT*(). 2012-04-01 18:11:37 -04:00
sch_tbf.c net_sched: add u64 rate to psched_ratecfg_precompute() 2013-09-20 14:41:02 -04:00
sch_teql.c sch_teql: Convert over to dev_neigh_lookup_skb(). 2012-07-05 01:09:06 -07:00