Page MenuHomeFreeBSD

New pfil(9)
ClosedPublic

Authored by glebius on Jan 25 2019, 1:13 AM.
Tags
None
Referenced Files
Unknown Object (File)
Fri, Apr 12, 2:29 PM
Unknown Object (File)
Fri, Apr 12, 11:02 AM
Unknown Object (File)
Fri, Mar 22, 9:39 PM
Unknown Object (File)
Fri, Mar 22, 9:39 PM
Unknown Object (File)
Fri, Mar 22, 9:39 PM
Unknown Object (File)
Fri, Mar 22, 9:39 PM
Unknown Object (File)
Fri, Mar 22, 9:39 PM
Unknown Object (File)
Fri, Mar 22, 9:39 PM

Details

Summary

o Make it possible to reconfigure pfil(9) configuration with a tool.

This allows to change order of hooks, rehook filter from one filtering
point to a different one. Disconnect/connect hook on input/output only.
Prepend/append a hook. Whatever you imagine.

o Make it possible for a single packet filter to provide multiple rulesets

that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of existing packet filters yet support that,
however limited usage is already possible, e.g. default ruleset can
be moved to single interface.

o Make it possible to create pfil heads, that provide not an mbuf pointer

but memory pointer with length. As example create such in Mellanox
driver. Note: Mellanox driver isn't going to be touched on first commit.
Code provided in review as example only! (for now)

o Sync pfil hooks epoch(9)

All together this allows for lots of possibilities assuming packet filters
and drivers are also improved.

For now, this patch allows to do the following:

sysctl net.link.ether.ipfw=1
./pfilctl unlink -io ipfw:default inet
./pfilctl unlink -io ipfw:default6 inet6
./pfilctl unlink -io ipfw:default-link ethernet
./pfilctl link -i ipfw:default-link mce0

This effectively moves default ipfw ruleset to input of mce0. Packets
to be filtered at the most early point. Without modifications to ipfw
this change allows to improve packet drop rate nearly twice. And even
more with a small patch that learns ipfw on how to process void * pointer
instead of mbuf.

Branch with history: https://github.com/glebius/FreeBSD/tree/pfil

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
Lint Not Applicable
Unit
Tests Not Applicable

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

This panics during the netipsec and pf regression tests:

panic: vnet_pfil_init: failed to create dev: 17
cpuid = 5
time = 1548381169
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00a5a99480
vpanic() at vpanic+0x1b4/frame 0xfffffe00a5a994e0
panic() at panic+0x43/frame 0xfffffe00a5a99540
vnet_pfil_init() at vnet_pfil_init+0xa9/frame 0xfffffe00a5a995a0
vnet_alloc() at vnet_alloc+0x144/frame 0xfffffe00a5a995d0
kern_jail_set() at kern_jail_set+0x1b32/frame 0xfffffe00a5a99860
sys_jail_set() at sys_jail_set+0x40/frame 0xfffffe00a5a99890
amd64_syscall() at amd64_syscall+0x276/frame 0xfffffe00a5a999b0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe00a5a999b0

  • syscall (507, FreeBSD ELF64, sys_jail_set), rip = 0x80031dbca, rsp = 0x7fffffffe4b8, rbp = 0x7fffffffe5a0 ---
bz requested changes to this revision.Jan 25 2019, 7:16 AM
bz added a subscriber: bz.

Can we also get rid of the special case pfil_ipfw with this (before/after this change)?
Also, while I like it can we break out certain changes which do not have to be part of the initial framework change, e.g., the Mellanox driver change seems to be self-contained (as others possibly are). Helps to review and understand individual parts.

sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
153 ↗(On Diff #53181)

struct ip? Should this be struct ip6_hdr as well?

1331 ↗(On Diff #53181)

Can we please hide all legacy IP stuff under some equivalent INET ifdef? (here and everywhere else in the diff)

sys/net/pfil.c
165 ↗(On Diff #53181)

Can't this be called pfil_memptr() rather than "fake_mbuf"?

This revision now requires changes to proceed.Jan 25 2019, 7:16 AM
0mp requested changes to this revision.Jan 25 2019, 11:54 AM
0mp added a subscriber: 0mp.

You may also consider running igor and mandoc -Tlint to lint the manual page.

sbin/pfilctl/pfilctl.8
37 ↗(On Diff #53181)

I believe it would be better to actually include the full synopsis here instead just .Ar command.

45 ↗(On Diff #53181)

Macros do not expand in -width "". It's better to just use -width "unlink".

46 ↗(On Diff #53181)

Nm is not required here.

48 ↗(On Diff #53181)

Nm is not required here.

50 ↗(On Diff #53181)

How about:

pfilctl unlink -i  hook head
pfilctl unlink -o  hook head
                 Link hook hook to head.  With -i flag the hook will be
                 connected as input and with -o as output hook.  At least one
...
51 ↗(On Diff #53181)

Nm is not required here.

58 ↗(On Diff #53181)

Link hook hook? Would it be better to write it like this instead:

Link
.Ar hook
to
61 ↗(On Diff #53181)

With the

70 ↗(On Diff #53181)

How about

flags must be specified?

I am not a native speaker myself, so those are just suggestions :)

76 ↗(On Diff #53181)

Adding the

79 ↗(On Diff #53181)

See the comment about pfilctl unlink -i hook head.

96 ↗(On Diff #53181)

This list should be sorted by sections and then alphabetically.

sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
153 ↗(On Diff #53181)

Well, this is how it was in ipfilter before. Of course this looks very much like a bug, but fixing ipfilter out of scope of this commit.

1331 ↗(On Diff #53181)

Again, this is how ipfilter does now. I'm all for improving it, but out of scope of this work.

sys/net/pfil.c
165 ↗(On Diff #53181)

The function converts from a memptr to mbuf, so it creates a fake a mbuf, so that's why the name. Naming it pfil_memptr, IMHO, will be misleading.

glebius added inline comments.
sbin/pfilctl/pfilctl.8
50 ↗(On Diff #53181)

I'd prefer to keep single synopsis line if it is possible. Is my syntax legit?

70 ↗(On Diff #53181)

No idea which is better :)

This panics during the netipsec and pf regression tests:

Fixed, thanks!

  • Create /dev/pfil only for the default VNET.
  • Corrections from Mateusz.
  • mandoc -T lint
  • igor
rgrimes added inline comments.
sbin/pfilctl/pfilctl.8
70 ↗(On Diff #53181)

The proper form would be:
At least one of
.Fl i
or
.Fl o
is required.

The and is the wrong joiner, and "flags" is not needed. Could also be written as
Either
.Fl i
or
.Fl o
must be present.

  • Improve wording. [1]
  • Fix two .Nd present. Since mdoc(7) says that .Nd doesn't accept

Did you happen to do any benchmarking on this? I'd have expected "Sync pfil hooks epoch(9)" to give us a (small) performance improvement, but my initial test shows a small reduction in forwarding performance (with pf loaded).

In D18951#406615, @kristof wrote:

Did you happen to do any benchmarking on this? I'd have expected "Sync pfil hooks epoch(9)" to give us a (small) performance improvement, but my initial test shows a small reduction in forwarding performance (with pf loaded).

Of course we did :) Performance is what motivated the changes. However, this isn't yet final state of things.

Entering/exiting epoch for every packet in pfil indeed is not faster than an rmlock, that was there before. However, the future plan is that epoch will be entered earlier in packet processing path, so pfil would be guaranteed to be in network epoch. In this case epoch_enter/epoch_exit is removed from pfil_run_hooks.

In our internal branch we enter epoch once per RX interrupt, process a batch of packets and exit. This also is planned to be upstreamed to FreeBSD, but later.

sys/net/pfil.c
71 ↗(On Diff #53388)

I think, you planned to use PFIL_EPOCH macro here in epoch_enter_preempt() and epoch_exit_preempt() :)

sys/net/pfil.h
53 ↗(On Diff #53388)

maybe it would be better to use MAXMODNAME from <sys/module.h>?

54 ↗(On Diff #53388)

Maybe we can reserve a bit more bytes for this name to be able store some meaning descriptions here?
Later it will be not so easy to extend the size.

81 ↗(On Diff #53388)

s/ /\t/

103 ↗(On Diff #53388)

It seems there is not any difference between DROPPED and CONSUMED. Also, note that the code that calls pfil_run_hooks() expects that the return value is an error from errno(2), where 1 corresponds to EPERM, but another values can be passed to userlevel and ENOENT=2, ESRCH=3 can confuse. If you want to change this protocol, you need to inspect the code around the pfil_run_hook().

sys/net/pfil.c
231 ↗(On Diff #53388)

I could be wrong, but it seems it is unsafe to just free() memory in pfil_head_unregister(). What if in the same time some packet will be handled by pfil_run_hooks()?

I think each interface can register own pfil head and on detach/destroy interface should unregister it (it looks like you missed this part in mlx5en). It looks quite possible, that on high pps trough interface pfil_head_unregister() can be invoked in the same time that pfil_run_hooks(). And pfil_run_hooks() can already get link pointer, so when you do free() to this pointer, dereferencing link in the pfil_run_hooks() will lead to page fault.

glebius added inline comments.
sys/net/pfil.c
71 ↗(On Diff #53388)

Doesn't really matter, since future plan is to assert net_epoch in pfil, not enter it.

231 ↗(On Diff #53388)

No, it is responsibility of a subsystem to guarantee that pfil_head_unregister is called only when packet flow has completely stopped. This was the case before my changes and this remains as is.

P.S. Yes, mlx5en patch lacks unregistering.

sys/net/pfil.h
53 ↗(On Diff #53388)

Would require including sys/module.h into userland utility to manipulate pfil(9), which is ugly, since purpose of sys/module.h is completely different.

I created MAXPFILNAME equal to 64. This is longest name of a pf queue. In ipfw we don't have long string constants yet. 64 should be enough for anyone :)

We can't go too big, e.g. MAXPATHLEN, because pfilioc_link is used internally in the kernel.

103 ↗(On Diff #53388)

Will double check that.

glebius marked 2 inline comments as done.
  • Set max names length to MAXPFILNAME equal to 64.
  • Go through all pfil_run_hooks() and properly compare against PFIL_PASS
sys/contrib/ipfilter/netinet/ip_fil_freebsd.c
153 ↗(On Diff #53181)

We can't change ip to ip6_hdr, at least not yet. ipf_check defines the second argument as ip. Later in ipf_check it casts ip to ip6_t if the version is 6. It's another thing I need to sift through.

sys/netpfil/ipfw/ip_fw_pfil.c
147 ↗(On Diff #53449)

This looks like unintended change. dir variable keeps internal ipfw's value here, and it has different value than PFIL_IN/OUT.

163 ↗(On Diff #53449)

can be removed.

204 ↗(On Diff #53449)

DIR_IN != PFIL_IN

258 ↗(On Diff #53449)

This line is wrongly placed, the code is unreachable.

298 ↗(On Diff #53449)

redundant line.

314 ↗(On Diff #53449)

can be removed.

418 ↗(On Diff #53449)

return (PFILL_CONSUMED);

430 ↗(On Diff #53449)

ret = PFIL_CONSUMED;

436 ↗(On Diff #53449)

ret != PFIL_PASS

glebius added inline comments.
sys/netpfil/ipfw/ip_fw_pfil.c
147 ↗(On Diff #53449)

Thanks a lot! A mismerge, when I rebased over your ipfw improvements.

glebius marked an inline comment as done.
  • Address ae@ review, fix multiple mismerges and bugs in ip_fw_pfil.
  • Obsolete MLINKS.
  • My copyrights.
sbin/pfilctl/pfilctl.c
5 ↗(On Diff #53487)

Small nit, if you can either drop the All rights reserved (no longer needed) or if you want it could you fit it on the line with the copyright so that someone does not insert a copyright between yours and it. Ditto in other places, I just picked one.

sbin/pfilctl/pfilctl.c
5 ↗(On Diff #53487)

Thanks. Should I keep the line where it existed before?

  • Remove "all rights reserved" from new files.
share/man/man9/pfil.9
5 ↗(On Diff #53490)

I think this is the place your talking about keeping it. Yes, you need to keep this as it looks like Matthew R Green put it there, and it would have to be him to remove it. You could fold it up onto the same line as his copyright, making it clear from here forward that it was him that asserted this, but there is not a need to.

By the way, your adding your copyright a good way, above the other so it doesnt seperate his copyright from his all rights reserved, thank you for this detail!

This revision was not accepted when it landed; it landed in state Needs Review.Mar 17 2019, 12:13 PM
This revision was automatically updated to reflect the committed changes.