
netgraph: Allow larger messages in communication between kernel and user-space
Needs ReviewPublic

Authored by donner on Feb 27 2020, 12:35 PM.
Tags
None
Referenced Files
F103586533: D23850.diff
Tue, Nov 26, 7:31 PM

Details

Reviewers
None
Group Reviewers
manpages
network
Summary

D23840 changed how large message data is handled inside the kernel. Beyond that kernel-internal solution, such amounts of data also need a sufficiently large communication channel from kernel to userspace. That channel is currently limited by ng_socket(4), which is tuned through the sysctls net.graph.maxdgram and net.graph.recvspace.

Apart from the fact that even this tuning has limits which real-world applications easily reach (e.g. msg bridge gettable), the whole tuning process is broken by design.

This patch is a slowly growing work in progress. It is split into smaller, single-purpose diffs, which are easier to review.

First step contains:

  • Rename NG_VERSION to NGM_VERSION in order to be consistent in naming and alert external implementors about the upcoming change.
  • Remove a misleading constant NGF_ORIG, which is abused to reset all flags instead of clearing NGF_RESP.
  • Introduce a new flag NGF_FRAG as well as a new message format ng_mesg2/ng_msghdr2 containing a fragmentation offset field.
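
The NGF_ORIG problem from the second item can be illustrated with a small sketch. NGF_ORIG and NGF_RESP are assumed to carry the values from sys/netgraph/ng_message.h; the NGF_FRAG value and both helper names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to mirror sys/netgraph/ng_message.h. */
#define NGF_ORIG 0x00000000u	/* "original request": not a real bit */
#define NGF_RESP 0x00000001u	/* message is a response */
/* Hypothetical value; the patch may pick a different bit. */
#define NGF_FRAG 0x00000002u	/* more fragments follow */

/* Misleading idiom: assigning NGF_ORIG wipes every flag. */
static uint32_t
mark_request_by_assignment(uint32_t flags)
{
	(void)flags;
	return (NGF_ORIG);
}

/* Intended behaviour: clear only the response bit. */
static uint32_t
mark_request_by_clearing(uint32_t flags)
{
	return (flags & ~NGF_RESP);
}
```

With only NGF_RESP defined both variants give the same result, which is probably why the assignment went unnoticed; as soon as a second flag exists, only the masking variant preserves it.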
Test Plan

Netgraph should work as before after applying each stage of the patch.
Large messages generated by, e.g., a crowded ng_bridge(4) node should be visible from user space.

Diff Detail

Repository
rS FreeBSD src repository - subversion
Lint
No Lint Coverage
Unit
No Test Coverage
Build Status
Buildable 29708
Build 27557: arc lint + arc unit

Event Timeline

lib/libnetgraph/msg.c
77

Do not set any flags. It's already zeroed space.

169

This overwrites other flags.

sys/netgraph/ng_base.c
3252–3256

I wonder if those sysctls are of any use. Better nuke them?

donner marked an inline comment as not done.

The idea is to allow large messages to be split into smaller ones over size-limited links. Inside the kernel this is never necessary.

So the next step is to implement the fragmentation in ng_socket and libnetgraph, both of which should be able to fragment into version 9 and reassemble back to version 8. That way neither kernel modules nor user space programs need to be modified.

Fragmentation uses a flag indicating that the message is not complete yet, and an offset field saying where to attach the next fragment to the data already received. It is assumed that the packets stay in order, so a fragmented message may look like:

version=9 offset=0    flags=NGF_FRAG arglen=1000 data=[1000 bytes]
version=9 offset=1000 flags=NGF_FRAG arglen=1000 data=[1000 bytes]
version=9 offset=2000 flags=0        arglen=100  data=[ 100 bytes]

which is fragmented from and reassembled to

version=8 arglen=2100 data=[2100 bytes]
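
A minimal sketch of this scheme, under stated assumptions: struct frag is a simplified stand-in and not the real ng_msghdr2, the NGF_FRAG value is hypothetical, arglen is taken to be the payload length of the individual fragment, and the function names fragment() and reassemble() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NGF_FRAG 0x00000002u	/* hypothetical value */

/* Simplified stand-in for a version 9 message; not the real layout. */
struct frag {
	uint8_t		version;	/* 9 for fragments */
	uint32_t	offset;		/* position in the full message */
	uint32_t	flags;		/* NGF_FRAG while more follow */
	uint32_t	arglen;		/* payload bytes in this fragment */
	const uint8_t	*data;		/* points into the original message */
};

/*
 * Split msg (len bytes) into fragments of at most max payload bytes.
 * Returns the number of fragments written to out[], or -1 if max is
 * zero or out[] (nout entries) is too small.
 */
static int
fragment(const uint8_t *msg, uint32_t len, uint32_t max,
    struct frag *out, int nout)
{
	uint32_t off = 0;
	int n = 0;

	if (max == 0)
		return (-1);
	do {
		uint32_t chunk = (len - off < max) ? len - off : max;

		if (n == nout)
			return (-1);
		out[n].version = 9;
		out[n].offset = off;
		out[n].arglen = chunk;
		out[n].data = msg + off;
		off += chunk;
		out[n].flags = (off < len) ? NGF_FRAG : 0;
		n++;
	} while (off < len);
	return (n);
}

/*
 * Append one fragment to buf (capacity cap); *have counts the bytes
 * received so far. Returns 1 when the message is complete, 0 when more
 * fragments are expected, -1 on an out-of-order fragment or overflow.
 */
static int
reassemble(uint8_t *buf, uint32_t cap, uint32_t *have, const struct frag *f)
{
	if (f->offset != *have || f->arglen > cap - *have)
		return (-1);
	memcpy(buf + *have, f->data, f->arglen);
	*have += f->arglen;
	return ((f->flags & NGF_FRAG) ? 0 : 1);
}
```

Because the packets are assumed to stay in order, the receiver only has to compare each offset with the number of bytes it already holds; anything else is treated as an error in this sketch.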

Such an ng_socket node will accept both versions, but fragment outgoing messages only if the special flag NGS_FLAG_FRAGMENT was set from user space.

Later steps will move the reassembly to the netgraph core unless the netgraph module sets an "I'll do fragmentation myself" flag in the kernel ABI. Such an approach allows the old message format to be removed completely afterwards.

Added two more occurrences of direct use of NG_VERSION: libexec/pppoed/pppoed.c and usr.sbin/ppp/ether.c

sys/dev/ce/if_ce.c is left unchanged, because the constant is only used for old FreeBSD versions; the currently active code uses libnetgraph only.

donner retitled this revision from netgraph: Allow larger messages in communitcation outside of the kernel to netgraph: Allow larger messages in communication between kernel and user-space. Feb 27 2020, 2:40 PM

Using libnetgraph is sufficient; no explicit test for version numbers is necessary.

Allow the documented version to differ from the real structure. Document only the guaranteed elements.

Store the parameters in effect at socket initialization in per-socket data structures.
This way control and data sockets are allowed to have different buffer sizes (in theory).
The buffer size of the socket in question also stays available for fragmentation handling, even if the sysctl values are changed later.
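
The per-socket snapshot described above might look roughly like the following sketch. It is not the code from the diff: the two globals merely stand in for the net.graph.maxdgram and net.graph.recvspace sysctls with illustrative values, and struct sock_priv, sock_attach() and sock_frag_payload() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the sysctl-backed tunables; values are illustrative. */
static uint32_t tun_maxdgram = 20 * 1024;
static uint32_t tun_recvspace = 20 * 1024;

/* Hypothetical per-socket private data. */
struct sock_priv {
	uint32_t sendspace;	/* copied from tun_maxdgram at attach */
	uint32_t recvspace;	/* copied from tun_recvspace at attach */
};

/* Snapshot the tunables once, when the socket is initialized. */
static void
sock_attach(struct sock_priv *p)
{
	p->sendspace = tun_maxdgram;
	p->recvspace = tun_recvspace;
}

/*
 * Largest payload one fragment may carry on this socket, given the
 * header size. Uses the snapshot, not the current tunable.
 */
static uint32_t
sock_frag_payload(const struct sock_priv *p, uint32_t hdrlen)
{
	return (p->sendspace > hdrlen ? p->sendspace - hdrlen : 0);
}
```

A socket attached before a sysctl change keeps fragmenting against the buffer size it was actually created with, while a socket attached afterwards picks up the new value.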

bcr added a subscriber: bcr.

OK from manpages.

Updated to revision 368820.