share/man/man4/netlink.4
- This file was added.
.\"
.\" Copyright (C) 2022 Alexander Chernikov <melifaro@FreeBSD.org>.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd September 30, 2022
.Dt NETLINK 4
.Os
.Sh NAME
.Nm Netlink
.Nd Kernel network configuration protocol
.Sh SYNOPSIS
.In netlink/netlink.h
.In netlink/netlink_route.h
.Ft int
.Fn socket AF_NETLINK SOCK_DGRAM "int family"
.Sh DESCRIPTION
Netlink is a user-kernel message-based communication protocol primarily used
for network stack configuration.
Netlink is easily extensible and supports large dumps and event
notifications, all via a single socket.
The protocol is fully asynchronous, allowing one to issue and track multiple
requests at once.
Netlink consists of multiple families, each of which commonly groups the
commands belonging to a particular kernel subsystem.
Currently, the supported families are:
.Pp
.Bd -literal -offset indent -compact
NETLINK_ROUTE	network configuration,
NETLINK_GENERIC	"container" family
.Ed
.Pp
The
.Dv NETLINK_ROUTE
family handles the configuration of all interfaces, addresses, neighbors,
routes, and VNETs.
More details can be found in
.Xr rtnetlink 4 .
The
.Dv NETLINK_GENERIC
family serves as a
.Do container Dc ,
allowing other families to be registered under the
.Dv NETLINK_GENERIC
umbrella.
This approach allows using a single netlink socket to interact with
multiple netlink families at once.
More details can be found in
.Xr genetlink 4 .
.Pp
Netlink has its own sockaddr structure:
.Bd -literal
struct sockaddr_nl {
	uint8_t		nl_len;		/* sizeof(sockaddr_nl) */
	sa_family_t	nl_family;	/* netlink family */
	uint16_t	nl_pad;		/* reserved, set to 0 */
	uint32_t	nl_pid;		/* automatically selected, set to 0 */
	uint32_t	nl_groups;	/* multicast groups mask to bind to */
};
.Ed
.Pp
Typically, filling this structure is not required for socket operations.
It is presented here for completeness.
.Sh PROTOCOL DESCRIPTION
The protocol is message-based.
Each message starts with the mandatory
.Va nlmsghdr
header, followed by the family-specific header and the list of
type-length-value pairs (TLVs).
TLVs can be nested.
All headers and TLVs are padded to 4-byte boundaries.
Each
.Xr send 2
or
.Xr recv 2
system call may contain multiple messages.
.Ss BASE HEADER
.Bd -literal
struct nlmsghdr {
	uint32_t	nlmsg_len;	/* Length of message including header */
	uint16_t	nlmsg_type;	/* Message type identifier */
	uint16_t	nlmsg_flags;	/* Flags (NLM_F_) */
	uint32_t	nlmsg_seq;	/* Sequence number */
	uint32_t	nlmsg_pid;	/* Sending process port ID */
};
.Ed
.Pp
The
.Va nlmsg_len
field stores the whole message length, in bytes, including the header.
This length has to be rounded up to the nearest 4-byte boundary when
iterating over messages.
The
.Va nlmsg_type
field represents the command/request type.
This value is family-specific.
The list of supported commands can be found in the relevant family
header file.
The
.Va nlmsg_seq
field is a user-provided request identifier.
An application can track the operation result by matching the
.Va nlmsg_seq
value of the
.Dv NLMSG_ERROR
reply against that of the request.
The
.Va nlmsg_pid
field is the message sender ID.
This field is optional for userland.
The kernel sender ID is zero.
The
.Va nlmsg_flags
field contains the message-specific flags.
The following generic flags are defined:
.Pp
.Bd -literal -offset indent -compact
NLM_F_REQUEST	Indicates that the message is an actual request to the kernel
NLM_F_ACK	Request an explicit ACK message with an operation result
.Ed
.Pp
The following generic flags are defined for the "GET" request types:
.Pp
.Bd -literal -offset indent -compact
NLM_F_ROOT	Return the whole dataset
NLM_F_MATCH	Return all entries matching the criteria
.Ed
These two flags are typically used together, aliased to
.Dv NLM_F_DUMP .
.Pp
The following generic flags are defined for the "NEW" request types:
.Pp
.Bd -literal -offset indent -compact
NLM_F_CREATE	Create an object if none exists
NLM_F_EXCL	Don't replace an object if it exists
NLM_F_REPLACE	Replace an existing matching object
NLM_F_APPEND	Append to an existing object
.Ed
.Pp
The following generic flags are defined for the replies:
.Pp
.Bd -literal -offset indent -compact
NLM_F_MULTI	Indicates that the message is part of the message group
NLM_F_DUMP_INTR	Indicates that the state dump was not completed
NLM_F_DUMP_FILTERED	Indicates that the dump was filtered per request
NLM_F_CAPPED	Indicates the original message was capped to its header
NLM_F_ACK_TLVS	Indicates that extended ACK TLVs were included
.Ed
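.Pp
The length rounding described for
.Va nlmsg_len
can be sketched as follows.
This is a minimal illustration, not the kernel or library implementation: it
uses a local mirror of the header so that it builds without the netlink
headers, and the names
.Fn ex_align4
and
.Fn ex_count_messages
are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlmsghdr (assumption: same layout as above). */
struct ex_nlmsghdr {
	uint32_t	nlmsg_len;	/* Length of message including header */
	uint16_t	nlmsg_type;
	uint16_t	nlmsg_flags;
	uint32_t	nlmsg_seq;
	uint32_t	nlmsg_pid;
};

/* Round a byte length up to the next multiple of 4 (hypothetical helper). */
static size_t
ex_align4(size_t len)
{
	return ((len + 3) & ~(size_t)3);
}

/*
 * Count the complete messages in buf[0..buflen).
 * Returns -1 if a header is truncated or carries an impossible length.
 */
static int
ex_count_messages(const void *buf, size_t buflen)
{
	const unsigned char *p = buf;
	size_t off = 0;
	int count = 0;

	while (off < buflen) {
		struct ex_nlmsghdr hdr;

		if (buflen - off < sizeof(hdr))
			return (-1);
		/* Copy the header out to avoid unaligned access. */
		memcpy(&hdr, p + off, sizeof(hdr));
		if (hdr.nlmsg_len < sizeof(hdr) || hdr.nlmsg_len > buflen - off)
			return (-1);
		count++;
		/* The next header starts at the aligned end of this message. */
		off += ex_align4(hdr.nlmsg_len);
	}
	return (count);
}
```

Real code would normally rely on the iteration macros shipped with the netlink
headers rather than on hand-written helpers like these.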
.Ss TLVs
Most messages encode their attributes as type-length-value pairs (TLVs).
The base TLV header:
.Bd -literal
struct nlattr {
	uint16_t	nla_len;	/* Total attribute length */
	uint16_t	nla_type;	/* Attribute type */
};
.Ed
The TLV type
.Pq Va nla_type
scope is typically the message type or group within a family.
For example, the
.Dv RTN_MULTICAST
type value is only valid for
.Dv RTM_NEWROUTE ,
.Dv RTM_DELROUTE
and
.Dv RTM_GETROUTE
messages.
TLVs can be nested; in that case internal TLVs may have their own sub-types.
All TLVs are packed with 4-byte padding.
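.Pp
Walking a run of TLVs with 4-byte (32-bit) padding can be sketched as follows.
This is an illustration under stated assumptions: the structure is a local
mirror of
.Vt "struct nlattr" ,
and
.Fn ex_find_attr
is a hypothetical helper, not part of any netlink API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct nlattr (assumption: same layout as above). */
struct ex_nlattr {
	uint16_t	nla_len;	/* Total attribute length, header included */
	uint16_t	nla_type;	/* Attribute type */
};

/*
 * Find the first TLV of the given type in buf[0..buflen).
 * On success, returns a pointer to its value and stores the value length
 * in *vlen; returns NULL if the type is absent or the data is malformed.
 */
static const void *
ex_find_attr(const void *buf, size_t buflen, uint16_t type, size_t *vlen)
{
	const unsigned char *p = buf;
	size_t off = 0;

	while (buflen - off >= sizeof(struct ex_nlattr)) {
		struct ex_nlattr nla;

		memcpy(&nla, p + off, sizeof(nla));
		if (nla.nla_len < sizeof(nla) || nla.nla_len > buflen - off)
			return (NULL);
		if (nla.nla_type == type) {
			*vlen = nla.nla_len - sizeof(nla);
			return (p + off + sizeof(nla));
		}
		/* Skip the value and its padding to the next 4-byte boundary. */
		off += ((size_t)nla.nla_len + 3) & ~(size_t)3;
		if (off > buflen)
			break;
	}
	return (NULL);
}
```

Since a nested TLV carries further TLVs as its value, the returned value
pointer and length can be passed back to the same helper to search one level
down.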
.Ss CONTROL MESSAGES
A number of generic control messages are reserved in each family.
.Pp
.Dv NLMSG_ERROR
reports the operation result if requested, optionally followed by
the metadata TLVs.
The value of
.Va nlmsg_seq
is set to its value in the original message, while
.Va nlmsg_pid
is set to the socket pid of the original socket.
The operation result is reported via
.Vt "struct nlmsgerr" :
.Bd -literal
struct nlmsgerr {
	int		error;	/* Standard errno */
	struct nlmsghdr	msg;	/* Original message header */
};
.Ed
If the
.Dv NETLINK_CAP_ACK
socket option is not set, the remainder of the original message will follow.
If the
.Dv NETLINK_EXT_ACK
socket option is set, the kernel may add a
.Dv NLMSGERR_ATTR_MSG
string TLV with the textual error description, optionally followed by the
.Dv NLMSGERR_ATTR_OFFS
TLV, indicating the offset from the message start that triggered an error.
.Pp
.Dv NLMSG_DONE
indicates the end of the message group: typically, the end of the dump.
It contains a single
.Vt int
field, describing the dump result as a standard errno value.
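.Pp
Matching an
.Dv NLMSG_ERROR
reply to a request by its sequence number can be sketched as follows.
The structures are local mirrors of the ones shown in this page, the numeric
value behind
.Dv EX_NLMSG_ERROR
is an assumption made only so the sketch builds standalone, and
.Fn ex_match_ack
is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed value; real code uses NLMSG_ERROR from <netlink/netlink.h>. */
#define	EX_NLMSG_ERROR	0x2

/* Local mirrors of struct nlmsghdr and struct nlmsgerr. */
struct ex_nlmsghdr {
	uint32_t	nlmsg_len;
	uint16_t	nlmsg_type;
	uint16_t	nlmsg_flags;
	uint32_t	nlmsg_seq;
	uint32_t	nlmsg_pid;
};

struct ex_nlmsgerr {
	int			error;	/* Standard errno */
	struct ex_nlmsghdr	msg;	/* Original message header */
};

/*
 * If msg[0..len) is an error/ACK reply to the request with sequence
 * number seq, store the reported result in *error and return 1.
 * Return 0 for any other or malformed message.
 */
static int
ex_match_ack(const void *msg, size_t len, uint32_t seq, int *error)
{
	struct ex_nlmsghdr hdr;
	struct ex_nlmsgerr err;

	if (len < sizeof(hdr) + sizeof(err))
		return (0);
	memcpy(&hdr, msg, sizeof(hdr));
	if (hdr.nlmsg_type != EX_NLMSG_ERROR || hdr.nlmsg_seq != seq)
		return (0);
	if (hdr.nlmsg_len < sizeof(hdr) + sizeof(err) || hdr.nlmsg_len > len)
		return (0);
	/* The nlmsgerr payload follows the base header directly. */
	memcpy(&err, (const unsigned char *)msg + sizeof(hdr), sizeof(err));
	*error = err.error;
	return (1);
}
```

The helper returns the
.Va error
field unchanged; how a caller interprets it, beyond zero conventionally
meaning success, is left to the application.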
.Sh SOCKET OPTIONS
Netlink supports a number of custom socket options, which can be set with
.Xr setsockopt 2
with the
.Dv SOL_NETLINK
.Fa level :
.Bl -tag -width indent
.It Dv NETLINK_ADD_MEMBERSHIP
Subscribes to the notifications for the specific group (int).
.It Dv NETLINK_DROP_MEMBERSHIP
Unsubscribes from the notifications for the specific group (int).
.It Dv NETLINK_LIST_MEMBERSHIPS
Lists the memberships as a bitmask.
.It Dv NETLINK_CAP_ACK
Instructs the kernel to send the original message header in the reply
without the message body.
.It Dv NETLINK_EXT_ACK
Acknowledges the ability to receive additional TLVs in the ACK message.
.El
.Pp
Additionally, netlink overrides the following socket options from the
.Dv SOL_SOCKET
.Fa level :
.Bl -tag -width indent
.It Dv SO_RCVBUF
Sets the maximum size of the socket receive buffer.
If the caller has
.Dv PRIV_NET_ROUTE
permission, the value can exceed the currently-set
.Va kern.ipc.maxsockbuf
value.
.El
.Sh SYSCTL VARIABLES
A set of
.Xr sysctl 8
variables is available to tweak run-time parameters:
.Bl -tag -width indent
.It Va net.netlink.sendspace
Default send buffer size for the netlink socket.
Note that the socket sendspace has to be at least as long as the longest
message that can be transmitted via this socket.
.It Va net.netlink.recvspace
Default receive buffer size for the netlink socket.
Note that the socket recvspace has to be at least as long as the longest
message that can be received from this socket.
.El
.Sh DEBUGGING
Netlink implements per-functional-unit debugging, with different severities
controllable via the
.Va net.netlink.debug
branch.
These messages are logged in the kernel message buffer and can be seen in
.Xr dmesg 8 .
The following severity levels are defined:
.Bl -tag -width indent
.It Dv LOG_DEBUG(7)
Rare events or per-socket errors are reported here.
This is the default level, not impacting production performance.
.It Dv LOG_DEBUG2(8)
Socket events such as group memberships, privilege checks, commands and dumps
are logged.
This level does not incur significant performance overhead.
.It Dv LOG_DEBUG9(9)
All socket events and each dumped or modified entity are logged.
Turning it on may result in significant performance overhead.
.El
.Sh ERRORS
Netlink reports operation results, including errors and error metadata, by
sending a
.Dv NLMSG_ERROR
message for each request message.
The following errors can be returned:
.Bl -tag -width Er
.It Bq Er EPERM
when the current privileges are insufficient to perform the required operation;
.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc
when the system runs out of memory for
an internal data structure;
.It Bq Er ENOTSUP
when the requested command is not supported by the family or
the family is not supported;
.It Bq Er EINVAL
when some necessary TLVs are missing or invalid (details may be
provided in the
.Dv NLMSGERR_ATTR_MSG
and
.Dv NLMSGERR_ATTR_OFFS
TLVs);
.It Bq Er ENOENT
when trying to delete a non-existent object.
.El
.Pp
Additionally, a socket operation itself may fail with one of the errors
specified in
.Xr socket 2 ,
.Xr recv 2
or
.Xr send 2 .
.Sh SEE ALSO
.Xr genetlink 4 ,
.Xr rtnetlink 4
.Rs
.%A "J. Salim"
.%A "H. Khosravi"
.%A "A. Kleen"
.%A "A. Kuznetsov"
.%T "Linux Netlink as an IP Services Protocol"
.%O "RFC 3549"
.Re
.Sh HISTORY
The netlink protocol appeared in
.Fx 14.0 .
.Sh AUTHORS
The netlink protocol was implemented by
.An -nosplit
.An Alexander Chernikov Aq Mt melifaro@FreeBSD.org .
It was derived from the Google Summer of Code 2021 project by
.An Ng Peng Nam Sean .