Index: stable/12/share/man/man4/netmap.4 =================================================================== --- stable/12/share/man/man4/netmap.4 (revision 339909) +++ stable/12/share/man/man4/netmap.4 (revision 339910) @@ -1,1145 +1,1148 @@ .\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" This document is derived in part from the enet man page (enet.4) .\" distributed with 4.3BSD Unix. .\" .\" $FreeBSD$ .\" -.Dd March 2, 2017 +.Dd October 23, 2018 .Dt NETMAP 4 .Os .Sh NAME .Nm netmap .Nd a framework for fast packet I/O .Nm VALE .Nd a fast VirtuAl Local Ethernet using the netmap API .Pp .Nm netmap pipes .Nd a shared memory packet transport channel .Sh SYNOPSIS .Cd device netmap .Sh DESCRIPTION .Nm is a framework for extremely fast and efficient packet I/O for userspace and kernel clients, and for Virtual Machines. It runs on .Fx Linux and some versions of Windows, and supports a variety of .Nm netmap ports , including .Bl -tag -width XXXX .It Nm physical NIC ports to access individual queues of network interfaces; .It Nm host ports to inject packets into the host stack; .It Nm VALE ports implementing a very fast and modular in-kernel software switch/dataplane; .It Nm netmap pipes a shared memory packet transport channel; .It Nm netmap monitors a mechanism similar to .Xr bpf 4 to capture traffic .El .Pp All these .Nm netmap ports are accessed interchangeably with the same API, and are at least one order of magnitude faster than standard OS mechanisms (sockets, bpf, tun/tap interfaces, native switches, pipes). With suitably fast hardware (NICs, PCIe buses, CPUs), packet I/O using .Nm on supported NICs reaches 14.88 million packets per second (Mpps) with much less than one core on 10 Gbit/s NICs; 35-40 Mpps on 40 Gbit/s NICs (limited by the hardware); about 20 Mpps per core for VALE ports; and over 100 Mpps for .Nm netmap pipes. NICs without native .Nm support can still use the API in emulated mode, which uses unmodified device drivers and is 3-5 times faster than .Xr bpf 4 or raw sockets. .Pp Userspace clients can dynamically switch NICs into .Nm mode and send and receive raw packets through memory mapped buffers. Similarly, .Nm VALE switch instances and ports, .Nm netmap pipes and .Nm netmap monitors can be created dynamically, providing high speed packet I/O between processes, virtual machines, NICs and the host stack. .Pp .Nm supports both non-blocking I/O through .Xr ioctl 2 , synchronization and blocking I/O through a file descriptor and standard OS mechanisms such as .Xr select 2 , .Xr poll 2 , .Xr epoll 2 , and .Xr kqueue 2 . All types of .Nm netmap ports and the .Nm VALE switch are implemented by a single kernel module, which also emulates the .Nm API over standard drivers. For best performance, .Nm requires native support in device drivers. A list of such devices is at the end of this document. .Pp In the rest of this (long) manual page we document various aspects of the .Nm and .Nm VALE architecture, features and usage. .Sh ARCHITECTURE .Nm supports raw packet I/O through a .Em port , which can be connected to a physical interface .Em ( NIC ) , to the host stack, or to a .Nm VALE switch. Ports use preallocated circular queues of buffers .Em ( rings ) residing in an mmapped region. There is one ring for each transmit/receive queue of a NIC or virtual port. An additional ring pair connects to the host stack. .Pp After binding a file descriptor to a port, a .Nm client can send or receive packets in batches through the rings, and possibly implement zero-copy forwarding between ports. .Pp All NICs operating in .Nm mode use the same memory region, accessible to all processes who own .Pa /dev/netmap file descriptors bound to NICs. Independent .Nm VALE and .Nm netmap pipe ports by default use separate memory regions, but can be independently configured to share memory. .Sh ENTERING AND EXITING NETMAP MODE The following section describes the system calls to create and control .Nm netmap ports (including .Nm VALE and .Nm netmap pipe ports). Simpler, higher level functions are described in the .Sx LIBRARIES section. .Pp Ports and rings are created and controlled through a file descriptor, created by opening a special device .Dl fd = open("/dev/netmap"); and then bound to a specific port with an .Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg); .Pp .Nm has multiple modes of operation controlled by the .Vt struct nmreq argument. .Va arg.nr_name specifies the netmap port name, as follows: .Bl -tag -width XXXX .It Dv OS network interface name (e.g., 'em0', 'eth1', ... ) the data path of the NIC is disconnected from the host stack, and the file descriptor is bound to the NIC (one or all queues), or to the host stack; .It Dv valeSSS:PPP the file descriptor is bound to port PPP of VALE switch SSS. Switch instances and ports are dynamically created if necessary. .Pp Both SSS and PPP have the form [0-9a-zA-Z_]+ , the string cannot exceed IFNAMSIZ characters, and PPP cannot be the name of any existing OS network interface. .El .Pp On return, .Va arg indicates the size of the shared memory region, and the number, size and location of all the .Nm data structures, which can be accessed by mmapping the memory .Dl char *mem = mmap(0, arg.nr_memsize, fd); .Pp Non-blocking I/O is done with special .Xr ioctl 2 .Xr select 2 and .Xr poll 2 on the file descriptor permit blocking I/O. .Xr epoll 2 and .Xr kqueue 2 are not supported on .Nm file descriptors. .Pp While a NIC is in .Nm mode, the OS will still believe the interface is up and running. OS-generated packets for that NIC end up into a .Nm ring, and another ring is used to send packets into the OS network stack. A .Xr close 2 on the file descriptor removes the binding, and returns the NIC to normal mode (reconnecting the data path to the host stack), or destroys the virtual port. .Sh DATA STRUCTURES The data structures in the mmapped memory region are detailed in .In sys/net/netmap.h , which is the ultimate reference for the .Nm API. The main structures and fields are indicated below: .Bl -tag -width XXX .It Dv struct netmap_if (one per interface) .Bd -literal struct netmap_if { ... const uint32_t ni_flags; /* properties */ ... const uint32_t ni_tx_rings; /* NIC tx rings */ const uint32_t ni_rx_rings; /* NIC rx rings */ uint32_t ni_bufs_head; /* head of extra bufs list */ ... }; .Ed .Pp Indicates the number of available rings .Pa ( struct netmap_rings ) and their position in the mmapped region. The number of tx and rx rings .Pa ( ni_tx_rings , ni_rx_rings ) normally depends on the hardware. NICs also have an extra tx/rx ring pair connected to the host stack. .Em NIOCREGIF can also request additional unbound buffers in the same memory space, to be used as temporary storage for packets. .Pa ni_bufs_head contains the index of the first of these free rings, which are connected in a list (the first uint32_t of each buffer being the index of the next buffer in the list). A .Dv 0 indicates the end of the list. .It Dv struct netmap_ring (one per ring) .Bd -literal struct netmap_ring { ... const uint32_t num_slots; /* slots in each ring */ const uint32_t nr_buf_size; /* size of each buffer */ ... uint32_t head; /* (u) first buf owned by user */ uint32_t cur; /* (u) wakeup position */ const uint32_t tail; /* (k) first buf owned by kernel */ ... uint32_t flags; struct timeval ts; /* (k) time of last rxsync() */ ... struct netmap_slot slot[0]; /* array of slots */ } .Ed .Pp Implements transmit and receive rings, with read/write pointers, metadata and an array of .Em slots describing the buffers. .It Dv struct netmap_slot (one per buffer) .Bd -literal struct netmap_slot { uint32_t buf_idx; /* buffer index */ uint16_t len; /* packet length */ uint16_t flags; /* buf changed, etc. */ uint64_t ptr; /* address for indirect buffers */ }; .Ed .Pp Describes a packet buffer, which normally is identified by an index and resides in the mmapped region. .It Dv packet buffers Fixed size (normally 2 KB) packet buffers allocated by the kernel. .El .Pp The offset of the .Pa struct netmap_if in the mmapped region is indicated by the .Pa nr_offset field in the structure returned by .Dv NIOCREGIF . From there, all other objects are reachable through relative references (offsets or indexes). Macros and functions in .In net/netmap_user.h help converting them into actual pointers: .Pp .Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset); .Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index); .Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index); .Pp .Dl char *buf = NETMAP_BUF(ring, buffer_index); .Sh RINGS, BUFFERS AND DATA I/O .Va Rings are circular queues of packets with three indexes/pointers .Va ( head , cur , tail ) ; one slot is always kept empty. The ring size .Va ( num_slots ) should not be assumed to be a power of two. .Pp .Va head is the first slot available to userspace; .Pp .Va cur is the wakeup point: select/poll will unblock when .Va tail passes .Va cur ; .Pp .Va tail is the first slot reserved to the kernel. .Pp Slot indexes .Em must only move forward; for convenience, the function .Dl nm_ring_next(ring, index) returns the next index modulo the ring size. .Pp .Va head and .Va cur are only modified by the user program; .Va tail is only modified by the kernel. The kernel only reads/writes the .Vt struct netmap_ring slots and buffers during the execution of a netmap-related system call. The only exception are slots (and buffers) in the range .Va tail\ . . . head-1 , that are explicitly assigned to the kernel. .Pp .Ss TRANSMIT RINGS On transmit rings, after a .Nm system call, slots in the range .Va head\ . . . tail-1 are available for transmission. User code should fill the slots sequentially and advance .Va head and .Va cur past slots ready to transmit. .Va cur may be moved further ahead if the user code needs more slots before further transmissions (see .Sx SCATTER GATHER I/O ) . .Pp At the next NIOCTXSYNC/select()/poll(), slots up to .Va head-1 are pushed to the port, and .Va tail may advance if further slots have become available. Below is an example of the evolution of a TX ring: .Bd -literal after the syscall, slots between cur and tail are (a)vailable head=cur tail | | v v TX [.....aaaaaaaaaaa.............] user creates new packets to (T)ransmit head=cur tail | | v v TX [.....TTTTTaaaaaa.............] NIOCTXSYNC/poll()/select() sends packets and reports new slots head=cur tail | | v v TX [..........aaaaaaaaaaa........] .Ed .Pp .Fn select and .Fn poll will block if there is no space in the ring, i.e., .Dl ring->cur == ring->tail and return when new slots have become available. .Pp High speed applications may want to amortize the cost of system calls by preparing as many packets as possible before issuing them. .Pp A transmit ring with pending transmissions has .Dl ring->head != ring->tail + 1 (modulo the ring size). The function .Va int nm_tx_pending(ring) implements this test. .Ss RECEIVE RINGS On receive rings, after a .Nm system call, the slots in the range .Va head\& . . . tail-1 contain received packets. User code should process them and advance .Va head and .Va cur past slots it wants to return to the kernel. .Va cur may be moved further ahead if the user code wants to wait for more packets without returning all the previous slots to the kernel. .Pp At the next NIOCRXSYNC/select()/poll(), slots up to .Va head-1 are returned to the kernel for further receives, and .Va tail may advance to report new incoming packets. .Pp Below is an example of the evolution of an RX ring: .Bd -literal after the syscall, there are some (h)eld and some (R)eceived slots head cur tail | | | v v v RX [..hhhhhhRRRRRRRR..........] user advances head and cur, releasing some slots and holding others head cur tail | | | v v v RX [..*****hhhRRRRRR...........] NICRXSYNC/poll()/select() recovers slots and reports new packets head cur tail | | | v v v RX [.......hhhRRRRRRRRRRRR....] .Ed .Sh SLOTS AND PACKET BUFFERS Normally, packets should be stored in the netmap-allocated buffers assigned to slots when ports are bound to a file descriptor. One packet is fully contained in a single buffer. .Pp The following flags affect slot and buffer processing: .Bl -tag -width XXX .It NS_BUF_CHANGED .Em must be used when the .Va buf_idx in the slot is changed. This can be used to implement zero-copy forwarding, see .Sx ZERO-COPY FORWARDING . .It NS_REPORT reports when this buffer has been transmitted. Normally, .Nm notifies transmit completions in batches, hence signals can be delayed indefinitely. This flag helps detect when packets have been sent and a file descriptor can be closed. .It NS_FORWARD When a ring is in 'transparent' mode (see .Sx TRANSPARENT MODE ) , packets marked with this flag are forwarded to the other endpoint at the next system call, thus restoring (in a selective way) the connection between a NIC and the host stack. .It NS_NO_LEARN tells the forwarding code that the source MAC address for this packet must not be used in the learning bridge code. .It NS_INDIRECT indicates that the packet's payload is in a user-supplied buffer whose user virtual address is in the 'ptr' field of the slot. The size can reach 65535 bytes. .Pp This is only supported on the transmit ring of .Nm VALE ports, and it helps reducing data copies in the interconnection of virtual machines. .It NS_MOREFRAG indicates that the packet continues with subsequent buffers; the last buffer in a packet must have the flag clear. .El .Sh SCATTER GATHER I/O Packets can span multiple slots if the .Va NS_MOREFRAG flag is set in all but the last slot. The maximum length of a chain is 64 buffers. This is normally used with .Nm VALE ports when connecting virtual machines, as they generate large TSO segments that are not split unless they reach a physical device. .Pp NOTE: The length field always refers to the individual fragment; there is no place with the total length of a packet. .Pp On receive rings the macro .Va NS_RFRAGS(slot) indicates the remaining number of slots for this packet, including the current one. Slots with a value greater than 1 also have NS_MOREFRAG set. .Sh IOCTLS .Nm uses two ioctls (NIOCTXSYNC, NIOCRXSYNC) for non-blocking I/O. They take no argument. Two more ioctls (NIOCGINFO, NIOCREGIF) are used to query and configure ports, with the following argument: .Bd -literal struct nmreq { char nr_name[IFNAMSIZ]; /* (i) port name */ uint32_t nr_version; /* (i) API version */ uint32_t nr_offset; /* (o) nifp offset in mmap region */ uint32_t nr_memsize; /* (o) size of the mmap region */ uint32_t nr_tx_slots; /* (i/o) slots in tx rings */ uint32_t nr_rx_slots; /* (i/o) slots in rx rings */ uint16_t nr_tx_rings; /* (i/o) number of tx rings */ uint16_t nr_rx_rings; /* (i/o) number of rx rings */ uint16_t nr_ringid; /* (i/o) ring(s) we care about */ uint16_t nr_cmd; /* (i) special command */ uint16_t nr_arg1; /* (i/o) extra arguments */ uint16_t nr_arg2; /* (i/o) extra arguments */ uint32_t nr_arg3; /* (i/o) extra arguments */ uint32_t nr_flags /* (i/o) open mode */ ... }; .Ed .Pp A file descriptor obtained through .Pa /dev/netmap also supports the ioctl supported by network devices, see .Xr netintro 4 . .Bl -tag -width XXXX .It Dv NIOCGINFO returns EINVAL if the named port does not support netmap. Otherwise, it returns 0 and (advisory) information about the port. Note that all the information below can change before the interface is actually put in netmap mode. .Bl -tag -width XX .It Pa nr_memsize indicates the size of the .Nm memory region. NICs in .Nm mode all share the same memory region, whereas .Nm VALE ports have independent regions for each port. .It Pa nr_tx_slots , nr_rx_slots indicate the size of transmit and receive rings. .It Pa nr_tx_rings , nr_rx_rings indicate the number of transmit and receive rings. Both ring number and sizes may be configured at runtime using interface-specific functions (e.g., .Xr ethtool 8 ). .El .It Dv NIOCREGIF binds the port named in .Va nr_name to the file descriptor. For a physical device this also switches it into .Nm mode, disconnecting it from the host stack. Multiple file descriptors can be bound to the same port, with proper synchronization left to the user. .Pp The recommended way to bind a file descriptor to a port is to use function .Va nm_open(..) (see .Sx LIBRARIES ) which parses names to access specific port types and enable features. In the following we document the main features. .Pp .Dv NIOCREGIF can also bind a file descriptor to one endpoint of a .Em netmap pipe , consisting of two netmap ports with a crossover connection. A netmap pipe share the same memory space of the parent port, and is meant to enable configuration where a master process acts as a dispatcher towards slave processes. .Pp To enable this function, the .Pa nr_arg1 field of the structure can be used as a hint to the kernel to indicate how many pipes we expect to use, and reserve extra space in the memory region. .Pp On return, it gives the same info as NIOCGINFO, with .Pa nr_ringid and .Pa nr_flags indicating the identity of the rings controlled through the file descriptor. .Pp .Va nr_flags .Va nr_ringid selects which rings are controlled through this file descriptor. Possible values of .Pa nr_flags are indicated below, together with the naming schemes that application libraries (such as the .Nm nm_open indicated below) can use to indicate the specific set of rings. In the example below, "netmap:foo" is any valid netmap port name. .Bl -tag -width XXXXX .It NR_REG_ALL_NIC "netmap:foo" (default) all hardware ring pairs .It NR_REG_SW "netmap:foo^" the ``host rings'', connecting to the host stack. .It NR_REG_NIC_SW "netmap:foo+" all hardware rings and the host rings .It NR_REG_ONE_NIC "netmap:foo-i" only the i-th hardware ring pair, where the number is in .Pa nr_ringid ; .It NR_REG_PIPE_MASTER "netmap:foo{i" the master side of the netmap pipe whose identifier (i) is in .Pa nr_ringid ; .It NR_REG_PIPE_SLAVE "netmap:foo}i" the slave side of the netmap pipe whose identifier (i) is in .Pa nr_ringid . .Pp The identifier of a pipe must be thought as part of the pipe name, and does not need to be sequential. On return the pipe will only have a single ring pair with index 0, irrespective of the value of .Va i. .El .Pp By default, a .Xr poll 2 or .Xr select 2 call pushes out any pending packets on the transmit ring, even if no write events are specified. The feature can be disabled by or-ing .Va NETMAP_NO_TX_POLL to the value written to .Va nr_ringid. When this feature is used, packets are transmitted only on .Va ioctl(NIOCTXSYNC) or select()/poll() are called with a write event (POLLOUT/wfdset) or a full ring. .Pp When registering a virtual interface that is dynamically created to a .Xr vale 4 switch, we can specify the desired number of rings (1 by default, and currently up to 16) on it using nr_tx_rings and nr_rx_rings fields. .It Dv NIOCTXSYNC tells the hardware of new packets to transmit, and updates the number of slots available for transmission. .It Dv NIOCRXSYNC tells the hardware of consumed packets, and asks for newly available packets. .El .Sh SELECT, POLL, EPOLL, KQUEUE. .Xr select 2 and .Xr poll 2 on a .Nm file descriptor process rings as indicated in .Sx TRANSMIT RINGS and .Sx RECEIVE RINGS , respectively when write (POLLOUT) and read (POLLIN) events are requested. Both block if no slots are available in the ring .Va ( ring->cur == ring->tail ) . Depending on the platform, .Xr epoll 2 and .Xr kqueue 2 are supported too. .Pp Packets in transmit rings are normally pushed out (and buffers reclaimed) even without requesting write events. Passing the .Dv NETMAP_NO_TX_POLL flag to .Em NIOCREGIF disables this feature. By default, receive rings are processed only if read events are requested. Passing the .Dv NETMAP_DO_RX_POLL flag to .Em NIOCREGIF updates receive rings even without read events. Note that on epoll and kqueue, .Dv NETMAP_NO_TX_POLL and .Dv NETMAP_DO_RX_POLL only have an effect when some event is posted for the file descriptor. .Sh LIBRARIES The .Nm API is supposed to be used directly, both because of its simplicity and for efficient integration with applications. .Pp For convenience, the .In net/netmap_user.h header provides a few macros and functions to ease creating a file descriptor and doing I/O with a .Nm port. These are loosely modeled after the .Xr pcap 3 API, to ease porting of libpcap-based applications to .Nm . To use these extra functions, programs should .Dl #define NETMAP_WITH_LIBS before .Dl #include .Pp The following functions are available: .Bl -tag -width XXXXX .It Va struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg) similar to .Xr pcap_open 3pcap , binds a file descriptor to a port. .Bl -tag -width XX .It Va ifname is a port name, in the form "netmap:PPP" for a NIC and "valeSSS:PPP" for a .Nm VALE port. .It Va req provides the initial values for the argument to the NIOCREGIF ioctl. The nm_flags and nm_ringid values are overwritten by parsing ifname and flags, and other fields can be overridden through the other two arguments. .It Va arg points to a struct nm_desc containing arguments (e.g., from a previously open file descriptor) that should override the defaults. The fields are used as described below .It Va flags can be set to a combination of the following flags: .Va NETMAP_NO_TX_POLL , .Va NETMAP_DO_RX_POLL (copied into nr_ringid); .Va NM_OPEN_NO_MMAP (if arg points to the same memory region, avoids the mmap and uses the values from it); .Va NM_OPEN_IFNAME (ignores ifname and uses the values in arg); .Va NM_OPEN_ARG1 , .Va NM_OPEN_ARG2 , .Va NM_OPEN_ARG3 (uses the fields from arg); .Va NM_OPEN_RING_CFG (uses the ring number and sizes from arg). .El .It Va int nm_close(struct nm_desc *d) closes the file descriptor, unmaps memory, frees resources. .It Va int nm_inject(struct nm_desc *d, const void *buf, size_t size) similar to pcap_inject(), pushes a packet to a ring, returns the size of the packet is successful, or 0 on error; .It Va int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg) similar to pcap_dispatch(), applies a callback to incoming packets .It Va u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr) similar to pcap_next(), fetches the next packet .El .Sh SUPPORTED DEVICES .Nm natively supports the following devices: .Pp On FreeBSD: .Xr cxgbe 4 , .Xr em 4 , .Xr igb 4 , .Xr ixgbe 4 , .Xr ixl 4 , .Xr lem 4 , .Xr re 4 . .Pp On Linux .Xr e1000 4 , .Xr e1000e 4 , .Xr i40e 4 , .Xr igb 4 , .Xr ixgbe 4 , .Xr r8169 4 . .Pp NICs without native support can still be used in .Nm mode through emulation. Performance is inferior to native netmap mode but still significantly higher than various raw socket types (bpf, PF_PACKET, etc.). Note that for slow devices (such as 1 Gbit/s and slower NICs, or several 10 Gbit/s NICs whose hardware is unable to sustain line rate), emulated and native mode will likely have similar or same throughput. .Pp When emulation is in use, packet sniffer programs such as tcpdump could see received packets before they are diverted by netmap. This behaviour is not intentional, being just an artifact of the implementation of emulation. Note that in case the netmap application subsequently moves packets received from the emulated adapter onto the host RX ring, the sniffer will intercept those packets again, since the packets are injected to the host stack as they were received by the network interface. .Pp Emulation is also available for devices with native netmap support, which can be used for testing or performance comparison. The sysctl variable .Va dev.netmap.admode globally controls how netmap mode is implemented. .Sh SYSCTL VARIABLES AND MODULE PARAMETERS Some aspect of the operation of .Nm are controlled through sysctl variables on FreeBSD .Em ( dev.netmap.* ) and module parameters on Linux .Em ( /sys/module/netmap_lin/parameters/* ) : .Bl -tag -width indent .It Va dev.netmap.admode: 0 Controls the use of native or emulated adapter mode. .Pp 0 uses the best available option; .Pp 1 forces native mode and fails if not available; .Pp 2 forces emulated hence never fails. .It Va dev.netmap.generic_ringsize: 1024 Ring size used for emulated netmap mode .It Va dev.netmap.generic_mit: 100000 Controls interrupt moderation for emulated mode .It Va dev.netmap.mmap_unreg: 0 .It Va dev.netmap.fwd: 0 Forces NS_FORWARD mode .It Va dev.netmap.flags: 0 .It Va dev.netmap.txsync_retry: 2 .It Va dev.netmap.no_pendintr: 1 Forces recovery of transmit buffers on system calls .It Va dev.netmap.mitigate: 1 Propagates interrupt mitigation to user processes .It Va dev.netmap.no_timestamp: 0 Disables the update of the timestamp in the netmap ring .It Va dev.netmap.verbose: 0 Verbose kernel messages .It Va dev.netmap.buf_num: 163840 .It Va dev.netmap.buf_size: 2048 .It Va dev.netmap.ring_num: 200 .It Va dev.netmap.ring_size: 36864 .It Va dev.netmap.if_num: 100 .It Va dev.netmap.if_size: 1024 Sizes and number of objects (netmap_if, netmap_ring, buffers) for the global memory region. The only parameter worth modifying is .Va dev.netmap.buf_num as it impacts the total amount of memory used by netmap. .It Va dev.netmap.buf_curr_num: 0 .It Va dev.netmap.buf_curr_size: 0 .It Va dev.netmap.ring_curr_num: 0 .It Va dev.netmap.ring_curr_size: 0 .It Va dev.netmap.if_curr_num: 0 .It Va dev.netmap.if_curr_size: 0 Actual values in use. .It Va dev.netmap.bridge_batch: 1024 Batch size used when moving packets across a .Nm VALE switch. Values above 64 generally guarantee good performance. .El .Sh SYSTEM CALLS .Nm uses .Xr select 2 , .Xr poll 2 , .Xr epoll 2 and .Xr kqueue 2 to wake up processes when significant events occur, and .Xr mmap 2 to map memory. .Xr ioctl 2 is used to configure ports and .Nm VALE switches . .Pp Applications may need to create threads and bind them to specific cores to improve performance, using standard OS primitives, see .Xr pthread 3 . In particular, .Xr pthread_setaffinity_np 3 may be of use. .Sh EXAMPLES .Ss TEST PROGRAMS .Nm comes with a few programs that can be used for testing or simple applications. See the .Pa examples/ directory in .Nm distributions, or .Pa tools/tools/netmap/ directory in .Fx distributions. .Pp .Xr pkt-gen 8 is a general purpose traffic source/sink. .Pp As an example .Dl pkt-gen -i ix0 -f tx -l 60 can generate an infinite stream of minimum size packets, and .Dl pkt-gen -i ix0 -f rx is a traffic sink. Both print traffic statistics, to help monitor how the system performs. .Pp .Xr pkt-gen 8 has many options can be uses to set packet sizes, addresses, rates, and use multiple send/receive threads and cores. .Pp .Xr bridge 4 is another test program which interconnects two .Nm ports. It can be used for transparent forwarding between interfaces, as in .Dl bridge -i ix0 -i ix1 or even connect the NIC to the host stack using netmap .Dl bridge -i ix0 -i ix0 .Ss USING THE NATIVE API The following code implements a traffic generator .Pp .Bd -literal -compact #include \&... void sender(void) { struct netmap_if *nifp; struct netmap_ring *ring; struct nmreq nmr; struct pollfd fds; fd = open("/dev/netmap", O_RDWR); bzero(&nmr, sizeof(nmr)); strcpy(nmr.nr_name, "ix0"); nmr.nm_version = NETMAP_API; ioctl(fd, NIOCREGIF, &nmr); p = mmap(0, nmr.nr_memsize, fd); nifp = NETMAP_IF(p, nmr.nr_offset); ring = NETMAP_TXRING(nifp, 0); fds.fd = fd; fds.events = POLLOUT; for (;;) { poll(&fds, 1, -1); while (!nm_ring_empty(ring)) { i = ring->cur; buf = NETMAP_BUF(ring, ring->slot[i].buf_index); ... prepare packet in buf ... ring->slot[i].len = ... packet length ... ring->head = ring->cur = nm_ring_next(ring, i); } } } .Ed .Ss HELPER FUNCTIONS A simple receiver can be implemented using the helper functions .Bd -literal -compact #define NETMAP_WITH_LIBS #include \&... void receiver(void) { struct nm_desc *d; struct pollfd fds; u_char *buf; struct nm_pkthdr h; ... d = nm_open("netmap:ix0", NULL, 0, 0); fds.fd = NETMAP_FD(d); fds.events = POLLIN; for (;;) { poll(&fds, 1, -1); while ( (buf = nm_nextpkt(d, &h)) ) consume_pkt(buf, h->len); } nm_close(d); } .Ed .Ss ZERO-COPY FORWARDING Since physical interfaces share the same memory region, it is possible to do packet forwarding between ports swapping buffers. The buffer from the transmit ring is used to replenish the receive ring: .Bd -literal -compact uint32_t tmp; struct netmap_slot *src, *dst; ... src = &src_ring->slot[rxr->cur]; dst = &dst_ring->slot[txr->cur]; tmp = dst->buf_idx; dst->buf_idx = src->buf_idx; dst->len = src->len; dst->flags = NS_BUF_CHANGED; src->buf_idx = tmp; src->flags = NS_BUF_CHANGED; rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur); txr->head = txr->cur = nm_ring_next(txr, txr->cur); ... .Ed .Ss ACCESSING THE HOST STACK The host stack is for all practical purposes just a regular ring pair, which you can access with the netmap API (e.g., with .Dl nm_open("netmap:eth0^", ... ) ; All packets that the host would send to an interface in .Nm mode end up into the RX ring, whereas all packets queued to the TX ring are send up to the host stack. .Ss VALE SWITCH A simple way to test the performance of a .Nm VALE switch is to attach a sender and a receiver to it, e.g., running the following in two different terminals: .Dl pkt-gen -i vale1:a -f rx # receiver .Dl pkt-gen -i vale1:b -f tx # sender The same example can be used to test netmap pipes, by simply changing port names, e.g., .Dl pkt-gen -i vale2:x{3 -f rx # receiver on the master side .Dl pkt-gen -i vale2:x}3 -f tx # sender on the slave side .Pp The following command attaches an interface and the host stack to a switch: .Dl vale-ctl -h vale2:em0 Other .Nm clients attached to the same switch can now communicate with the network card or the host. .Sh SEE ALSO +.Xr pkt-gen 8 , +.Xr bridge 8 +.Pp .Pa http://info.iet.unipi.it/~luigi/netmap/ .Pp Luigi Rizzo, Revisiting network I/O APIs: the netmap framework, Communications of the ACM, 55 (3), pp.45-51, March 2012 .Pp Luigi Rizzo, netmap: a novel framework for fast packet I/O, Usenix ATC'12, June 2012, Boston .Pp Luigi Rizzo, Giuseppe Lettieri, VALE, a switched ethernet for virtual machines, ACM CoNEXT'12, December 2012, Nice .Pp Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione, Speeding up packet I/O in virtual machines, ACM/IEEE ANCS'13, October 2013, San Jose .Sh AUTHORS .An -nosplit The .Nm framework has been originally designed and implemented at the Universita` di Pisa in 2011 by .An Luigi Rizzo , and further extended with help from .An Matteo Landi , .An Gaetano Catalli , .An Giuseppe Lettieri , and .An Vincenzo Maffione . .Pp .Nm and .Nm VALE have been funded by the European Commission within FP7 Projects CHANGE (257422) and OPENLAB (287581). .Sh CAVEATS No matter how fast the CPU and OS are, achieving line rate on 10G and faster interfaces requires hardware with sufficient performance. Several NICs are unable to sustain line rate with small packet sizes. Insufficient PCIe or memory bandwidth can also cause reduced performance. .Pp Another frequent reason for low performance is the use of flow control on the link: a slow receiver can limit the transmit speed. Be sure to disable flow control when running high speed experiments. .Ss SPECIAL NIC FEATURES .Nm is orthogonal to some NIC features such as multiqueue, schedulers, packet filters. .Pp Multiple transmit and receive rings are supported natively and can be configured with ordinary OS tools, such as .Xr ethtool 8 or device-specific sysctl variables. The same goes for Receive Packet Steering (RPS) and filtering of incoming traffic. .Pp .Nm .Em does not use features such as .Em checksum offloading , TCP segmentation offloading , .Em encryption , VLAN encapsulation/decapsulation , etc. When using netmap to exchange packets with the host stack, make sure to disable these features. Index: stable/12/tools/tools/netmap/bridge.8 =================================================================== --- stable/12/tools/tools/netmap/bridge.8 (nonexistent) +++ stable/12/tools/tools/netmap/bridge.8 (revision 339910) @@ -0,0 +1,82 @@ +.\" Copyright (c) 2016 Luigi Rizzo, Universita` di Pisa +.\" +.\" Redistribution and use in source and binary forms, with or without +.\" modification, are permitted provided that the following conditions +.\" are met: +.\" 1. Redistributions of source code must retain the above copyright +.\" notice, this list of conditions and the following disclaimer. +.\" 2. Redistributions in binary form must reproduce the above copyright +.\" notice, this list of conditions and the following disclaimer in the +.\" documentation and/or other materials provided with the distribution. +.\" +.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND +.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE +.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL +.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS +.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) +.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT +.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY +.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF +.\" SUCH DAMAGE. +.\" +.\" $FreeBSD$ +.\" +.Dd October 23, 2018 +.Dt BRIDGE 8 +.Os +.Sh NAME +.Nm bridge +.Nd netmap client to bridge two netmap ports +.Sh SYNOPSIS +.Bk -words +.Bl -tag -width "bridge" +.It Nm +.Op Fl i Ar port +.Op Fl b Ar batch size +.Op Fl w Ar wait-link +.Op Fl v +.Op Fl c +.El +.Ek +.Sh DESCRIPTION +.Nm +is a simple netmap application that bridges packets between two netmap ports. +If the two netmap ports use the same netmap memory region +.Nm +forwards packets without copying the packets payload (zero-copy mode), unless +explicitly prevented by the +.Fl c +flag. +.Bl -tag -width Ds +.It Fl i Ar port +Name of the netmap port. +It can be supplied up to two times to identify the ports that must be bridged. +Any netmap port type (physical interface, VALE switch, pipe, monitor port...) +can be used. +If the option is supplied only once, then it must be for a physical interface and, in that case, +.Nm +will bridge the port and the host stack. +.It Fl b Ar batch-size +Maximum number of packets to send in one operation. +.It Fl w Ar wait-link +indicates the number of seconds to wait before transmitting. +It defaults to 2, and may be useful when talking to physical +ports to let link negotiation complete before starting transmission. +.It Fl v +Enable verbose mode +.It Fl c +Disable zero-copy mode. +.El +.Sh SEE ALSO +.Xr netmap 4 , +.Xr pkt-gen 8 +.Sh AUTHORS +.An -nosplit +.Nm +has been written by +.An Luigi Rizzo +and +.An Matteo Landi +at the Universita` di Pisa, Italy. Property changes on: stable/12/tools/tools/netmap/bridge.8 ___________________________________________________________________ Added: svn:eol-style ## -0,0 +1 ## +native \ No newline at end of property Added: svn:keywords ## -0,0 +1 ## +FreeBSD=%H \ No newline at end of property Added: svn:mime-type ## -0,0 +1 ## +text/plain \ No newline at end of property Index: stable/12/tools/tools/netmap/bridge.c =================================================================== --- stable/12/tools/tools/netmap/bridge.c (revision 339909) +++ stable/12/tools/tools/netmap/bridge.c (revision 339910) @@ -1,335 +1,356 @@ /* * (C) 2011-2014 Luigi Rizzo, Matteo Landi * * BSD license * * A netmap client to bridge two network interfaces * (or one interface and the host stack). * * $FreeBSD$ */ #include #define NETMAP_WITH_LIBS #include #include int verbose = 0; static int do_abort = 0; static int zerocopy = 1; /* enable zerocopy if possible */ static void sigint_h(int sig) { (void)sig; /* UNUSED */ do_abort = 1; signal(SIGINT, SIG_DFL); } /* * how many packets on this set of queues ? */ int pkt_queued(struct nm_desc *d, int tx) { - u_int i, tot = 0; + u_int i, tot = 0; - if (tx) { - for (i = d->first_tx_ring; i <= d->last_tx_ring; i++) { - tot += nm_ring_space(NETMAP_TXRING(d->nifp, i)); - } - } else { - for (i = d->first_rx_ring; i <= d->last_rx_ring; i++) { - tot += nm_ring_space(NETMAP_RXRING(d->nifp, i)); - } - } - return tot; + if (tx) { + for (i = d->first_tx_ring; i <= d->last_tx_ring; i++) { + tot += nm_ring_space(NETMAP_TXRING(d->nifp, i)); + } + } else { + for (i = d->first_rx_ring; i <= d->last_rx_ring; i++) { + tot += nm_ring_space(NETMAP_RXRING(d->nifp, i)); + } + } + return tot; } /* * move up to 'limit' pkts from rxring to txring swapping buffers. */ static int process_rings(struct netmap_ring *rxring, struct netmap_ring *txring, u_int limit, const char *msg) { u_int j, k, m = 0; /* print a warning if any of the ring flags is set (e.g. NM_REINIT) */ if (rxring->flags || txring->flags) D("%s rxflags %x txflags %x", msg, rxring->flags, txring->flags); j = rxring->cur; /* RX */ k = txring->cur; /* TX */ m = nm_ring_space(rxring); if (m < limit) limit = m; m = nm_ring_space(txring); if (m < limit) limit = m; m = limit; while (limit-- > 0) { struct netmap_slot *rs = &rxring->slot[j]; struct netmap_slot *ts = &txring->slot[k]; /* swap packets */ if (ts->buf_idx < 2 || rs->buf_idx < 2) { - D("wrong index rx[%d] = %d -> tx[%d] = %d", + RD(5, "wrong index rx[%d] = %d -> tx[%d] = %d", j, rs->buf_idx, k, ts->buf_idx); sleep(2); } /* copy the packet length. */ - if (rs->len > 2048) { - D("wrong len %d rx[%d] -> tx[%d]", rs->len, j, k); + if (rs->len > rxring->nr_buf_size) { + RD(5, "wrong len %d rx[%d] -> tx[%d]", rs->len, j, k); rs->len = 0; } else if (verbose > 1) { D("%s send len %d rx[%d] -> tx[%d]", msg, rs->len, j, k); } ts->len = rs->len; if (zerocopy) { uint32_t pkt = ts->buf_idx; ts->buf_idx = rs->buf_idx; rs->buf_idx = pkt; /* report the buffer change. */ ts->flags |= NS_BUF_CHANGED; rs->flags |= NS_BUF_CHANGED; + /* copy the NS_MOREFRAG */ + rs->flags = (rs->flags & ~NS_MOREFRAG) | (ts->flags & NS_MOREFRAG); } else { char *rxbuf = NETMAP_BUF(rxring, rs->buf_idx); char *txbuf = NETMAP_BUF(txring, ts->buf_idx); nm_pkt_copy(rxbuf, txbuf, ts->len); } j = nm_ring_next(rxring, j); k = nm_ring_next(txring, k); } rxring->head = rxring->cur = j; txring->head = txring->cur = k; if (verbose && m > 0) D("%s sent %d packets to %p", msg, m, txring); return (m); } /* move packts from src to destination */ static int move(struct nm_desc *src, struct nm_desc *dst, u_int limit) { struct netmap_ring *txring, *rxring; u_int m = 0, si = src->first_rx_ring, di = dst->first_tx_ring; - const char *msg = (src->req.nr_ringid & NETMAP_SW_RING) ? + const char *msg = (src->req.nr_flags == NR_REG_SW) ? "host->net" : "net->host"; while (si <= src->last_rx_ring && di <= dst->last_tx_ring) { rxring = NETMAP_RXRING(src->nifp, si); txring = NETMAP_TXRING(dst->nifp, di); ND("txring %p rxring %p", txring, rxring); if (nm_ring_empty(rxring)) { si++; continue; } if (nm_ring_empty(txring)) { di++; continue; } m += process_rings(rxring, txring, limit, msg); } return (m); } static void usage(void) { fprintf(stderr, - "usage: bridge [-v] [-i ifa] [-i ifb] [-b burst] [-w wait_time] [ifa [ifb [burst]]]\n"); + "netmap bridge program: forward packets between two " + "network interfaces\n" + " usage(1): bridge [-v] [-i ifa] [-i ifb] [-b burst] " + "[-w wait_time] [-L]\n" + " usage(2): bridge [-v] [-w wait_time] [-L] " + "[ifa [ifb [burst]]]\n" + "\n" + " ifa and ifb are specified using the nm_open() syntax.\n" + " When ifb is missing (or is equal to ifa), bridge will\n" + " forward between between ifa and the host stack if -L\n" + " is not specified, otherwise loopback traffic on ifa.\n" + "\n" + " example: bridge -w 10 -i netmap:eth3 -i netmap:eth1\n" + ); exit(1); } /* * bridge [-v] if1 [if2] * * If only one name, or the two interfaces are the same, * bridges userland and the adapter. Otherwise bridge * two intefaces. */ int main(int argc, char **argv) { struct pollfd pollfd[2]; int ch; u_int burst = 1024, wait_link = 4; struct nm_desc *pa = NULL, *pb = NULL; char *ifa = NULL, *ifb = NULL; char ifabuf[64] = { 0 }; + int loopback = 0; - fprintf(stderr, "%s built %s %s\n", - argv[0], __DATE__, __TIME__); + fprintf(stderr, "%s built %s %s\n\n", argv[0], __DATE__, __TIME__); - while ( (ch = getopt(argc, argv, "b:ci:vw:")) != -1) { + while ((ch = getopt(argc, argv, "hb:ci:vw:L")) != -1) { switch (ch) { default: D("bad option %c %s", ch, optarg); + /* fallthrough */ + case 'h': usage(); break; case 'b': /* burst */ burst = atoi(optarg); break; case 'i': /* interface */ if (ifa == NULL) ifa = optarg; else if (ifb == NULL) ifb = optarg; else D("%s ignored, already have 2 interfaces", optarg); break; case 'c': zerocopy = 0; /* do not zerocopy */ break; case 'v': verbose++; break; case 'w': wait_link = atoi(optarg); break; + case 'L': + loopback = 1; + break; } } argc -= optind; argv += optind; if (argc > 0) ifa = argv[0]; if (argc > 1) ifb = argv[1]; if (argc > 2) burst = atoi(argv[2]); if (!ifb) ifb = ifa; if (!ifa) { D("missing interface"); usage(); } if (burst < 1 || burst > 8192) { D("invalid burst %d, set to 1024", burst); burst = 1024; } if (wait_link > 100) { D("invalid wait_link %d, set to 4", wait_link); wait_link = 4; } if (!strcmp(ifa, ifb)) { - D("same interface, endpoint 0 goes to host"); - snprintf(ifabuf, sizeof(ifabuf) - 1, "%s^", ifa); - ifa = ifabuf; + if (!loopback) { + D("same interface, endpoint 0 goes to host"); + snprintf(ifabuf, sizeof(ifabuf) - 1, "%s^", ifa); + ifa = ifabuf; + } else { + D("same interface, loopbacking traffic"); + } } else { /* two different interfaces. Take all rings on if1 */ } pa = nm_open(ifa, NULL, 0, NULL); if (pa == NULL) { D("cannot open %s", ifa); return (1); } /* try to reuse the mmap() of the first interface, if possible */ pb = nm_open(ifb, NULL, NM_OPEN_NO_MMAP, pa); if (pb == NULL) { D("cannot open %s", ifb); nm_close(pa); return (1); } zerocopy = zerocopy && (pa->mem == pb->mem); D("------- zerocopy %ssupported", zerocopy ? "" : "NOT "); - /* setup poll(2) variables. */ + /* setup poll(2) array */ memset(pollfd, 0, sizeof(pollfd)); pollfd[0].fd = pa->fd; pollfd[1].fd = pb->fd; D("Wait %d secs for link to come up...", wait_link); sleep(wait_link); D("Ready to go, %s 0x%x/%d <-> %s 0x%x/%d.", pa->req.nr_name, pa->first_rx_ring, pa->req.nr_rx_rings, pb->req.nr_name, pb->first_rx_ring, pb->req.nr_rx_rings); /* main loop */ signal(SIGINT, sigint_h); while (!do_abort) { int n0, n1, ret; pollfd[0].events = pollfd[1].events = 0; pollfd[0].revents = pollfd[1].revents = 0; n0 = pkt_queued(pa, 0); n1 = pkt_queued(pb, 0); #if defined(_WIN32) || defined(BUSYWAIT) - if (n0){ + if (n0) { ioctl(pollfd[1].fd, NIOCTXSYNC, NULL); pollfd[1].revents = POLLOUT; - } - else { + } else { ioctl(pollfd[0].fd, NIOCRXSYNC, NULL); } - if (n1){ + if (n1) { ioctl(pollfd[0].fd, NIOCTXSYNC, NULL); pollfd[0].revents = POLLOUT; - } - else { + } else { ioctl(pollfd[1].fd, NIOCRXSYNC, NULL); } ret = 1; #else if (n0) pollfd[1].events |= POLLOUT; else pollfd[0].events |= POLLIN; if (n1) pollfd[0].events |= POLLOUT; else pollfd[1].events |= POLLIN; + + /* poll() also cause kernel to txsync/rxsync the NICs */ ret = poll(pollfd, 2, 2500); -#endif //defined(_WIN32) || defined(BUSYWAIT) +#endif /* defined(_WIN32) || defined(BUSYWAIT) */ if (ret <= 0 || verbose) D("poll %s [0] ev %x %x rx %d@%d tx %d," " [1] ev %x %x rx %d@%d tx %d", ret <= 0 ? "timeout" : "ok", pollfd[0].events, pollfd[0].revents, pkt_queued(pa, 0), NETMAP_RXRING(pa->nifp, pa->cur_rx_ring)->cur, pkt_queued(pa, 1), pollfd[1].events, pollfd[1].revents, pkt_queued(pb, 0), NETMAP_RXRING(pb->nifp, pb->cur_rx_ring)->cur, pkt_queued(pb, 1) ); if (ret < 0) continue; if (pollfd[0].revents & POLLERR) { struct netmap_ring *rx = NETMAP_RXRING(pa->nifp, pa->cur_rx_ring); D("error on fd0, rx [%d,%d,%d)", rx->head, rx->cur, rx->tail); } if (pollfd[1].revents & POLLERR) { struct netmap_ring *rx = NETMAP_RXRING(pb->nifp, pb->cur_rx_ring); D("error on fd1, rx [%d,%d,%d)", rx->head, rx->cur, rx->tail); } - if (pollfd[0].revents & POLLOUT) { + if (pollfd[0].revents & POLLOUT) move(pb, pa, burst); - // XXX we don't need the ioctl */ - // ioctl(me[0].fd, NIOCTXSYNC, NULL); - } - if (pollfd[1].revents & POLLOUT) { + + if (pollfd[1].revents & POLLOUT) move(pa, pb, burst); - // XXX we don't need the ioctl */ - // ioctl(me[1].fd, NIOCTXSYNC, NULL); - } + + /* We don't need ioctl(NIOCTXSYNC) on the two file descriptors here, + * kernel will txsync on next poll(). */ } - D("exiting"); nm_close(pb); nm_close(pa); return (0); } Index: stable/12/tools/tools/netmap/pkt-gen.8 =================================================================== --- stable/12/tools/tools/netmap/pkt-gen.8 (revision 339909) +++ stable/12/tools/tools/netmap/pkt-gen.8 (revision 339910) @@ -1,179 +1,178 @@ .\" Copyright (c) 2016, George V. Neville-Neil .\" All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions are met: .\" .\" 1. Redistributions of source code must retain the above copyright notice, .\" this list of conditions and the following disclaimer. .\" .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" .\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE .\" LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE .\" POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd May 1, 2016 +.Dd October 23, 2018 .Dt PKT-GEN 8 .Os .Sh NAME .Nm pkt-gen .Nd Packet generator for use with .Xr netmap 4 .Sh SYNOPSIS .Bl -item -compact .It .Nm .Op Fl i Ar interface .Op Fl f Ar function .Op Fl n Ar count .Op Fl t Ar pkts_to_send .Op Fl r Ar pkts_to_receive .Op Fl l Ar pkt_size .Op Fl d Ar dst_ip[:port[-dst_ip:port]] .Op Fl s Ar src_ip[:port[-src_ip:port]] .Op Fl D Ar dst-mac .Op Fl S Ar src-mac .Op Fl a Ar cpu_id .Op Fl b Ar burst size .Op Fl c Ar cores .Op Fl p Ar threads .Op Fl T Ar report_ms .Op Fl P .Op Fl w Ar wait_for_link_time .Op Fl R Ar rate .Op Fl X .Op Fl H Ar len .Op Fl P Ar xfile .Op Fl z .Op Fl Z .Sh DESCRIPTION .Nm generates and receives raw network packets using .Xr netmap 4 . The arguments are as follows: .Pp .Bl -tag -width Ds .It Fl i Ar interface Network interface name. .It Fl f Ar function tx rx ping pong Set the function to transmit, receive of ping/pong. .It Fl n count Number of iterations (can be 0). .It Fl t pkts_to_send Number of packets to send. Also forces transmit mode. .It Fl r Ar pkts_to_receive Number of packets to receive. Also forces rx mode. .It Fl l Ar pkt_size Packet size in bytes excluding CRC. .It Fl d Ar dst_ip[:port[-dst_ip:port]] Destination IPv4 address and port, single or range. .It Fl s Ar src_ip[:port[-src_ip:port]] Source IPv4 address and port, single or range. .It Fl D Ar dst-mac Destination MAC address in colon notation. .It Fl S Ar src-mac Source MAC address in colon notation. .It Fl a Ar cpu_id Tie .Nm to a particular CPU core using .Xr setaffinity 2. .It Fl b Ar burst size Set the size of a burst of packets. .It Fl c Ar cores Number of cores to use. .It Fl p Ar threads Number of threads to use. .It Fl T Ar report_ms Number of milliseconds between reports. .It Fl P Use libpcap instead of netmap for reading or writing. .It Fl w Ar wait_for_link_time Number of seconds to wait to make sure that the network link is up. A network device driver may take some time to create a new transmit/receive ring pair when .Xr netmap 4 requests one. .It Fl R Ar rate Packet transmission rate. Not setting the packet transmission rate tells .Nm to transmit packets as quickly as possible. On servers from 2010 on-wards .Xr netmap 4 is able to completely use all of the bandwidth of a 10 or 40Gbps link, so this option should be used unless your intention is to saturate the link. .It Fl X Dump payload transmitted or received. .It Fl H Ar len Add empty virtio-net-header with size 'len'. This option is only use with Virtual Machine technologies that use virtio as a network interface. .It Fl P Ar file Load the packet from a pcap file rather than constructing it inside of .Nm .It Fl z Use random IPv4 src address/port .It Fl Z Use random IPv4 dst address/port .El .Pp .Nm is a raw packet generator that can utilize either .Xr netmap 4 or .Xr bpf 4 but which is most often uses with .Xr netmap 4 . The .Ar interface name used depends upon how the underlying Ethernet driver exposes its transmit and receive rings to .Xr netmap 4 . Most modern network interfaces that support 10Gbps and higher speeds have several transmit and receive rings that are used by the operating system to balance traffic across the interface. .Nm can peel off one or more of the transmit or receive rings for its own use without interfering with packets that might otherwise be destined for the host. For example on a system with a Chelsio Network Interface Card (NIC) the interface specification of .Ar -i netmap:ncxl0 gives .Nm access to a pair of transmit and receive rings that are separate from the more commonly known cxl0 interface, which is used by the operating system's TCP/IP stack. .Sh EXAMPLES Capture and count all packets arriving on the operating system's cxl0 interface. Using this will block packets from reaching the operating system's network stack. .Dl .Pp .Nm -i cxl0 -f rx .Pp Send a stream of fake DNS packets between two hosts with a packet length of 128 bytes. You must set the destination MAC address for packets to be received by the target host. .Pp .Dl .Nm -i netmap:ncxl0 -f tx -s 172.16.0.1:53 -d 172.16.1.3:53 -D 00:07:43:29:2a:e0 -.Sh FILES -.Xr netmap 4 .Sh SEE ALSO -.Xr netmap 4 +.Xr netmap 4 , +.Xr bridge 8 .Sh AUTHORS This manual page was written by .An George V. Neville-Neil Aq gnn@FreeBSD.org . Index: stable/12 =================================================================== --- stable/12 (revision 339909) +++ stable/12 (revision 339910) Property changes on: stable/12 ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head:r339659