D17735.id49720.diff
Index: share/man/man4/netmap.4
===================================================================
--- share/man/man4/netmap.4
+++ share/man/man4/netmap.4
@@ -1073,8 +1073,11 @@
clients attached to the same switch can now communicate
with the network card or the host.
.Sh SEE ALSO
-.Xr pkt-gen 8 ,
-.Xr bridge 8
+.Xr vale 4 ,
+.Xr vale-ctl 4 ,
+.Xr bridge 8 ,
+.Xr lb 8 ,
+.Xr pkt-gen 8
.Pp
.Pa http://info.iet.unipi.it/~luigi/netmap/
.Pp
Index: tools/tools/netmap/Makefile
===================================================================
--- tools/tools/netmap/Makefile
+++ tools/tools/netmap/Makefile
@@ -3,7 +3,7 @@
#
# For multiple programs using a single source file each,
# we can just define 'progs' and create custom targets.
-PROGS = pkt-gen nmreplay bridge vale-ctl
+PROGS = pkt-gen nmreplay bridge vale-ctl lb
CLEANFILES = $(PROGS) *.o
MAN=
@@ -34,3 +34,6 @@
vale-ctl: vale-ctl.o
$(CC) $(CFLAGS) -o vale-ctl vale-ctl.o
+
+lb: lb.o pkt_hash.o
+ $(CC) $(CFLAGS) -o lb lb.o pkt_hash.o $(LDFLAGS)
Index: tools/tools/netmap/README
===================================================================
--- tools/tools/netmap/README
+++ tools/tools/netmap/README
@@ -1,9 +1,13 @@
$FreeBSD$
-This directory contains examples that use netmap
+This directory contains applications that use the netmap API
- pkt-gen a packet sink/source using the netmap API
+ pkt-gen a multi-function packet generator and traffic sink
- bridge a two-port jumper wire, also using the native API
+ bridge a two-port jumper wire, also using the netmap API
- vale-ctl the program to control VALE bridges
+ vale-ctl the program to control and inspect VALE switches
+
+ lb an L3/L4 load balancer
+
+ nmreplay a tool to play back a pcap file to a netmap port
Index: tools/tools/netmap/bridge.8
===================================================================
--- tools/tools/netmap/bridge.8
+++ tools/tools/netmap/bridge.8
@@ -71,7 +71,8 @@
.El
.Sh SEE ALSO
.Xr netmap 4 ,
-.Xr pkt-gen 8
+.Xr pkt-gen 8 ,
+.Xr lb 8
.Sh AUTHORS
.An -nosplit
.Nm
Index: tools/tools/netmap/lb.8
===================================================================
--- tools/tools/netmap/lb.8
+++ tools/tools/netmap/lb.8
@@ -0,0 +1,125 @@
+.\" Copyright (c) 2017 Corelight, Inc. and Universita` di Pisa
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd October 28, 2018
+.Dt LB 8
+.Os
+.Sh NAME
+.Nm lb
+.Nd netmap-based load balancer
+.Sh SYNOPSIS
+.Bk -words
+.Bl -tag -width "lb"
+.It Nm
+.Op Fl i Ar port
+.Op Fl p Ar pipe-group
+.Op Fl B Ar extra-buffers
+.Op Fl b Ar batch-size
+.Op Fl w Ar wait-link
+.Op Fl W
+.Op Fl s Ar seconds
+.Op Fl o Ar seconds
+.El
+.Ek
+.Sh DESCRIPTION
+.Nm
+reads packets from an input netmap port and sends them to a number of netmap pipes,
+trying to balance the load among the pipes.
+Packets belonging to the
+same connection will always be sent to the same pipe.
+.Pp
+Command line options are listed below.
+.Bl -tag -width Ds
+.It Fl i Ar port
+Name of a netmap port.
+It must be supplied exactly once to identify
+the input port.
+Any netmap port type (physical interface, VALE switch, pipe, monitor port...)
+can be used.
+.It Fl p Ar name:number | number
+Add a new pipe group of the given number of pipes.
+The pipe group will receive all the packets read from the input port, balanced
+among the available pipes.
+The receiving ends of the pipes
+will be called name}0 to name}number-1.
+The name is optional and defaults to
+the name of the input port (stripped of any netmap operator).
+If the name is omitted, the colon may be omitted as well.
+.Pp
+This option can be supplied multiple times to define a sequence of pipe groups,
+each group receiving all the packets in turn.
+.Pp
+If no
+.Fl p
+option is given, a single group of two pipes with the default name is assumed.
+.Pp
+The same name may be used for several groups.
+The pipe numbering in each
+group will start from where the previous identically-named group left off.
+.It Fl B Ar extra-buffers
+Try to reserve the given number of extra buffers.
+Extra buffers are shared among
+all pipes in all groups and work as an extension of the pipe rings.
+If a pipe ring is full for whatever reason,
+.Nm
+tries to use extra buffers before dropping any packets directed to that pipe.
+.Pp
+If all extra buffers are busy, some are stolen from the pipe with the longest
+backlog.
+This gives preference to newer packets over old ones, and prevents a
+stalled pipe from depleting the pool of extra buffers.
+.It Fl b Ar batch-size
+Maximum number of packets processed between two read operations from the input port.
+Higher values of batch-size improve performance by amortizing read operations,
+but increase the risk of filling up the port's internal queues.
+.It Fl w Ar wait-link
+Number of seconds to wait before transmitting.
+It defaults to 2, and may be useful when talking to physical
+ports to let link negotiation complete before starting transmission.
+.It Fl W
+Enable busy waiting.
+This will run the polling loop continuously and keep a CPU at 100% utilization.
+.It Fl s Ar seconds
+Number of seconds between statistics messages written to syslog.
+The default of 0 disables syslog reporting.
+.It Fl o Ar seconds
+Number of seconds between statistics messages written to standard output.
+The default of 0 disables stdout reporting.
+.El
+.Sh LIMITATIONS
+The group chaining assumes that the applications on the receiving end of the
+pipes are read-only: they must not modify the buffers or the pipe ring slots
+in any way.
+.Pp
+The group naming is currently implemented by creating a persistent VALE port
+with the given name.
+If
+.Nm
+does not exit cleanly, the ports will not be removed.
+Please use
+.Xr vale-ctl 4
+to remove any stale persistent VALE port.
+.Sh SEE ALSO
+.Xr netmap 4 ,
+.Xr bridge 8 ,
+.Xr pkt-gen 8
+.Pp
+.Pa http://info.iet.unipi.it/~luigi/netmap/
+.Sh AUTHORS
+.An -nosplit
+.Nm
+has been written by
+.An Seth Hall
+at Corelight, USA.
+The facilities related to extra buffers and pipe groups have been added by
+.An Giuseppe Lettieri
+at University of Pisa, Italy, under contract by Corelight, USA.
Index: tools/tools/netmap/lb.c
===================================================================
--- tools/tools/netmap/lb.c
+++ tools/tools/netmap/lb.c
@@ -0,0 +1,1027 @@
+/*
+ * Copyright (C) 2017 Corelight, Inc. and Universita` di Pisa. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <ctype.h>
+#include <stdbool.h>
+#include <inttypes.h>
+#include <syslog.h>
+
+#define NETMAP_WITH_LIBS
+#include <net/netmap_user.h>
+#include <sys/poll.h>
+
+#include <netinet/in.h> /* htonl */
+
+#include <pthread.h>
+
+#include "pkt_hash.h"
+#include "ctrs.h"
+
+
+/*
+ * use our version of header structs, rather than bringing in a ton
+ * of platform specific ones
+ */
+#ifndef ETH_ALEN
+#define ETH_ALEN 6
+#endif
+
+struct compact_eth_hdr {
+ unsigned char h_dest[ETH_ALEN];
+ unsigned char h_source[ETH_ALEN];
+ u_int16_t h_proto;
+};
+
+struct compact_ip_hdr {
+ u_int8_t ihl:4, version:4;
+ u_int8_t tos;
+ u_int16_t tot_len;
+ u_int16_t id;
+ u_int16_t frag_off;
+ u_int8_t ttl;
+ u_int8_t protocol;
+ u_int16_t check;
+ u_int32_t saddr;
+ u_int32_t daddr;
+};
+
+struct compact_ipv6_hdr {
+ u_int8_t priority:4, version:4;
+ u_int8_t flow_lbl[3];
+ u_int16_t payload_len;
+ u_int8_t nexthdr;
+ u_int8_t hop_limit;
+ struct in6_addr saddr;
+ struct in6_addr daddr;
+};
+
+#define MAX_IFNAMELEN 64
+#define MAX_PORTNAMELEN (MAX_IFNAMELEN + 40)
+#define DEF_OUT_PIPES 2
+#define DEF_EXTRA_BUFS 0
+#define DEF_BATCH 2048
+#define DEF_WAIT_LINK 2
+#define DEF_STATS_INT 600
+#define BUF_REVOKE 100
+#define STAT_MSG_MAXSIZE 1024
+
+struct {
+ char ifname[MAX_IFNAMELEN];
+ char base_name[MAX_IFNAMELEN];
+ int netmap_fd;
+ uint16_t output_rings;
+ uint16_t num_groups;
+ uint32_t extra_bufs;
+ uint16_t batch;
+ int stdout_interval;
+ int syslog_interval;
+ int wait_link;
+ bool busy_wait;
+} glob_arg;
+
+/*
+ * the overflow queue is a circular queue of buffers
+ */
+struct overflow_queue {
+ char name[MAX_IFNAMELEN + 16];
+ struct netmap_slot *slots;
+ uint32_t head;
+ uint32_t tail;
+ uint32_t n;
+ uint32_t size;
+};
+
+struct overflow_queue *freeq;
+
+static inline int
+oq_full(struct overflow_queue *q)
+{
+ return q->n >= q->size;
+}
+
+static inline int
+oq_empty(struct overflow_queue *q)
+{
+ return q->n <= 0;
+}
+
+static inline void
+oq_enq(struct overflow_queue *q, const struct netmap_slot *s)
+{
+ if (unlikely(oq_full(q))) {
+ D("%s: queue full!", q->name);
+ abort();
+ }
+ q->slots[q->tail] = *s;
+ q->n++;
+ q->tail++;
+ if (q->tail >= q->size)
+ q->tail = 0;
+}
+
+static inline struct netmap_slot
+oq_deq(struct overflow_queue *q)
+{
+ struct netmap_slot s = q->slots[q->head];
+ if (unlikely(oq_empty(q))) {
+ D("%s: queue empty!", q->name);
+ abort();
+ }
+ q->n--;
+ q->head++;
+ if (q->head >= q->size)
+ q->head = 0;
+ return s;
+}
+
+static volatile int do_abort = 0;
+
+uint64_t dropped = 0;
+uint64_t forwarded = 0;
+uint64_t received_bytes = 0;
+uint64_t received_pkts = 0;
+uint64_t non_ip = 0;
+uint32_t freeq_n = 0;
+
+struct port_des {
+ char interface[MAX_PORTNAMELEN];
+ struct my_ctrs ctr;
+ unsigned int last_sync;
+ uint32_t last_tail;
+ struct overflow_queue *oq;
+ struct nm_desc *nmd;
+ struct netmap_ring *ring;
+ struct group_des *group;
+};
+
+struct port_des *ports;
+
+/* each group of pipes receives all the packets */
+struct group_des {
+ char pipename[MAX_IFNAMELEN];
+ struct port_des *ports;
+ int first_id;
+ int nports;
+ int last;
+ int custom_port;
+};
+
+struct group_des *groups;
+
+/* statistics */
+struct counters {
+ struct timeval ts;
+ struct my_ctrs *ctrs;
+ uint64_t received_pkts;
+ uint64_t received_bytes;
+ uint64_t non_ip;
+ uint32_t freeq_n;
+ int status __attribute__((aligned(64)));
+#define COUNTERS_EMPTY 0
+#define COUNTERS_FULL 1
+};
+
+struct counters counters_buf;
+
+static void *
+print_stats(void *arg)
+{
+ int npipes = glob_arg.output_rings;
+ int sys_int = 0;
+ (void)arg;
+ struct my_ctrs cur, prev;
+ struct my_ctrs *pipe_prev;
+
+ pipe_prev = calloc(npipes, sizeof(struct my_ctrs));
+ if (pipe_prev == NULL) {
+ D("out of memory");
+ exit(1);
+ }
+
+ char stat_msg[STAT_MSG_MAXSIZE] = "";
+
+ memset(&prev, 0, sizeof(prev));
+ while (!do_abort) {
+ int j, dosyslog = 0, dostdout = 0, newdata;
+ uint64_t pps = 0, dps = 0, bps = 0, dbps = 0, usec = 0;
+ struct my_ctrs x;
+
+ counters_buf.status = COUNTERS_EMPTY;
+ newdata = 0;
+ memset(&cur, 0, sizeof(cur));
+ sleep(1);
+ if (counters_buf.status == COUNTERS_FULL) {
+ __sync_synchronize();
+ newdata = 1;
+ cur.t = counters_buf.ts;
+ if (prev.t.tv_sec || prev.t.tv_usec) {
+ usec = (cur.t.tv_sec - prev.t.tv_sec) * 1000000 +
+ cur.t.tv_usec - prev.t.tv_usec;
+ }
+ }
+
+ ++sys_int;
+ if (glob_arg.stdout_interval && sys_int % glob_arg.stdout_interval == 0)
+ dostdout = 1;
+ if (glob_arg.syslog_interval && sys_int % glob_arg.syslog_interval == 0)
+ dosyslog = 1;
+
+ for (j = 0; j < npipes; ++j) {
+ struct my_ctrs *c = &counters_buf.ctrs[j];
+ cur.pkts += c->pkts;
+ cur.drop += c->drop;
+ cur.drop_bytes += c->drop_bytes;
+ cur.bytes += c->bytes;
+
+ if (usec) {
+ x.pkts = c->pkts - pipe_prev[j].pkts;
+ x.drop = c->drop - pipe_prev[j].drop;
+ x.bytes = c->bytes - pipe_prev[j].bytes;
+ x.drop_bytes = c->drop_bytes - pipe_prev[j].drop_bytes;
+ pps = (x.pkts*1000000 + usec/2) / usec;
+ dps = (x.drop*1000000 + usec/2) / usec;
+ bps = ((x.bytes*1000000 + usec/2) / usec) * 8;
+ dbps = ((x.drop_bytes*1000000 + usec/2) / usec) * 8;
+ }
+ pipe_prev[j] = *c;
+
+ if ( (dosyslog || dostdout) && newdata )
+ snprintf(stat_msg, STAT_MSG_MAXSIZE,
+ "{"
+ "\"ts\":%.6f,"
+ "\"interface\":\"%s\","
+ "\"output_ring\":%" PRIu16 ","
+ "\"packets_forwarded\":%" PRIu64 ","
+ "\"packets_dropped\":%" PRIu64 ","
+ "\"data_forward_rate_Mbps\":%.4f,"
+ "\"data_drop_rate_Mbps\":%.4f,"
+ "\"packet_forward_rate_kpps\":%.4f,"
+ "\"packet_drop_rate_kpps\":%.4f,"
+ "\"overflow_queue_size\":%" PRIu32
+ "}", cur.t.tv_sec + (cur.t.tv_usec / 1000000.0),
+ ports[j].interface,
+ j,
+ c->pkts,
+ c->drop,
+ (double)bps / 1024 / 1024,
+ (double)dbps / 1024 / 1024,
+ (double)pps / 1000,
+ (double)dps / 1000,
+ c->oq_n);
+
+ if (dosyslog && stat_msg[0])
+ syslog(LOG_INFO, "%s", stat_msg);
+ if (dostdout && stat_msg[0])
+ printf("%s\n", stat_msg);
+ }
+ if (usec) {
+ x.pkts = cur.pkts - prev.pkts;
+ x.drop = cur.drop - prev.drop;
+ x.bytes = cur.bytes - prev.bytes;
+ x.drop_bytes = cur.drop_bytes - prev.drop_bytes;
+ pps = (x.pkts*1000000 + usec/2) / usec;
+ dps = (x.drop*1000000 + usec/2) / usec;
+ bps = ((x.bytes*1000000 + usec/2) / usec) * 8;
+ dbps = ((x.drop_bytes*1000000 + usec/2) / usec) * 8;
+ }
+
+ if ( (dosyslog || dostdout) && newdata )
+ snprintf(stat_msg, STAT_MSG_MAXSIZE,
+ "{"
+ "\"ts\":%.6f,"
+ "\"interface\":\"%s\","
+ "\"output_ring\":null,"
+ "\"packets_received\":%" PRIu64 ","
+ "\"packets_forwarded\":%" PRIu64 ","
+ "\"packets_dropped\":%" PRIu64 ","
+ "\"non_ip_packets\":%" PRIu64 ","
+ "\"data_forward_rate_Mbps\":%.4f,"
+ "\"data_drop_rate_Mbps\":%.4f,"
+ "\"packet_forward_rate_kpps\":%.4f,"
+ "\"packet_drop_rate_kpps\":%.4f,"
+ "\"free_buffer_slots\":%" PRIu32
+ "}", cur.t.tv_sec + (cur.t.tv_usec / 1000000.0),
+ glob_arg.ifname,
+ received_pkts,
+ cur.pkts,
+ cur.drop,
+ counters_buf.non_ip,
+ (double)bps / 1024 / 1024,
+ (double)dbps / 1024 / 1024,
+ (double)pps / 1000,
+ (double)dps / 1000,
+ counters_buf.freeq_n);
+
+ if (dosyslog && stat_msg[0])
+ syslog(LOG_INFO, "%s", stat_msg);
+ if (dostdout && stat_msg[0])
+ printf("%s\n", stat_msg);
+
+ prev = cur;
+ }
+
+ free(pipe_prev);
+
+ return NULL;
+}
+
+static void
+free_buffers(void)
+{
+ int i, tot = 0;
+ struct port_des *rxport = &ports[glob_arg.output_rings];
+
+ /* build a netmap free list with the buffers in all the overflow queues */
+ for (i = 0; i < glob_arg.output_rings + 1; i++) {
+ struct port_des *cp = &ports[i];
+ struct overflow_queue *q = cp->oq;
+
+ if (!q)
+ continue;
+
+ while (q->n) {
+ struct netmap_slot s = oq_deq(q);
+ uint32_t *b = (uint32_t *)NETMAP_BUF(cp->ring, s.buf_idx);
+
+ *b = rxport->nmd->nifp->ni_bufs_head;
+ rxport->nmd->nifp->ni_bufs_head = s.buf_idx;
+ tot++;
+ }
+ }
+ D("added %d buffers to netmap free list", tot);
+
+ for (i = 0; i < glob_arg.output_rings + 1; ++i) {
+ nm_close(ports[i].nmd);
+ }
+}
+
+
+static void sigint_h(int sig)
+{
+ (void)sig; /* UNUSED */
+ do_abort = 1;
+ signal(SIGINT, SIG_DFL);
+}
+
+void usage()
+{
+ printf("usage: lb [options]\n");
+ printf("where options are:\n");
+ printf(" -h view help text\n");
+ printf(" -i iface interface name (required)\n");
+ printf(" -p [prefix:]npipes add a new group of output pipes\n");
+ printf(" -B nbufs number of extra buffers (default: %d)\n", DEF_EXTRA_BUFS);
+ printf(" -b batch batch size (default: %d)\n", DEF_BATCH);
+ printf(" -w seconds wait for link up (default: %d)\n", DEF_WAIT_LINK);
+ printf(" -W enable busy waiting. this will run your CPU at 100%%\n");
+ printf(" -s seconds seconds between syslog stats messages (default: 0)\n");
+ printf(" -o seconds seconds between stdout stats messages (default: 0)\n");
+ exit(0);
+}
+
+static int
+parse_pipes(char *spec)
+{
+ char *end = index(spec, ':');
+ static int max_groups = 0;
+ struct group_des *g;
+
+ ND("spec %s num_groups %d", spec, glob_arg.num_groups);
+ if (max_groups < glob_arg.num_groups + 1) {
+ size_t size = sizeof(*g) * (glob_arg.num_groups + 1);
+ groups = realloc(groups, size);
+ if (groups == NULL) {
+ D("out of memory");
+ return 1;
+ }
+ max_groups = glob_arg.num_groups + 1;
+ }
+ g = &groups[glob_arg.num_groups];
+ memset(g, 0, sizeof(*g));
+
+ if (end != NULL) {
+ if (end - spec > MAX_IFNAMELEN - 8) {
+ D("name '%s' too long", spec);
+ return 1;
+ }
+ if (end == spec) {
+ D("missing prefix before ':' in '%s'", spec);
+ return 1;
+ }
+ strncpy(g->pipename, spec, end - spec);
+ g->custom_port = 1;
+ end++;
+ } else {
+ /* no prefix, this group will use the
+ * name of the input port.
+ * This will be set in init_groups(),
+ * since here the input port may still
+ * be uninitialized
+ */
+ end = spec;
+ }
+ if (*end == '\0') {
+ g->nports = DEF_OUT_PIPES;
+ } else {
+ g->nports = atoi(end);
+ if (g->nports < 1) {
+ D("invalid number of pipes '%s' (must be at least 1)", end);
+ return 1;
+ }
+ }
+ glob_arg.output_rings += g->nports;
+ glob_arg.num_groups++;
+ return 0;
+}
+
+/* complete the initialization of the groups data structure */
+void init_groups(void)
+{
+ int i, j, t = 0;
+ struct group_des *g = NULL;
+ for (i = 0; i < glob_arg.num_groups; i++) {
+ g = &groups[i];
+ g->ports = &ports[t];
+ for (j = 0; j < g->nports; j++)
+ g->ports[j].group = g;
+ t += g->nports;
+ if (!g->custom_port)
+ strcpy(g->pipename, glob_arg.base_name);
+ for (j = 0; j < i; j++) {
+ struct group_des *h = &groups[j];
+ if (!strcmp(h->pipename, g->pipename))
+ g->first_id += h->nports;
+ }
+ }
+ g->last = 1;
+}
+
+/* push the packet described by slot rs to the group g.
+ * This may cause other buffers to be pushed down the
+ * chain headed by g.
+ * Return a free buffer.
+ */
+uint32_t forward_packet(struct group_des *g, struct netmap_slot *rs)
+{
+ uint32_t hash = rs->ptr;
+ uint32_t output_port = hash % g->nports;
+ struct port_des *port = &g->ports[output_port];
+ struct netmap_ring *ring = port->ring;
+ struct overflow_queue *q = port->oq;
+
+ /* Move the packet to the output pipe, unless there is
+ * either no space left on the ring, or there is some
+ * packet still in the overflow queue (since those must
+ * take precedence over the new one)
+ */
+ if (ring->head != ring->tail && (q == NULL || oq_empty(q))) {
+ struct netmap_slot *ts = &ring->slot[ring->head];
+ struct netmap_slot old_slot = *ts;
+
+ ts->buf_idx = rs->buf_idx;
+ ts->len = rs->len;
+ ts->flags |= NS_BUF_CHANGED;
+ ts->ptr = rs->ptr;
+ ring->head = nm_ring_next(ring, ring->head);
+ port->ctr.bytes += rs->len;
+ port->ctr.pkts++;
+ forwarded++;
+ return old_slot.buf_idx;
+ }
+
+ /* use the overflow queue, if available */
+ if (q == NULL || oq_full(q)) {
+ /* no space left on the ring and no overflow queue
+ * available: we are forced to drop the packet
+ */
+ dropped++;
+ port->ctr.drop++;
+ port->ctr.drop_bytes += rs->len;
+ return rs->buf_idx;
+ }
+
+ oq_enq(q, rs);
+
+ /*
+ * we cannot continue down the chain and we need to
+ * return a free buffer now. We take it from the free queue.
+ */
+ if (oq_empty(freeq)) {
+ /* the free queue is empty. Revoke some buffers
+ * from the longest overflow queue
+ */
+ uint32_t j;
+ struct port_des *lp = &ports[0];
+ uint32_t max = lp->oq->n;
+
+ /* let lp point to the port with the longest queue */
+ for (j = 1; j < glob_arg.output_rings; j++) {
+ struct port_des *cp = &ports[j];
+ if (cp->oq->n > max) {
+ lp = cp;
+ max = cp->oq->n;
+ }
+ }
+
+ /* move the oldest BUF_REVOKE buffers from the
+ * lp queue to the free queue
+ */
+ // XXX optimize this cycle
+ for (j = 0; lp->oq->n && j < BUF_REVOKE; j++) {
+ struct netmap_slot tmp = oq_deq(lp->oq);
+
+ dropped++;
+ lp->ctr.drop++;
+ lp->ctr.drop_bytes += tmp.len;
+
+ oq_enq(freeq, &tmp);
+ }
+
+ ND(1, "revoked %d buffers from %s", j, lp->oq->name);
+ }
+
+ return oq_deq(freeq).buf_idx;
+}
+
+int main(int argc, char **argv)
+{
+ int ch;
+ uint32_t i;
+ int rv;
+ unsigned int iter = 0;
+ int poll_timeout = 10; /* default */
+
+ glob_arg.ifname[0] = '\0';
+ glob_arg.output_rings = 0;
+ glob_arg.batch = DEF_BATCH;
+ glob_arg.wait_link = DEF_WAIT_LINK;
+ glob_arg.busy_wait = false;
+ glob_arg.syslog_interval = 0;
+ glob_arg.stdout_interval = 0;
+
+ while ( (ch = getopt(argc, argv, "hi:p:b:B:s:o:w:W")) != -1) {
+ switch (ch) {
+ case 'i':
+ D("interface is %s", optarg);
+ if (strlen(optarg) > MAX_IFNAMELEN - 8) {
+ D("ifname too long %s", optarg);
+ return 1;
+ }
+ if (strncmp(optarg, "netmap:", 7) && strncmp(optarg, "vale", 4)) {
+ sprintf(glob_arg.ifname, "netmap:%s", optarg);
+ } else {
+ strcpy(glob_arg.ifname, optarg);
+ }
+ break;
+
+ case 'p':
+ if (parse_pipes(optarg)) {
+ usage();
+ return 1;
+ }
+ break;
+
+ case 'B':
+ glob_arg.extra_bufs = atoi(optarg);
+ D("requested %d extra buffers", glob_arg.extra_bufs);
+ break;
+
+ case 'b':
+ glob_arg.batch = atoi(optarg);
+ D("batch is %d", glob_arg.batch);
+ break;
+
+ case 'w':
+ glob_arg.wait_link = atoi(optarg);
+ D("link wait for up time is %d", glob_arg.wait_link);
+ break;
+
+ case 'W':
+ glob_arg.busy_wait = true;
+ break;
+
+ case 'o':
+ glob_arg.stdout_interval = atoi(optarg);
+ break;
+
+ case 's':
+ glob_arg.syslog_interval = atoi(optarg);
+ break;
+
+ case 'h':
+ usage();
+ return 0;
+ break;
+
+ default:
+ D("bad option %c %s", ch, optarg);
+ usage();
+ return 1;
+ }
+ }
+
+ if (glob_arg.ifname[0] == '\0') {
+ D("missing interface name");
+ usage();
+ return 1;
+ }
+
+ /* extract the base name */
+ char *nscan = strncmp(glob_arg.ifname, "netmap:", 7) ?
+ glob_arg.ifname : glob_arg.ifname + 7;
+ strncpy(glob_arg.base_name, nscan, MAX_IFNAMELEN-1);
+ for (nscan = glob_arg.base_name; *nscan && !index("-*^{}/@", *nscan); nscan++)
+ ;
+ *nscan = '\0';
+
+ if (glob_arg.num_groups == 0)
+ parse_pipes("");
+
+ if (glob_arg.syslog_interval) {
+ setlogmask(LOG_UPTO(LOG_INFO));
+ openlog("lb", LOG_CONS | LOG_PID | LOG_NDELAY, LOG_LOCAL1);
+ }
+
+ uint32_t npipes = glob_arg.output_rings;
+
+
+ pthread_t stat_thread;
+
+ ports = calloc(npipes + 1, sizeof(struct port_des));
+ if (!ports) {
+ D("failed to allocate the stats array");
+ return 1;
+ }
+ struct port_des *rxport = &ports[npipes];
+ init_groups();
+
+ memset(&counters_buf, 0, sizeof(counters_buf));
+ counters_buf.ctrs = calloc(npipes, sizeof(struct my_ctrs));
+ if (!counters_buf.ctrs) {
+ D("failed to allocate the counters snapshot buffer");
+ return 1;
+ }
+
+ /* we need base_req to specify pipes and extra bufs */
+ struct nmreq base_req;
+ memset(&base_req, 0, sizeof(base_req));
+
+ base_req.nr_arg1 = npipes;
+ base_req.nr_arg3 = glob_arg.extra_bufs;
+
+ rxport->nmd = nm_open(glob_arg.ifname, &base_req, 0, NULL);
+
+ if (rxport->nmd == NULL) {
+ D("cannot open %s", glob_arg.ifname);
+ return (1);
+ } else {
+ D("successfully opened %s (tx slots: %u)", glob_arg.ifname,
+ rxport->nmd->req.nr_tx_slots);
+ }
+
+ uint32_t extra_bufs = rxport->nmd->req.nr_arg3;
+ struct overflow_queue *oq = NULL;
+ /* reference ring to access the buffers */
+ rxport->ring = NETMAP_RXRING(rxport->nmd->nifp, 0);
+
+ if (!glob_arg.extra_bufs)
+ goto run;
+
+ D("obtained %d extra buffers", extra_bufs);
+ if (!extra_bufs)
+ goto run;
+
+ /* one overflow queue for each output pipe, plus one for the
+ * free extra buffers
+ */
+ oq = calloc(npipes + 1, sizeof(struct overflow_queue));
+ if (!oq) {
+ D("failed to allocate overflow queue descriptors");
+ goto run;
+ }
+
+ freeq = &oq[npipes];
+ rxport->oq = freeq;
+
+ freeq->slots = calloc(extra_bufs, sizeof(struct netmap_slot));
+ if (!freeq->slots) {
+ D("failed to allocate the free list");
+ /* run without extra buffers */
+ extra_bufs = 0;
+ goto run;
+ }
+ freeq->size = extra_bufs;
+ snprintf(freeq->name, MAX_IFNAMELEN, "free queue");
+
+ /*
+ * the list of buffers uses the first uint32_t in each buffer
+ * as the index of the next buffer.
+ */
+ uint32_t scan;
+ for (scan = rxport->nmd->nifp->ni_bufs_head;
+ scan;
+ scan = *(uint32_t *)NETMAP_BUF(rxport->ring, scan))
+ {
+ struct netmap_slot s;
+ s.len = s.flags = 0;
+ s.ptr = 0;
+ s.buf_idx = scan;
+ ND("freeq <- %d", s.buf_idx);
+ oq_enq(freeq, &s);
+ }
+
+
+ if (freeq->n != extra_bufs) {
+ D("something went wrong: netmap reported %d extra_bufs, but the free list contained %d",
+ extra_bufs, freeq->n);
+ return 1;
+ }
+ rxport->nmd->nifp->ni_bufs_head = 0;
+
+run:
+ atexit(free_buffers);
+
+ int j, t = 0;
+ for (j = 0; j < glob_arg.num_groups; j++) {
+ struct group_des *g = &groups[j];
+ int k;
+ for (k = 0; k < g->nports; ++k) {
+ struct port_des *p = &g->ports[k];
+ snprintf(p->interface, MAX_PORTNAMELEN, "%s%s{%d/xT@%d",
+ (strncmp(g->pipename, "vale", 4) ? "netmap:" : ""),
+ g->pipename, g->first_id + k,
+ rxport->nmd->req.nr_arg2);
+ D("opening pipe named %s", p->interface);
+
+ p->nmd = nm_open(p->interface, NULL, 0, rxport->nmd);
+
+ if (p->nmd == NULL) {
+ D("cannot open %s", p->interface);
+ return (1);
+ } else if (p->nmd->req.nr_arg2 != rxport->nmd->req.nr_arg2) {
+ D("failed to open pipe #%d in zero-copy mode, "
+ "please close any application that uses either pipe %s}%d, "
+ "or %s{%d, and retry",
+ k + 1, g->pipename, g->first_id + k, g->pipename, g->first_id + k);
+ return (1);
+ } else {
+ D("successfully opened pipe #%d %s (tx slots: %d)",
+ k + 1, p->interface, p->nmd->req.nr_tx_slots);
+ p->ring = NETMAP_TXRING(p->nmd->nifp, 0);
+ p->last_tail = nm_ring_next(p->ring, p->ring->tail);
+ }
+ D("zerocopy %s",
+ (rxport->nmd->mem == p->nmd->mem) ? "enabled" : "disabled");
+
+ if (extra_bufs) {
+ struct overflow_queue *q = &oq[t + k];
+ q->slots = calloc(extra_bufs, sizeof(struct netmap_slot));
+ if (!q->slots) {
+ D("failed to allocate overflow queue for pipe %d", k);
+ /* make all overflow queue management fail */
+ extra_bufs = 0;
+ }
+ q->size = extra_bufs;
+ snprintf(q->name, sizeof(q->name), "oq %s{%4d", g->pipename, k);
+ p->oq = q;
+ }
+ }
+ t += g->nports;
+ }
+
+ if (glob_arg.extra_bufs && !extra_bufs) {
+ if (oq) {
+ for (i = 0; i < npipes + 1; i++) {
+ free(oq[i].slots);
+ oq[i].slots = NULL;
+ }
+ free(oq);
+ oq = NULL;
+ /* clear any dangling references to the freed queues */
+ freeq = NULL;
+ rxport->oq = NULL;
+ for (i = 0; i < npipes; i++)
+ ports[i].oq = NULL;
+ }
+ D("*** overflow queues disabled ***");
+ }
+
+ sleep(glob_arg.wait_link);
+
+ /* start stats thread after wait_link */
+ /* pthread_create() returns an error number, not -1/errno */
+ rv = pthread_create(&stat_thread, NULL, print_stats, NULL);
+ if (rv != 0) {
+ D("unable to create the stats thread: %s", strerror(rv));
+ return 1;
+ }
+
+ struct pollfd pollfd[npipes + 1];
+ memset(&pollfd, 0, sizeof(pollfd));
+ signal(SIGINT, sigint_h);
+
+ /* make sure we wake up as often as needed, even when there are no
+ * packets coming in
+ */
+ if (glob_arg.syslog_interval > 0 && glob_arg.syslog_interval < poll_timeout)
+ poll_timeout = glob_arg.syslog_interval;
+ if (glob_arg.stdout_interval > 0 && glob_arg.stdout_interval < poll_timeout)
+ poll_timeout = glob_arg.stdout_interval;
+
+ while (!do_abort) {
+ u_int polli = 0;
+ iter++;
+
+ for (i = 0; i < npipes; ++i) {
+ struct netmap_ring *ring = ports[i].ring;
+ int pending = nm_tx_pending(ring);
+
+ /* if there are packets pending, we want to be notified when
+ * tail moves, so we let cur=tail
+ */
+ ring->cur = pending ? ring->tail : ring->head;
+
+ if (!glob_arg.busy_wait && !pending) {
+ /* no need to poll, there are no packets pending */
+ continue;
+ }
+ pollfd[polli].fd = ports[i].nmd->fd;
+ pollfd[polli].events = POLLOUT;
+ pollfd[polli].revents = 0;
+ ++polli;
+ }
+
+ pollfd[polli].fd = rxport->nmd->fd;
+ pollfd[polli].events = POLLIN;
+ pollfd[polli].revents = 0;
+ ++polli;
+
+ //RD(5, "polling %d file descriptors", polli+1);
+ rv = poll(pollfd, polli, poll_timeout);
+ if (rv <= 0) {
+ if (rv < 0 && errno != EAGAIN && errno != EINTR)
+ RD(1, "poll error %s", strerror(errno));
+ goto send_stats;
+ }
+
+ /* if there are several groups, try pushing released packets from
+ * upstream groups to the downstream ones.
+ *
+ * It is important to do this before returned slots are reused
+ * for new transmissions. For the same reason, this must be
+ * done starting from the last group going backwards.
+ */
+ for (i = glob_arg.num_groups - 1U; i > 0; i--) {
+ struct group_des *g = &groups[i - 1];
+ int j;
+
+ for (j = 0; j < g->nports; j++) {
+ struct port_des *p = &g->ports[j];
+ struct netmap_ring *ring = p->ring;
+ uint32_t last = p->last_tail,
+ stop = nm_ring_next(ring, ring->tail);
+
+ /* slight abuse of the API here: we touch the slot
+ * pointed to by tail
+ */
+ for ( ; last != stop; last = nm_ring_next(ring, last)) {
+ struct netmap_slot *rs = &ring->slot[last];
+ // XXX less aggressive?
+ rs->buf_idx = forward_packet(g + 1, rs);
+ rs->flags |= NS_BUF_CHANGED;
+ rs->ptr = 0;
+ }
+ p->last_tail = last;
+ }
+ }
+
+
+
+ if (oq) {
+ /* try to push packets from the overflow queues
+ * to the corresponding pipes
+ */
+ for (i = 0; i < npipes; i++) {
+ struct port_des *p = &ports[i];
+ struct overflow_queue *q = p->oq;
+ uint32_t j, lim;
+ struct netmap_ring *ring;
+ struct netmap_slot *slot;
+
+ if (oq_empty(q))
+ continue;
+ ring = p->ring;
+ lim = nm_ring_space(ring);
+ if (!lim)
+ continue;
+ if (q->n < lim)
+ lim = q->n;
+ for (j = 0; j < lim; j++) {
+ struct netmap_slot s = oq_deq(q), tmp;
+ tmp.ptr = 0;
+ slot = &ring->slot[ring->head];
+ tmp.buf_idx = slot->buf_idx;
+ oq_enq(freeq, &tmp);
+ *slot = s;
+ slot->flags |= NS_BUF_CHANGED;
+ ring->head = nm_ring_next(ring, ring->head);
+ }
+ }
+ }
+
+ /* push any new packets from the input port to the first group */
+ int batch = 0;
+ for (i = rxport->nmd->first_rx_ring; i <= rxport->nmd->last_rx_ring; i++) {
+ struct netmap_ring *rxring = NETMAP_RXRING(rxport->nmd->nifp, i);
+
+ //D("prepare to scan rings");
+ int next_cur = rxring->cur;
+ struct netmap_slot *next_slot = &rxring->slot[next_cur];
+ const char *next_buf = NETMAP_BUF(rxring, next_slot->buf_idx);
+ while (!nm_ring_empty(rxring)) {
+ struct netmap_slot *rs = next_slot;
+ struct group_des *g = &groups[0];
+ ++received_pkts;
+ received_bytes += rs->len;
+
+ // choose the output pipe: hash the packet headers
+ // ('B' is just a hashing seed)
+ rs->ptr = pkt_hdr_hash((const unsigned char *)next_buf, 4, 'B');
+ if (rs->ptr == 0) {
+ non_ip++; // XXX ??
+ }
+ // prefetch the buffer for the next round
+ next_cur = nm_ring_next(rxring, next_cur);
+ next_slot = &rxring->slot[next_cur];
+ next_buf = NETMAP_BUF(rxring, next_slot->buf_idx);
+ __builtin_prefetch(next_buf);
+ rs->buf_idx = forward_packet(g, rs);
+ rs->flags |= NS_BUF_CHANGED;
+ rxring->head = rxring->cur = next_cur;
+
+ batch++;
+ if (unlikely(batch >= glob_arg.batch)) {
+ ioctl(rxport->nmd->fd, NIOCRXSYNC, NULL);
+ batch = 0;
+ }
+ ND(1,
+ "Forwarded Packets: %"PRIu64" Dropped packets: %"PRIu64" Percent: %.2f",
+ forwarded, dropped,
+ ((float)dropped / (float)forwarded * 100));
+ }
+
+ }
+
+ send_stats:
+ if (counters_buf.status == COUNTERS_FULL)
+ continue;
+ /* take a new snapshot of the counters */
+ gettimeofday(&counters_buf.ts, NULL);
+ for (i = 0; i < npipes; i++) {
+ struct my_ctrs *c = &counters_buf.ctrs[i];
+ *c = ports[i].ctr;
+ /*
+ * If this port has an overflow queue, export its current
+ * length in the oq_n counter for this port.
+ */
+ if (ports[i].oq != NULL)
+ c->oq_n = ports[i].oq->n;
+ }
+ counters_buf.received_pkts = received_pkts;
+ counters_buf.received_bytes = received_bytes;
+ counters_buf.non_ip = non_ip;
+ if (freeq != NULL)
+ counters_buf.freeq_n = freeq->n;
+ __sync_synchronize();
+ counters_buf.status = COUNTERS_FULL;
+ }
+
+ /*
+ * If the free queue exists, copy its length to the freeq_n
+ * member of the message struct; otherwise set it to 0.
+ */
+ if (freeq != NULL) {
+ freeq_n = freeq->n;
+ } else {
+ freeq_n = 0;
+ }
+
+ pthread_join(stat_thread, NULL);
+
+ printf("%"PRIu64" packets forwarded. %"PRIu64" packets dropped. Total %"PRIu64"\n", forwarded,
+ dropped, forwarded + dropped);
+ return 0;
+}
Index: tools/tools/netmap/pkt_hash.h
===================================================================
--- tools/tools/netmap/pkt_hash.h
+++ tools/tools/netmap/pkt_hash.h
@@ -0,0 +1,78 @@
+/*
+ ** Copyright (c) 2015, Asim Jamshed, Robin Sommer, Seth Hall
+ ** and the International Computer Science Institute. All rights reserved.
+ **
+ ** Redistribution and use in source and binary forms, with or without
+ ** modification, are permitted provided that the following conditions are met:
+ **
+ ** (1) Redistributions of source code must retain the above copyright
+ ** notice, this list of conditions and the following disclaimer.
+ **
+ ** (2) Redistributions in binary form must reproduce the above copyright
+ ** notice, this list of conditions and the following disclaimer in the
+ ** documentation and/or other materials provided with the distribution.
+ **
+ **
+ ** THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ ** AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ ** IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ** ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ ** LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ ** CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ ** SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ ** INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ ** CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ** ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ ** POSSIBILITY OF SUCH DAMAGE.
+ **/
+#ifndef LB_PKT_HASH_H
+#define LB_PKT_HASH_H
+/*---------------------------------------------------------------------*/
+/**
+ ** Packet header hashing utility - declarations for the functions
+ ** that parse packet headers and compute a hash over the header
+ ** fields. Please see pkt_hash.c for more details.
+ **/
+/*---------------------------------------------------------------------*/
+/* for type def'n */
+#include <stdint.h>
+/*---------------------------------------------------------------------*/
+#ifdef __GNUC__
+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+#else
+#define likely(x) (x)
+#define unlikely(x) (x)
+#endif
+
+#define HTONS(n) (((((unsigned short)(n) & 0xFF)) << 8) | \
+ (((unsigned short)(n) & 0xFF00) >> 8))
+#define NTOHS(n) (((((unsigned short)(n) & 0xFF)) << 8) | \
+ (((unsigned short)(n) & 0xFF00) >> 8))
+
+#define HTONL(n) (((((unsigned long)(n) & 0xFF)) << 24) | \
+ ((((unsigned long)(n) & 0xFF00)) << 8) | \
+ ((((unsigned long)(n) & 0xFF0000)) >> 8) | \
+ ((((unsigned long)(n) & 0xFF000000)) >> 24))
+
+#define NTOHL(n) (((((unsigned long)(n) & 0xFF)) << 24) | \
+ ((((unsigned long)(n) & 0xFF00)) << 8) | \
+ ((((unsigned long)(n) & 0xFF0000)) >> 8) | \
+ ((((unsigned long)(n) & 0xFF000000)) >> 24))
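These macros swap bytes unconditionally, so they match ntohs()/ntohl() only on little-endian hosts; their advantage is that they are usable in constant expressions. A quick self-check of what they compute (macros reproduced here for illustration):

```c
#include <stdint.h>

/* Reproduced from pkt_hash.h: unconditional 16/32-bit byte swaps,
 * usable in constant expressions (e.g. case labels). */
#define NTOHS(n) (((((unsigned short)(n) & 0xFF)) << 8) | \
                  (((unsigned short)(n) & 0xFF00) >> 8))
#define NTOHL(n) (((((unsigned long)(n) & 0xFF)) << 24) | \
                  ((((unsigned long)(n) & 0xFF00)) << 8) | \
                  ((((unsigned long)(n) & 0xFF0000)) >> 8) | \
                  ((((unsigned long)(n) & 0xFF000000)) >> 24))
```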
+/*---------------------------------------------------------------------*/
+typedef struct vlanhdr {
+ uint16_t pri_cfi_vlan;
+ uint16_t proto;
+} vlanhdr;
+/*---------------------------------------------------------------------*/
+/**
+ ** Analyzes the packet header and computes a corresponding
+ ** hash value.
+ **/
+uint32_t
+pkt_hdr_hash(const unsigned char *buffer,
+ uint8_t hash_split,
+ uint8_t seed);
+/*---------------------------------------------------------------------*/
+#endif /* LB_PKT_HASH_H */
+
Index: tools/tools/netmap/pkt_hash.c
===================================================================
--- tools/tools/netmap/pkt_hash.c
+++ tools/tools/netmap/pkt_hash.c
@@ -0,0 +1,395 @@
+/*
+ ** Copyright (c) 2015, Asim Jamshed, Robin Sommer, Seth Hall
+ ** and the International Computer Science Institute. All rights reserved.
+ **
+ ** Redistribution and use in source and binary forms, with or without
+ ** modification, are permitted provided that the following conditions are met:
+ **
+ ** (1) Redistributions of source code must retain the above copyright
+ ** notice, this list of conditions and the following disclaimer.
+ **
+ ** (2) Redistributions in binary form must reproduce the above copyright
+ ** notice, this list of conditions and the following disclaimer in the
+ ** documentation and/or other materials provided with the distribution.
+ **
+ **
+ ** THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ ** AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ ** IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ** ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ ** LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ ** CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ ** SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ ** INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ ** CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ** ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ ** POSSIBILITY OF SUCH DAMAGE.
+ **/
+/* for func prototypes */
+#include "pkt_hash.h"
+
+/* Make Linux headers choose BSD versions of some of the data structures */
+#define __FAVOR_BSD
+
+/* for types */
+#include <sys/types.h>
+/* for [n/h]to[h/n][ls] */
+#include <netinet/in.h>
+/* iphdr */
+#include <netinet/ip.h>
+/* ipv6hdr */
+#include <netinet/ip6.h>
+/* tcphdr */
+#include <netinet/tcp.h>
+/* udphdr */
+#include <netinet/udp.h>
+/* eth hdr */
+#include <net/ethernet.h>
+/* for memset */
+#include <string.h>
+
+#include <stdio.h>
+#include <assert.h>
+
+//#include <libnet.h>
+/*---------------------------------------------------------------------*/
+/**
+ ** The cache table is used to pick a nice seed for the hash value.
+ ** It is built only once, when sym_hash_fn is called for the very
+ ** first time.
+ **/
+static void
+build_sym_key_cache(uint32_t *cache, int cache_len)
+{
+ static const uint8_t key[] = { 0x50, 0x6d };
+
+ uint32_t result = (((uint32_t)key[0]) << 24) |
+ (((uint32_t)key[1]) << 16) |
+ (((uint32_t)key[0]) << 8) |
+ ((uint32_t)key[1]);
+
+ uint32_t idx = 32;
+ int i;
+
+ for (i = 0; i < cache_len; i++, idx++) {
+ uint8_t shift = (idx % 8);
+ uint32_t bit;
+
+ cache[i] = result;
+ bit = ((key[(idx/8) & 1] << shift) & 0x80) ? 1 : 0;
+ result = ((result << 1) | bit);
+ }
+}
+
+static void
+build_byte_cache(uint32_t byte_cache[256][4])
+{
+#define KEY_CACHE_LEN 96
+ int i, j, k;
+ uint32_t key_cache[KEY_CACHE_LEN];
+
+ build_sym_key_cache(key_cache, KEY_CACHE_LEN);
+
+ for (i = 0; i < 4; i++) {
+ for (j = 0; j < 256; j++) {
+ uint8_t b = j;
+ byte_cache[j][i] = 0;
+ for (k = 0; k < 8; k++) {
+ if (b & 0x80)
+ byte_cache[j][i] ^= key_cache[8 * i + k];
+ b <<= 1U;
+ }
+ }
+ }
+}
+
+
+/*---------------------------------------------------------------------*/
+/**
+ ** Computes symmetric hash based on the 4-tuple header data
+ **/
+static uint32_t
+sym_hash_fn(uint32_t sip, uint32_t dip, uint16_t sp, uint16_t dp)
+{
+ uint32_t rc = 0;
+ static int first_time = 1;
+ static uint32_t byte_cache[256][4];
+ uint8_t *sip_b = (uint8_t *)&sip,
+ *dip_b = (uint8_t *)&dip,
+ *sp_b = (uint8_t *)&sp,
+ *dp_b = (uint8_t *)&dp;
+
+ if (first_time) {
+ build_byte_cache(byte_cache);
+ first_time = 0;
+ }
+
+ rc = byte_cache[sip_b[3]][0] ^
+ byte_cache[sip_b[2]][1] ^
+ byte_cache[sip_b[1]][2] ^
+ byte_cache[sip_b[0]][3] ^
+ byte_cache[dip_b[3]][0] ^
+ byte_cache[dip_b[2]][1] ^
+ byte_cache[dip_b[1]][2] ^
+ byte_cache[dip_b[0]][3] ^
+ byte_cache[sp_b[1]][0] ^
+ byte_cache[sp_b[0]][1] ^
+ byte_cache[dp_b[1]][2] ^
+ byte_cache[dp_b[0]][3];
+
+ return rc;
+}
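The repeated 16-bit key 0x506d is what makes this hash flow-symmetric: the key stream has period 16 bits, so the 32-bit key window starting at bit offset i equals the one at i+16 and i+32, and swapping src/dst addresses and src/dst ports leaves the XOR unchanged. A bit-at-a-time re-derivation of the Toeplitz construction (a simplified sketch, not the cached implementation above) that checks this property:

```c
#include <stdint.h>

/* Bit i of the infinite key stream obtained by repeating 0x506d. */
static int key_bit(int i) {
	return (0x506d >> (15 - (i % 16))) & 1;
}

/* 32-bit Toeplitz key window starting at bit offset off. */
static uint32_t key_window(int off) {
	uint32_t w = 0;
	for (int b = 0; b < 32; b++)
		w = (w << 1) | (uint32_t)key_bit(off + b);
	return w;
}

/* Plain Toeplitz hash of nbytes of input placed at bit offset off:
 * XOR the key window aligned with every set input bit. */
static uint32_t toeplitz(const uint8_t *p, int nbytes, int off) {
	uint32_t h = 0;
	for (int i = 0; i < nbytes * 8; i++)
		if ((p[i / 8] >> (7 - (i % 8))) & 1)
			h ^= key_window(off + i);
	return h;
}

/* 4-tuple hash: sip || dip || sport || dport as a 96-bit input. */
static uint32_t flow_hash(const uint8_t sip[4], const uint8_t dip[4],
    const uint8_t sp[2], const uint8_t dp[2]) {
	return toeplitz(sip, 4, 0) ^ toeplitz(dip, 4, 32) ^
	       toeplitz(sp, 2, 64) ^ toeplitz(dp, 2, 80);
}
```

Because key_window(0) == key_window(32) and key_window(64) == key_window(80), both directions of a flow hash to the same value, so the two halves of a TCP connection always land on the same output pipe.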
+static uint32_t decode_gre_hash(const uint8_t *, uint8_t, uint8_t);
+/*---------------------------------------------------------------------*/
+/**
+ ** Parser + hash function for the IPv4 packet
+ **/
+static uint32_t
+decode_ip_n_hash(struct ip *iph, uint8_t hash_split, uint8_t seed)
+{
+ uint32_t rc = 0;
+
+ if (hash_split == 2) {
+ rc = sym_hash_fn(ntohl(iph->ip_src.s_addr),
+ ntohl(iph->ip_dst.s_addr),
+ ntohs(0xFFFD) + seed,
+ ntohs(0xFFFE) + seed);
+ } else {
+ struct tcphdr *tcph = NULL;
+ struct udphdr *udph = NULL;
+
+ switch (iph->ip_p) {
+ case IPPROTO_TCP:
+ tcph = (struct tcphdr *)((uint8_t *)iph + (iph->ip_hl<<2));
+ rc = sym_hash_fn(ntohl(iph->ip_src.s_addr),
+ ntohl(iph->ip_dst.s_addr),
+ ntohs(tcph->th_sport) + seed,
+ ntohs(tcph->th_dport) + seed);
+ break;
+ case IPPROTO_UDP:
+ udph = (struct udphdr *)((uint8_t *)iph + (iph->ip_hl<<2));
+ rc = sym_hash_fn(ntohl(iph->ip_src.s_addr),
+ ntohl(iph->ip_dst.s_addr),
+ ntohs(udph->uh_sport) + seed,
+ ntohs(udph->uh_dport) + seed);
+ break;
+ case IPPROTO_IPIP:
+ /* tunneling */
+ rc = decode_ip_n_hash((struct ip *)((uint8_t *)iph + (iph->ip_hl<<2)),
+ hash_split, seed);
+ break;
+ case IPPROTO_GRE:
+ rc = decode_gre_hash((uint8_t *)iph + (iph->ip_hl<<2),
+ hash_split, seed);
+ break;
+ case IPPROTO_ICMP:
+ case IPPROTO_ESP:
+ case IPPROTO_PIM:
+ case IPPROTO_IGMP:
+ default:
+ /*
+ ** the hash, although weaker, should still hold up
+ ** reasonably well even with only 2 fields
+ **/
+ rc = sym_hash_fn(ntohl(iph->ip_src.s_addr),
+ ntohl(iph->ip_dst.s_addr),
+ ntohs(0xFFFD) + seed,
+ ntohs(0xFFFE) + seed);
+ break;
+ }
+ }
+ return rc;
+}
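The `iph->ip_hl<<2` expression used throughout converts the IHL field (header length in 32-bit words) into the byte offset of the L4 header. A small check of that arithmetic on a raw version/IHL byte, using a hypothetical helper:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: byte offset of the L4 header given the first
 * byte of an IPv4 header (version in the high nibble, IHL in the low).
 * IHL counts 32-bit words, so shift left by 2 to get bytes. */
static size_t ipv4_l4_offset(uint8_t ver_ihl) {
	return (size_t)(ver_ihl & 0x0f) << 2;
}
```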
+/*---------------------------------------------------------------------*/
+/**
+ ** Parser + hash function for the IPv6 packet
+ **/
+static uint32_t
+decode_ipv6_n_hash(struct ip6_hdr *ipv6h, uint8_t hash_split, uint8_t seed)
+{
+ uint32_t saddr, daddr;
+ uint32_t rc = 0;
+
+ /* Get only the first 4 octets */
+ saddr = ipv6h->ip6_src.s6_addr[0] |
+ (ipv6h->ip6_src.s6_addr[1] << 8) |
+ (ipv6h->ip6_src.s6_addr[2] << 16) |
+ (ipv6h->ip6_src.s6_addr[3] << 24);
+ daddr = ipv6h->ip6_dst.s6_addr[0] |
+ (ipv6h->ip6_dst.s6_addr[1] << 8) |
+ (ipv6h->ip6_dst.s6_addr[2] << 16) |
+ (ipv6h->ip6_dst.s6_addr[3] << 24);
+
+ if (hash_split == 2) {
+ rc = sym_hash_fn(ntohl(saddr),
+ ntohl(daddr),
+ ntohs(0xFFFD) + seed,
+ ntohs(0xFFFE) + seed);
+ } else {
+ struct tcphdr *tcph = NULL;
+ struct udphdr *udph = NULL;
+
+ switch (ipv6h->ip6_ctlun.ip6_un1.ip6_un1_nxt) { /* one byte, no byte swap needed */
+ case IPPROTO_TCP:
+ tcph = (struct tcphdr *)(ipv6h + 1);
+ rc = sym_hash_fn(ntohl(saddr),
+ ntohl(daddr),
+ ntohs(tcph->th_sport) + seed,
+ ntohs(tcph->th_dport) + seed);
+ break;
+ case IPPROTO_UDP:
+ udph = (struct udphdr *)(ipv6h + 1);
+ rc = sym_hash_fn(ntohl(saddr),
+ ntohl(daddr),
+ ntohs(udph->uh_sport) + seed,
+ ntohs(udph->uh_dport) + seed);
+ break;
+ case IPPROTO_IPIP:
+ /* tunneling */
+ rc = decode_ip_n_hash((struct ip *)(ipv6h + 1),
+ hash_split, seed);
+ break;
+ case IPPROTO_IPV6:
+ /* tunneling */
+ rc = decode_ipv6_n_hash((struct ip6_hdr *)(ipv6h + 1),
+ hash_split, seed);
+ break;
+ case IPPROTO_GRE:
+ rc = decode_gre_hash((uint8_t *)(ipv6h + 1), hash_split, seed);
+ break;
+ case IPPROTO_ICMP:
+ case IPPROTO_ESP:
+ case IPPROTO_PIM:
+ case IPPROTO_IGMP:
+ default:
+ /*
+ ** the hash, although weaker, should still hold up
+ ** reasonably well even with only 2 fields
+ **/
+ rc = sym_hash_fn(ntohl(saddr),
+ ntohl(daddr),
+ ntohs(0xFFFD) + seed,
+ ntohs(0xFFFE) + seed);
+ }
+ }
+ return rc;
+}
+/*---------------------------------------------------------------------*/
+/**
+ ** A temporary fallback until hashes for the other protocols are
+ ** implemented (see the decode_vlan_n_hash and pkt_hdr_hash
+ ** functions).
+ **/
+static uint32_t
+decode_others_n_hash(struct ether_header *ethh, uint8_t seed)
+{
+ uint32_t saddr, daddr, rc;
+
+ saddr = ethh->ether_shost[5] |
+ (ethh->ether_shost[4] << 8) |
+ (ethh->ether_shost[3] << 16) |
+ (ethh->ether_shost[2] << 24);
+ daddr = ethh->ether_dhost[5] |
+ (ethh->ether_dhost[4] << 8) |
+ (ethh->ether_dhost[3] << 16) |
+ (ethh->ether_dhost[2] << 24);
+
+ rc = sym_hash_fn(ntohl(saddr),
+ ntohl(daddr),
+ ntohs(0xFFFD) + seed,
+ ntohs(0xFFFE) + seed);
+
+ return rc;
+}
+/*---------------------------------------------------------------------*/
+/**
+ ** Parser + hash function for VLAN packet
+ **/
+static inline uint32_t
+decode_vlan_n_hash(struct ether_header *ethh, uint8_t hash_split, uint8_t seed)
+{
+ uint32_t rc = 0;
+ struct vlanhdr *vhdr = (struct vlanhdr *)(ethh + 1);
+
+ switch (ntohs(vhdr->proto)) {
+ case ETHERTYPE_IP:
+ rc = decode_ip_n_hash((struct ip *)(vhdr + 1),
+ hash_split, seed);
+ break;
+ case ETHERTYPE_IPV6:
+ rc = decode_ipv6_n_hash((struct ip6_hdr *)(vhdr + 1),
+ hash_split, seed);
+ break;
+ case ETHERTYPE_ARP:
+ default:
+ /* others */
+ rc = decode_others_n_hash(ethh, seed);
+ break;
+ }
+ return rc;
+}
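The switch above only consults vhdr->proto, but the first 16 bits of the 802.1Q tag (pri_cfi_vlan) pack the PCP, DEI and VLAN ID fields. Hypothetical accessors extracting them from a host-order TCI, for reference:

```c
#include <stdint.h>

/* Hypothetical accessors for a host-order 802.1Q TCI:
 * PCP (3 bits) | DEI/CFI (1 bit) | VLAN ID (12 bits). */
static unsigned vlan_pcp(uint16_t tci) { return tci >> 13; }
static unsigned vlan_dei(uint16_t tci) { return (tci >> 12) & 1; }
static unsigned vlan_id(uint16_t tci)  { return tci & 0x0fff; }
```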
+
+/*---------------------------------------------------------------------*/
+/**
+ ** General parser + hash function...
+ **/
+uint32_t
+pkt_hdr_hash(const unsigned char *buffer, uint8_t hash_split, uint8_t seed)
+{
+ uint32_t rc = 0;
+ struct ether_header *ethh = (struct ether_header *)buffer;
+
+ switch (ntohs(ethh->ether_type)) {
+ case ETHERTYPE_IP:
+ rc = decode_ip_n_hash((struct ip *)(ethh + 1),
+ hash_split, seed);
+ break;
+ case ETHERTYPE_IPV6:
+ rc = decode_ipv6_n_hash((struct ip6_hdr *)(ethh + 1),
+ hash_split, seed);
+ break;
+ case ETHERTYPE_VLAN:
+ rc = decode_vlan_n_hash(ethh, hash_split, seed);
+ break;
+ case ETHERTYPE_ARP:
+ default:
+ /* others */
+ rc = decode_others_n_hash(ethh, seed);
+ break;
+ }
+
+ return rc;
+}
+
+/*---------------------------------------------------------------------*/
+/**
+ ** Parser + hash function for the GRE packet
+ **/
+static uint32_t
+decode_gre_hash(const uint8_t *grehdr, uint8_t hash_split, uint8_t seed)
+{
+ uint32_t rc = 0;
+ /* RFC 1701: the flags sit in the high bits of the first byte and
+ * each optional field is 4 bytes (variable-length routing SREs,
+ * if any, are not skipped here).
+ */
+ int len = 4 + 4 * !!(*grehdr & 0xc0) + /* Checksum + Offset (C or R) */
+ 4 * !!(*grehdr & 0x20) + /* Key */
+ 4 * !!(*grehdr & 0x10); /* Sequence Number */
+ uint16_t proto = ntohs(*(uint16_t *)(void *)(grehdr + 2));
+
+ switch (proto) {
+ case ETHERTYPE_IP:
+ rc = decode_ip_n_hash((struct ip *)(grehdr + len),
+ hash_split, seed);
+ break;
+ case ETHERTYPE_IPV6:
+ rc = decode_ipv6_n_hash((struct ip6_hdr *)(grehdr + len),
+ hash_split, seed);
+ break;
+ case 0x6558: /* Transparent Ethernet Bridging */
+ rc = pkt_hdr_hash(grehdr + len, hash_split, seed);
+ break;
+ default:
+ /* others */
+ break;
+ }
+ return rc;
+}
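For reference, per RFC 1701/2890 the optional GRE fields are 4 bytes each: checksum plus offset when C or R is set, key when K is set, sequence number when S is set, with the flags in the high bits of the first header byte. A standalone sketch of that length computation (variable-length routing SREs are not accounted for):

```c
#include <stdint.h>

/* Length in bytes of a GRE header given its first byte.
 * RFC 1701 flag layout: C=0x80, R=0x40, K=0x20, S=0x10. */
static int gre_hdr_len(uint8_t flags) {
	int len = 4;                  /* base header */
	if (flags & 0xc0) len += 4;   /* checksum + offset (C or R) */
	if (flags & 0x20) len += 4;   /* key (K) */
	if (flags & 0x10) len += 4;   /* sequence number (S) */
	return len;
}
```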
+/*---------------------------------------------------------------------*/
+
D17735: netmap: add load balancer program