Index: user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.8	(revision 303775)
@@ -1,192 +1,201 @@
 .\" Copyright (c) 2011-2012 Stefan Bethke.
 .\" All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\"    notice, this list of conditions and the following disclaimer.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\"    notice, this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 .\" SUCH DAMAGE.
 .\"
 .\" $FreeBSD$
 .\"
 .Dd September 20, 2013
 .Dt ETHERSWITCHCFG 8
 .Os
 .Sh NAME
 .Nm etherswitchcfg
 .Nd configure a built-in Ethernet switch
 .Sh SYNOPSIS
 .Nm
 .Op Fl "f control file"
-.Ar info
+.Cm info
 .Nm
 .Op Fl "f control file"
-.Ar config
+.Cm config
 .Ar command parameter
 .Nm
 .Op Fl "f control file"
-.Ar phy
+.Cm phy
 .Ar phy.register[=value]
 .Nm
 .Op Fl "f control file"
-.Ar port%d
+.Cm port%d
 .Ar [flags] command parameter
 .Nm
 .Op Fl "f control file"
-.Ar reg
+.Cm reg
 .Ar register[=value]
 .Nm
 .Op Fl "f control file"
-.Ar vlangroup%d
+.Cm vlangroup%d
 .Ar command parameter
 .Sh DESCRIPTION
 The
 .Nm
 utility is used to configure an Ethernet switch built into the system.
 .Nm
 accepts a number of options:
 .Pp
 .Bl -tag -width ".Fl f" -compact
 .It Fl "f control file"
 Specifies the
 .Xr etherswitch 4
 control file that represents the switch to be configured.
 It defaults to
 .Pa /dev/etherswitch0 .
 .It Fl m
 When reporting port information, also list available media options for
 that port.
 .It Fl v
 Produce more verbose output.
 Without this flag, lines that represent inactive or empty configuration
 options are omitted.
 .El
 .Ss config
 The config command provides access to global switch configuration
 parameters.
-It support the following commands:
+It supports the following commands:
 .Pp
-.Bl -tag -width ".Ar vlan_mode mode" -compact
-.It Ar vlan_mode mode
+.Bl -tag -width ".Cm vlan_mode mode" -compact
+.It Cm vlan_mode Ar mode
 Sets the switch VLAN mode (depends on the hardware).
 .El
 .Ss phy
 The phy command provides access to the registers of the PHYs attached
 to or integrated into the switch controller.
 PHY registers are specified as phy.register,
 where
 .Ar phy
 is usually the port number, and
 .Ar register
 is the register number.
 Both can be provided as decimal, octal or hexadecimal numbers in any of the formats
 understood by
 .Xr strtol 3 .
-To set the register value, use the form instance.register=value.
+To set the register value, use the form phy.register=value.
 .Ss port
 The port command selects one of the ports of the switch.
 It supports the following commands:
 .Pp
 .Bl -tag -width ".Ar pvid number" -compact
-.It Ar pvid number
+.It Cm pvid Ar number
 Sets the default port VID that is used to process incoming frames that are not tagged.
-.It Ar media mediaspec
+.It Cm media Ar mediaspec
 Specifies the physical media configuration to be configured for a port.
-.It Ar mediaopt mediaoption
+.It Cm mediaopt Ar mediaoption
 Specifies a list of media options for a port.
 See
 .Xr ifconfig 8
 for details on
-.Ar media
+.Cm media
 and
-.Ar mediaopt .
+.Cm mediaopt .
+.It Cm led Ar number style
+Sets the display style of LED
+.Ar number ,
+counting from 1.
+Available styles are
+.Cm default
+(usually flash on activity),
+.Cm on , off , No and Cm blink .
+Not all switches support all styles.
 .El
 .Pp
 And the following flags (please note that not all flags
 are supported by all switch drivers):
 .Pp
-.Bl -tag -width ".Ar addtag" -compact
-.It Ar addtag
+.Bl -tag -width ".Fl addtag" -compact
+.It Cm addtag
 Add VLAN tag to each packet sent by the port.
-.It Ar -addtag
+.It Fl addtag
 Disable the add VLAN tag option.
-.It Ar striptag
+.It Cm striptag
 Strip the VLAN tags from the packets sent by the port.
-.It Ar -striptag
+.It Fl striptag
 Disable the strip VLAN tag option.
-.It Ar firstlock
+.It Cm firstlock
-This options makes the switch port lock on the first MAC address it sees.
+This option makes the switch port lock on the first MAC address it sees.
 After that, usually you need to reset the switch to learn different
 MAC addresses.
-.It Ar -firstlock
+.It Fl firstlock
 Disable the first lock option.
 Note that sometimes you need to reset the
 switch to really disable this option.
-.It Ar dropuntagged
+.It Cm dropuntagged
 Drop packets without a VLAN tag.
-.It Ar -dropuntagged
+.It Fl dropuntagged
 Disable the drop untagged packets option.
-.It Ar doubletag
+.It Cm doubletag
 Enable QinQ for the port.
-.It Ar -doubletag
+.It Fl doubletag
 Disable QinQ for the port.
-.It Ar ingress
+.It Cm ingress
 Enable the ingress filter on the port.
-.It Ar -ingress
+.It Fl ingress
 Disable the ingress filter.
 .El
 .Ss reg
 The reg command provides access to the registers of the switch controller.
 .Ss vlangroup
 The vlangroup command selects one of the VLAN groups for configuration.
 It supports the following commands:
 .Pp
-.Bl -tag -width ".Ar vlangroup" -compact
-.It Ar vlan VID
+.Bl -tag -width ".Cm members" -compact
+.It Cm vlan Ar VID
 Sets the VLAN ID (802.1q VID) for this VLAN group.
 Frames transmitted on tagged member ports of this group will be tagged
 with this VID.
 Incoming frames carrying this tag will be forwarded according to the
 configuration of this VLAN group.
-.It Ar members port,...
+.It Cm members Ar port,...
 Configures which ports are to be a member of this VLAN group.
 The port numbers are given as a comma-separated list.
 Each port can optionally be followed by
 .Dq t
 to indicate that frames on this port are tagged.
 .El
 .Sh FILES
 .Bl -tag -width /dev/etherswitch? -compact
 .It Pa /dev/etherswitch?
 Control file for the Ethernet switch driver.
 .El
 .Sh EXAMPLES
 Configure VLAN group 1 with a VID of 2 and make ports 0 and 5 its members
 while excluding all other ports.
 Port 5 will send and receive tagged frames while port 0 will be untagged.
 Incoming untagged frames on port 0 are assigned to vlangroup1.
 .Pp
 .Dl # etherswitchcfg vlangroup1 vlan 2 members 0,5t port0 pvid 2
 .Sh SEE ALSO
 .Xr etherswitch 4
 .Sh HISTORY
 .Nm
 first appeared in
 .Fx 10.0 .
 .Sh AUTHORS
 .An Stefan Bethke
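The `led` command documented above is implemented by `set_port_led()` in etherswitchcfg.c below. On the command line the LED number counts from 1, and the style name is matched against the `ledstyles[]` table, whose index is written into `es_led[]`. As an illustration only, the following standalone sketch reproduces that validation without issuing any ioctl. `parse_led_args()` is a hypothetical helper, not part of etherswitchcfg. The sketch is also slightly stricter than the original because it rejects a number followed by trailing characters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Same order as the ledstyles[] table in etherswitchcfg.c. */
static const char *ledstyles[] = { "default", "on", "off", "blink", NULL };

/*
 * Hypothetical helper: validate the arguments of "led <number> <style>"
 * for a port that reports nleds LEDs.  On success, store the 0-based LED
 * index and the style index and return 0; otherwise return -1.
 */
static int
parse_led_args(const char *num, const char *style, int nleds,
    int *ledp, int *stylep)
{
	char *end;
	long led;
	int i;

	assert(ledp != NULL && stylep != NULL);
	led = strtol(num, &end, 0);
	if (end == num || *end != '\0' || led < 1 || led > nleds)
		return (-1);
	for (i = 0; ledstyles[i] != NULL; i++) {
		if (strcmp(style, ledstyles[i]) == 0) {
			*ledp = (int)led - 1;	/* command line counts from 1 */
			*stylep = i;
			return (0);
		}
	}
	return (-1);
}
```

For instance, `etherswitchcfg port0 led 1 blink` corresponds to `parse_led_args("1", "blink", nleds, &led, &style)`. If the port reports at least one LED, this call returns LED index 0 and style index 3.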
Index: user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/etherswitchcfg/etherswitchcfg.c	(revision 303775)
@@ -1,710 +1,752 @@
 /*-
  * Copyright (c) 2011-2012 Stefan Bethke.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <ctype.h>
 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sysexits.h>
 #include <unistd.h>
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <net/if.h>
 #include <net/if_media.h>
 #include <dev/etherswitch/etherswitch.h>
 
 int	get_media_subtype(int, const char *);
 int	get_media_mode(int, const char *);
 int	get_media_options(int, const char *);
 int	lookup_media_word(struct ifmedia_description *, const char *);
 void    print_media_word(int, int);
 void    print_media_word_ifconfig(int);
 
 /* some constants */
 #define IEEE802DOT1Q_VID_MAX	4094
 #define IFMEDIAREQ_NULISTENTRIES	256
 
 enum cmdmode {
 	MODE_NONE = 0,
 	MODE_PORT,
 	MODE_CONFIG,
 	MODE_VLANGROUP,
 	MODE_REGISTER,
 	MODE_PHYREG
 };
 
 struct cfg {
 	int					fd;
 	int					verbose;
 	int					mediatypes;
 	const char			*controlfile;
 	etherswitch_conf_t	conf;
 	etherswitch_info_t	info;
 	enum cmdmode		mode;
 	int					unit;
 };
 
 struct cmds {
 	enum cmdmode	mode;
 	const char		*name;
 	int				args;
 	void 			(*f)(struct cfg *, char *argv[]);
 };
 static struct cmds cmds[];
 
+/* Must match the ETHERSWITCH_PORT_LED_* enum order */
+static const char *ledstyles[] = { "default", "on", "off", "blink", NULL };
 
 /*
  * Print a value a la the %b format of the kernel's printf.
  * Stolen from ifconfig.c.
  */
 static void
 printb(const char *s, unsigned v, const char *bits)
 {
 	int i, any = 0;
 	char c;
 
 	if (bits && *bits == 8)
 		printf("%s=%o", s, v);
 	else
 		printf("%s=%x", s, v);
 	bits++;
 	if (bits) {
 		putchar('<');
 		while ((i = *bits++) != '\0') {
 			if (v & (1 << (i-1))) {
 				if (any)
 					putchar(',');
 				any = 1;
 				for (; (c = *bits) > 32; bits++)
 					putchar(c);
 			} else
 				for (; *bits > 32; bits++)
 					;
 		}
 		putchar('>');
 	}
 }
 
 static int
 read_register(struct cfg *cfg, int r)
 {
 	struct etherswitch_reg er;
 	
 	er.reg = r;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETREG, &er) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETREG)");
 	return (er.val);
 }
 
 static void
 write_register(struct cfg *cfg, int r, int v)
 {
 	struct etherswitch_reg er;
 	
 	er.reg = r;
 	er.val = v;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETREG, &er) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETREG)");
 }
 
 static int
 read_phyregister(struct cfg *cfg, int phy, int reg)
 {
 	struct etherswitch_phyreg er;
 	
 	er.phy = phy;
 	er.reg = reg;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPHYREG, &er) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPHYREG)");
 	return (er.val);
 }
 
 static void
 write_phyregister(struct cfg *cfg, int phy, int reg, int val)
 {
 	struct etherswitch_phyreg er;
 	
 	er.phy = phy;
 	er.reg = reg;
 	er.val = val;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETPHYREG, &er) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPHYREG)");
 }
 
 static void
 set_port_vid(struct cfg *cfg, char *argv[])
 {
 	int v;
 	etherswitch_port_t p;
 	
 	v = strtol(argv[1], NULL, 0);
 	if (v < 0 || v > IEEE802DOT1Q_VID_MAX)
 		errx(EX_USAGE, "pvid must be between 0 and %d",
 		    IEEE802DOT1Q_VID_MAX);
 	bzero(&p, sizeof(p));
 	p.es_port = cfg->unit;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
 	p.es_pvid = v;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)");
 }
 
 static void
 set_port_flag(struct cfg *cfg, char *argv[])
 {
 	char *flag;
 	int n;
 	uint32_t f;
 	etherswitch_port_t p;
 
 	n = 0;
 	f = 0;
 	flag = argv[0];
 	if (strcmp(flag, "none") != 0) {
 		if (*flag == '-') {
 			n++;
 			flag++;
 		}
 		if (strcasecmp(flag, "striptag") == 0)
 			f = ETHERSWITCH_PORT_STRIPTAG;
 		else if (strcasecmp(flag, "addtag") == 0)
 			f = ETHERSWITCH_PORT_ADDTAG;
 		else if (strcasecmp(flag, "firstlock") == 0)
 			f = ETHERSWITCH_PORT_FIRSTLOCK;
 		else if (strcasecmp(flag, "dropuntagged") == 0)
 			f = ETHERSWITCH_PORT_DROPUNTAGGED;
 		else if (strcasecmp(flag, "doubletag") == 0)
 			f = ETHERSWITCH_PORT_DOUBLE_TAG;
 		else if (strcasecmp(flag, "ingress") == 0)
 			f = ETHERSWITCH_PORT_INGRESS;
 	}
 	bzero(&p, sizeof(p));
 	p.es_port = cfg->unit;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
 	if (n)
 		p.es_flags &= ~f;
 	else
 		p.es_flags |= f;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)");
 }
 
 static void
 set_port_media(struct cfg *cfg, char *argv[])
 {
 	etherswitch_port_t p;
 	int ifm_ulist[IFMEDIAREQ_NULISTENTRIES];
 	int subtype;
 	
 	bzero(&p, sizeof(p));
 	p.es_port = cfg->unit;
 	p.es_ifmr.ifm_ulist = ifm_ulist;
 	p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
 	if (p.es_ifmr.ifm_count == 0)
 		return;
 	subtype = get_media_subtype(IFM_TYPE(ifm_ulist[0]), argv[1]);
 	p.es_ifr.ifr_media = (p.es_ifmr.ifm_current & IFM_IMASK) |
 	        IFM_TYPE(ifm_ulist[0]) | subtype;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)");
 }
 
 static void
 set_port_mediaopt(struct cfg *cfg, char *argv[])
 {
 	etherswitch_port_t p;
 	int ifm_ulist[IFMEDIAREQ_NULISTENTRIES];
 	int options;
 	
 	bzero(&p, sizeof(p));
 	p.es_port = cfg->unit;
 	p.es_ifmr.ifm_ulist = ifm_ulist;
 	p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
 	options = get_media_options(IFM_TYPE(ifm_ulist[0]), argv[1]);
 	if (options == -1)
 		errx(EX_USAGE, "invalid media options \"%s\"", argv[1]);
 	if (options & IFM_HDX) {
 		p.es_ifr.ifr_media &= ~IFM_FDX;
 		options &= ~IFM_HDX;
 	}
 	p.es_ifr.ifr_media |= options;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)");
 }
 
 static void
+set_port_led(struct cfg *cfg, char *argv[])
+{
+	etherswitch_port_t p;
+	int led;
+	int i;
+	
+	bzero(&p, sizeof(p));
+	p.es_port = cfg->unit;
+	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
+		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
+
+	led = strtol(argv[1], NULL, 0);
+	if (led < 1 || led > p.es_nleds)
+		errx(EX_USAGE, "invalid led number %s; must be between 1 and %d",
+		    argv[1], p.es_nleds);
+
+	led--;
+
+	for (i = 0; ledstyles[i] != NULL; i++) {
+		if (strcmp(argv[2], ledstyles[i]) == 0) {
+			p.es_led[led] = i;
+			break;
+		}
+	}
+	if (ledstyles[i] == NULL)
+		errx(EX_USAGE, "invalid led style \"%s\"", argv[2]);
+
+	if (ioctl(cfg->fd, IOETHERSWITCHSETPORT, &p) != 0)
+		err(EX_OSERR, "ioctl(IOETHERSWITCHSETPORT)");
+}
+
+static void
 set_vlangroup_vid(struct cfg *cfg, char *argv[])
 {
 	int v;
 	etherswitch_vlangroup_t vg;
 	
 	v = strtol(argv[1], NULL, 0);
 	if (v < 0 || v > IEEE802DOT1Q_VID_MAX)
 		errx(EX_USAGE, "vlan must be between 0 and %d", IEEE802DOT1Q_VID_MAX);
 	vg.es_vlangroup = cfg->unit;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETVLANGROUP, &vg) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)");
 	vg.es_vid = v;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETVLANGROUP, &vg) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETVLANGROUP)");
 }
 
 static void
 set_vlangroup_members(struct cfg *cfg, char *argv[])
 {
 	etherswitch_vlangroup_t vg;
 	int member, untagged;
 	char *c, *d;
 	int v;
 	
 	member = untagged = 0;
 	if (strcmp(argv[1], "none") != 0) {
 		for (c=argv[1]; *c; c=d) {
 			v = strtol(c, &d, 0);
 			if (d == c)
 				break;
 			if (v < 0 || v >= cfg->info.es_nports)
 				errx(EX_USAGE, "Member port must be between 0 and %d", cfg->info.es_nports-1);
 			if (d[0] == ',' || d[0] == '\0' ||
 				((d[0] == 't' || d[0] == 'T') && (d[1] == ',' || d[1] == '\0'))) {
 				if (d[0] == 't' || d[0] == 'T') {
 					untagged &= ~ETHERSWITCH_PORTMASK(v);
 					d++;
 				} else
 					untagged |= ETHERSWITCH_PORTMASK(v);
 				member |= ETHERSWITCH_PORTMASK(v);
 				d++;
 			} else
 				errx(EX_USAGE, "Invalid members specification \"%s\"", d);
 		}
 	}
 	vg.es_vlangroup = cfg->unit;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETVLANGROUP, &vg) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)");
 	vg.es_member_ports = member;
 	vg.es_untagged_ports = untagged;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETVLANGROUP, &vg) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETVLANGROUP)");
 }
 
 static int
 set_register(struct cfg *cfg, char *arg)
 {
 	int a, v;
 	char *c;
 	
 	a = strtol(arg, &c, 0);
 	if (c==arg)
 		return (1);
 	if (*c == '=') {
-		v = strtol(c+1, NULL, 0);
+		v = strtoul(c+1, NULL, 0);
 		write_register(cfg, a, v);
 	}
-	printf("\treg 0x%04x=0x%04x\n", a, read_register(cfg, a));
+	printf("\treg 0x%04x=0x%08x\n", a, read_register(cfg, a));
 	return (0);
 }
 
 static int
 set_phyregister(struct cfg *cfg, char *arg)
 {
 	int phy, reg, val;
 	char *c, *d;
 	
 	phy = strtol(arg, &c, 0);
 	if (c==arg)
 		return (1);
 	if (*c != '.')
 		return (1);
 	d = c+1;
 	reg = strtol(d, &c, 0);
 	if (d == c)
 		return (1);
 	if (*c == '=') {
-		val = strtol(c+1, NULL, 0);
+		val = strtoul(c+1, NULL, 0);
 		write_phyregister(cfg, phy, reg, val);
 	}
 	printf("\treg %d.0x%02x=0x%04x\n", phy, reg, read_phyregister(cfg, phy, reg));
 	return (0);
 }
 
 static void
 set_vlan_mode(struct cfg *cfg, char *argv[])
 {
 	etherswitch_conf_t conf;
 
 	bzero(&conf, sizeof(conf));
 	conf.cmd = ETHERSWITCH_CONF_VLAN_MODE;
 	if (strcasecmp(argv[1], "isl") == 0)
 		conf.vlan_mode = ETHERSWITCH_VLAN_ISL;
 	else if (strcasecmp(argv[1], "port") == 0)
 		conf.vlan_mode = ETHERSWITCH_VLAN_PORT;
 	else if (strcasecmp(argv[1], "dot1q") == 0)
 		conf.vlan_mode = ETHERSWITCH_VLAN_DOT1Q;
 	else if (strcasecmp(argv[1], "dot1q4k") == 0)
 		conf.vlan_mode = ETHERSWITCH_VLAN_DOT1Q_4K;
 	else if (strcasecmp(argv[1], "qinq") == 0)
 		conf.vlan_mode = ETHERSWITCH_VLAN_DOUBLE_TAG;
 	else
 		conf.vlan_mode = 0;
 	if (ioctl(cfg->fd, IOETHERSWITCHSETCONF, &conf) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHSETCONF)");
 }
 
 static void
 print_config(struct cfg *cfg)
 {
 	const char *c;
 
 	/* Get the device name. */
 	c = strrchr(cfg->controlfile, '/');
 	if (c != NULL)
 		c = c + 1;
 	else
 		c = cfg->controlfile;
 
 	/* Print VLAN mode. */
 	if (cfg->conf.cmd & ETHERSWITCH_CONF_VLAN_MODE) {
 		printf("%s: VLAN mode: ", c);
 		switch (cfg->conf.vlan_mode) {
 		case ETHERSWITCH_VLAN_ISL:
 			printf("ISL\n");
 			break;
 		case ETHERSWITCH_VLAN_PORT:
 			printf("PORT\n");
 			break;
 		case ETHERSWITCH_VLAN_DOT1Q:
 			printf("DOT1Q\n");
 			break;
 		case ETHERSWITCH_VLAN_DOT1Q_4K:
 			printf("DOT1Q4K\n");
 			break;
 		case ETHERSWITCH_VLAN_DOUBLE_TAG:
 			printf("QinQ\n");
 			break;
 		default:
 			printf("none\n");
 		}
 	}
 }
 
 static void
 print_port(struct cfg *cfg, int port)
 {
 	etherswitch_port_t p;
 	int ifm_ulist[IFMEDIAREQ_NULISTENTRIES];
 	int i;
 
 	bzero(&p, sizeof(p));
 	p.es_port = port;
 	p.es_ifmr.ifm_ulist = ifm_ulist;
 	p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETPORT, &p) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETPORT)");
 	printf("port%d:\n", port);
 	if (cfg->conf.vlan_mode == ETHERSWITCH_VLAN_DOT1Q)
 		printf("\tpvid: %d\n", p.es_pvid);
 	printb("\tflags", p.es_flags, ETHERSWITCH_PORT_FLAGS_BITS);
 	printf("\n");
+	if (p.es_nleds) {
+		printf("\tled: ");
+		for (i = 0; i < p.es_nleds; i++) {
+			printf("%d:%s%s", i+1, ledstyles[p.es_led[i]], (i==p.es_nleds-1)?"":" ");
+		}
+		printf("\n");
+	}
 	printf("\tmedia: ");
 	print_media_word(p.es_ifmr.ifm_current, 1);
 	if (p.es_ifmr.ifm_active != p.es_ifmr.ifm_current) {
 		putchar(' ');
 		putchar('(');
 		print_media_word(p.es_ifmr.ifm_active, 0);
 		putchar(')');
 	}
 	putchar('\n');
 	printf("\tstatus: %s\n", (p.es_ifmr.ifm_status & IFM_ACTIVE) != 0 ? "active" : "no carrier");
 	if (cfg->mediatypes) {
 		printf("\tsupported media:\n");
 		if (p.es_ifmr.ifm_count > IFMEDIAREQ_NULISTENTRIES)
 			p.es_ifmr.ifm_count = IFMEDIAREQ_NULISTENTRIES;
 		for (i=0; i<p.es_ifmr.ifm_count; i++) {
 			printf("\t\tmedia ");
 			print_media_word(ifm_ulist[i], 0);
 			putchar('\n');
 		}
 	}
 }
 
 static void
 print_vlangroup(struct cfg *cfg, int vlangroup)
 {
 	etherswitch_vlangroup_t vg;
 	int i, comma;
 	
 	vg.es_vlangroup = vlangroup;
 	if (ioctl(cfg->fd, IOETHERSWITCHGETVLANGROUP, &vg) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETVLANGROUP)");
 	if ((vg.es_vid & ETHERSWITCH_VID_VALID) == 0)
 		return;
 	vg.es_vid &= ETHERSWITCH_VID_MASK;
 	printf("vlangroup%d:\n", vlangroup);
 	if (cfg->conf.vlan_mode == ETHERSWITCH_VLAN_PORT)
 		printf("\tport: %d\n", vg.es_vid);
 	else
 		printf("\tvlan: %d\n", vg.es_vid);
 	printf("\tmembers ");
 	comma = 0;
 	if (vg.es_member_ports != 0)
 		for (i=0; i<cfg->info.es_nports; i++) {
 			if ((vg.es_member_ports & ETHERSWITCH_PORTMASK(i)) != 0) {
 				if (comma)
 					printf(",");
 				printf("%d", i);
 				if ((vg.es_untagged_ports & ETHERSWITCH_PORTMASK(i)) == 0)
 					printf("t");
 				comma = 1;
 			}
 		}
 	else
 		printf("none");
 	printf("\n");
 }
 
 static void
 print_info(struct cfg *cfg)
 {
 	const char *c;
 	int i;
 	
 	c = strrchr(cfg->controlfile, '/');
 	if (c != NULL)
 		c = c + 1;
 	else
 		c = cfg->controlfile;
 	if (cfg->verbose) {
 		printf("%s: %s with %d ports and %d VLAN groups\n", c,
 		    cfg->info.es_name, cfg->info.es_nports,
 		    cfg->info.es_nvlangroups);
 		printf("%s: ", c);
 		printb("VLAN capabilities",  cfg->info.es_vlan_caps,
 		    ETHERSWITCH_VLAN_CAPS_BITS);
 		printf("\n");
 	}
 	print_config(cfg);
 	for (i=0; i<cfg->info.es_nports; i++) {
 		print_port(cfg, i);
 	}
 	for (i=0; i<cfg->info.es_nvlangroups; i++) {
 		print_vlangroup(cfg, i);
 	}
 }
 
 static void
 usage(struct cfg *cfg __unused, char *argv[] __unused)
 {
-	fprintf(stderr, "usage: etherswitchctl\n");
+	fprintf(stderr, "usage: etherswitchcfg\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] info\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] config "
 	    "command parameter\n");
 	fprintf(stderr, "\t\tconfig commands: vlan_mode\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] phy "
 	    "phy.register[=value]\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] portX "
 	    "[flags] command parameter\n");
-	fprintf(stderr, "\t\tport commands: pvid, media, mediaopt\n");
+	fprintf(stderr, "\t\tport commands: pvid, media, mediaopt, led\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] reg "
 	    "register[=value]\n");
 	fprintf(stderr, "\tetherswitchcfg [-f control file] vlangroupX "
 	    "command parameter\n");
 	fprintf(stderr, "\t\tvlangroup commands: vlan, members\n");
 	exit(EX_USAGE);
 }
 
 static void
 newmode(struct cfg *cfg, enum cmdmode mode)
 {
 	if (mode == cfg->mode)
 		return;
 	switch (cfg->mode) {
 	case MODE_NONE:
 		break;
 	case MODE_CONFIG:
 		/*
-		 * Read the updated the configuration (it can be different
+		 * Read the updated configuration (it can be different
 		 * from the last time we read it).
 		 */
 		if (ioctl(cfg->fd, IOETHERSWITCHGETCONF, &cfg->conf) != 0)
 			err(EX_OSERR, "ioctl(IOETHERSWITCHGETCONF)");
 		print_config(cfg);
 		break;
 	case MODE_PORT:
 		print_port(cfg, cfg->unit);
 		break;
 	case MODE_VLANGROUP:
 		print_vlangroup(cfg, cfg->unit);
 		break;
 	case MODE_REGISTER:
 	case MODE_PHYREG:
 		break;
 	}
 	cfg->mode = mode;
 }
 
 int
 main(int argc, char *argv[])
 {
 	int ch;
 	struct cfg cfg;
 	int i;
 	
 	bzero(&cfg, sizeof(cfg));
 	cfg.controlfile = "/dev/etherswitch0";
 	while ((ch = getopt(argc, argv, "f:mv?")) != -1)
 		switch(ch) {
 		case 'f':
 			cfg.controlfile = optarg;
 			break;
 		case 'm':
 			cfg.mediatypes++;
 			break;
 		case 'v':
 			cfg.verbose++;
 			break;
 		case '?':
 			/* FALLTHROUGH */
 		default:
 			usage(&cfg, argv);
 		}
 	argc -= optind;
 	argv += optind;
 	cfg.fd = open(cfg.controlfile, O_RDONLY);
 	if (cfg.fd < 0)
 		err(EX_UNAVAILABLE, "Can't open control file: %s", cfg.controlfile);
 	if (ioctl(cfg.fd, IOETHERSWITCHGETINFO, &cfg.info) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETINFO)");
 	if (ioctl(cfg.fd, IOETHERSWITCHGETCONF, &cfg.conf) != 0)
 		err(EX_OSERR, "ioctl(IOETHERSWITCHGETCONF)");
 	if (argc == 0) {
 		print_info(&cfg);
 		return (0);
 	}
 	cfg.mode = MODE_NONE;
 	while (argc > 0) {
 		switch(cfg.mode) {
 		case MODE_NONE:
 			if (strcmp(argv[0], "info") == 0) {
 				print_info(&cfg);
 			} else if (sscanf(argv[0], "port%d", &cfg.unit) == 1) {
 				if (cfg.unit < 0 || cfg.unit >= cfg.info.es_nports)
 					errx(EX_USAGE, "port unit must be between 0 and %d", cfg.info.es_nports - 1);
 				newmode(&cfg, MODE_PORT);
 			} else if (sscanf(argv[0], "vlangroup%d", &cfg.unit) == 1) {
 				if (cfg.unit < 0 || cfg.unit >= cfg.info.es_nvlangroups)
 					errx(EX_USAGE,
 					    "vlangroup unit must be between 0 and %d",
 					    cfg.info.es_nvlangroups - 1);
 				newmode(&cfg, MODE_VLANGROUP);
 			} else if (strcmp(argv[0], "config") == 0) {
 				newmode(&cfg, MODE_CONFIG);
 			} else if (strcmp(argv[0], "phy") == 0) {
 				newmode(&cfg, MODE_PHYREG);
 			} else if (strcmp(argv[0], "reg") == 0) {
 				newmode(&cfg, MODE_REGISTER);
 			} else if (strcmp(argv[0], "help") == 0) {
 				usage(&cfg, argv);
 			} else {
 				errx(EX_USAGE, "Unknown command \"%s\"", argv[0]);
 			}
 			break;
 		case MODE_PORT:
 		case MODE_CONFIG:
 		case MODE_VLANGROUP:
 			for(i=0; cmds[i].name != NULL; i++) {
 				if (cfg.mode == cmds[i].mode && strcmp(argv[0], cmds[i].name) == 0) {
 					if (argc < (cmds[i].args + 1)) {
-						printf("%s needs an argument\n", cmds[i].name);
+						printf("%s needs %d argument%s\n", cmds[i].name, cmds[i].args, (cmds[i].args == 1) ? "" : "s");
 						break;
 					}
 					(cmds[i].f)(&cfg, argv);
 					argc -= cmds[i].args;
 					argv += cmds[i].args;
 					break;
 				}
 			}
 			if (cmds[i].name == NULL) {
 				newmode(&cfg, MODE_NONE);
 				continue;
 			}
 			break;
 		case MODE_REGISTER:
 			if (set_register(&cfg, argv[0]) != 0) {
 				newmode(&cfg, MODE_NONE);
 				continue;
 			}
 			break;
 		case MODE_PHYREG:
 			if (set_phyregister(&cfg, argv[0]) != 0) {
 				newmode(&cfg, MODE_NONE);
 				continue;
 			}
 			break;
 		}
 		argc--;
 		argv++;
 	}
 	/* switch back to command mode to print configuration for last command */
 	newmode(&cfg, MODE_NONE);
 	close(cfg.fd);
 	return (0);
 }
 
 static struct cmds cmds[] = {
 	{ MODE_PORT, "pvid", 1, set_port_vid },
 	{ MODE_PORT, "media", 1, set_port_media },
 	{ MODE_PORT, "mediaopt", 1, set_port_mediaopt },
+	{ MODE_PORT, "led", 2, set_port_led },
 	{ MODE_PORT, "addtag", 0, set_port_flag },
 	{ MODE_PORT, "-addtag", 0, set_port_flag },
 	{ MODE_PORT, "ingress", 0, set_port_flag },
 	{ MODE_PORT, "-ingress", 0, set_port_flag },
 	{ MODE_PORT, "striptag", 0, set_port_flag },
 	{ MODE_PORT, "-striptag", 0, set_port_flag },
 	{ MODE_PORT, "doubletag", 0, set_port_flag },
 	{ MODE_PORT, "-doubletag", 0, set_port_flag },
 	{ MODE_PORT, "firstlock", 0, set_port_flag },
 	{ MODE_PORT, "-firstlock", 0, set_port_flag },
 	{ MODE_PORT, "dropuntagged", 0, set_port_flag },
 	{ MODE_PORT, "-dropuntagged", 0, set_port_flag },
 	{ MODE_CONFIG, "vlan_mode", 1, set_vlan_mode },
 	{ MODE_VLANGROUP, "vlan", 1, set_vlangroup_vid },
 	{ MODE_VLANGROUP, "members", 1, set_vlangroup_members },
 	{ 0, NULL, 0, NULL }
 };
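`set_vlangroup_members()` above turns a list such as `0,5t` into two port bitmasks. Every listed port is set in the member mask, and ports without a `t` suffix are also set in the untagged mask. The sketch below shows the same parsing in isolation. It assumes at most 32 ports and uses a hypothetical `PORTMASK()` macro in place of `ETHERSWITCH_PORTMASK()`. `parse_members()` is likewise illustrative. The original loop silently stops at an empty element such as `0,,1`, but this sketch reports that case as an error.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for ETHERSWITCH_PORTMASK() from etherswitch.h. */
#define PORTMASK(p)	(1U << (p))

/*
 * Parse a "members" list such as "0,5t" for a switch with nports ports.
 * Every listed port becomes a member; ports without a trailing 't' (or
 * 'T') are also marked untagged.  Returns 0 on success, -1 on bad input.
 */
static int
parse_members(const char *spec, int nports, uint32_t *memberp,
    uint32_t *untaggedp)
{
	uint32_t member = 0, untagged = 0;
	const char *c;
	char *d;
	long v;

	assert(memberp != NULL && untaggedp != NULL);
	if (nports < 0 || nports > 32)	/* masks are 32 bits wide */
		return (-1);
	if (strcmp(spec, "none") != 0) {
		for (c = spec; *c != '\0'; c = d) {
			v = strtol(c, &d, 0);
			if (d == c || v < 0 || v >= nports)
				return (-1);
			if (*d == 't' || *d == 'T')
				d++;		/* tagged member */
			else
				untagged |= PORTMASK(v);
			member |= PORTMASK(v);
			if (*d == ',')
				d++;
			else if (*d != '\0')
				return (-1);
		}
	}
	*memberp = member;
	*untaggedp = untagged;
	return (0);
}
```

The manual page example `members 0,5t` yields a member mask of 0x21 (ports 0 and 5) and an untagged mask of 0x01 (port 0 only).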
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/parse.y	(revision 303775)
@@ -1,6259 +1,6269 @@
 /*	$OpenBSD: parse.y,v 1.554 2008/10/17 12:59:53 henning Exp $	*/
 
 /*
  * Copyright (c) 2001 Markus Friedl.  All rights reserved.
  * Copyright (c) 2001 Daniel Hartmeier.  All rights reserved.
  * Copyright (c) 2001 Theo de Raadt.  All rights reserved.
  * Copyright (c) 2002,2003 Henning Brauer. All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 %{
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/types.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
 #ifdef __FreeBSD__
 #include <sys/sysctl.h>
 #endif
 #include <net/if.h>
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
 #include <netinet/ip.h>
 #include <netinet/ip_icmp.h>
 #include <netinet/icmp6.h>
 #include <net/pfvar.h>
 #include <arpa/inet.h>
 #include <net/altq/altq.h>
 #include <net/altq/altq_cbq.h>
 #include <net/altq/altq_codel.h>
 #include <net/altq/altq_priq.h>
 #include <net/altq/altq_hfsc.h>
 #include <net/altq/altq_fairq.h>
 
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
 #include <netdb.h>
 #include <stdarg.h>
 #include <errno.h>
 #include <string.h>
 #include <ctype.h>
 #include <math.h>
 #include <err.h>
 #include <limits.h>
 #include <pwd.h>
 #include <grp.h>
 #include <md5.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 static struct pfctl	*pf = NULL;
 static int		 debug = 0;
 static int		 rulestate = 0;
 static u_int16_t	 returnicmpdefault =
 			    (ICMP_UNREACH << 8) | ICMP_UNREACH_PORT;
 static u_int16_t	 returnicmp6default =
 			    (ICMP6_DST_UNREACH << 8) | ICMP6_DST_UNREACH_NOPORT;
 static int		 blockpolicy = PFRULE_DROP;
 static int		 require_order = 1;
 static int		 default_statelock;
 
-TAILQ_HEAD(files, file)		 files = TAILQ_HEAD_INITIALIZER(files);
+static TAILQ_HEAD(files, file)	 files = TAILQ_HEAD_INITIALIZER(files);
 static struct file {
 	TAILQ_ENTRY(file)	 entry;
 	FILE			*stream;
 	char			*name;
 	int			 lineno;
 	int			 errors;
 } *file;
 struct file	*pushfile(const char *, int);
 int		 popfile(void);
 int		 check_file_secrecy(int, const char *);
 int		 yyparse(void);
 int		 yylex(void);
 int		 yyerror(const char *, ...);
 int		 kw_cmp(const void *, const void *);
 int		 lookup(char *);
 int		 lgetc(int);
 int		 lungetc(int);
 int		 findeol(void);
 
-TAILQ_HEAD(symhead, sym)	 symhead = TAILQ_HEAD_INITIALIZER(symhead);
+static TAILQ_HEAD(symhead, sym)	 symhead = TAILQ_HEAD_INITIALIZER(symhead);
 struct sym {
 	TAILQ_ENTRY(sym)	 entry;
 	int			 used;
 	int			 persist;
 	char			*nam;
 	char			*val;
 };
 int		 symset(const char *, const char *, int);
 char		*symget(const char *);
 
 int		 atoul(char *, u_long *);
 
 enum {
 	PFCTL_STATE_NONE,
 	PFCTL_STATE_OPTION,
 	PFCTL_STATE_SCRUB,
 	PFCTL_STATE_QUEUE,
 	PFCTL_STATE_NAT,
 	PFCTL_STATE_FILTER
 };
 
 struct node_proto {
 	u_int8_t		 proto;
 	struct node_proto	*next;
 	struct node_proto	*tail;
 };
 
 struct node_port {
 	u_int16_t		 port[2];
 	u_int8_t		 op;
 	struct node_port	*next;
 	struct node_port	*tail;
 };
 
 struct node_uid {
 	uid_t			 uid[2];
 	u_int8_t		 op;
 	struct node_uid		*next;
 	struct node_uid		*tail;
 };
 
 struct node_gid {
 	gid_t			 gid[2];
 	u_int8_t		 op;
 	struct node_gid		*next;
 	struct node_gid		*tail;
 };
 
 struct node_icmp {
 	u_int8_t		 code;
 	u_int8_t		 type;
 	u_int8_t		 proto;
 	struct node_icmp	*next;
 	struct node_icmp	*tail;
 };
 
 enum	{ PF_STATE_OPT_MAX, PF_STATE_OPT_NOSYNC, PF_STATE_OPT_SRCTRACK,
 	    PF_STATE_OPT_MAX_SRC_STATES, PF_STATE_OPT_MAX_SRC_CONN,
 	    PF_STATE_OPT_MAX_SRC_CONN_RATE, PF_STATE_OPT_MAX_SRC_NODES,
 	    PF_STATE_OPT_OVERLOAD, PF_STATE_OPT_STATELOCK,
 	    PF_STATE_OPT_TIMEOUT, PF_STATE_OPT_SLOPPY, };
 
 enum	{ PF_SRCTRACK_NONE, PF_SRCTRACK, PF_SRCTRACK_GLOBAL, PF_SRCTRACK_RULE };
 
 struct node_state_opt {
 	int			 type;
 	union {
 		u_int32_t	 max_states;
 		u_int32_t	 max_src_states;
 		u_int32_t	 max_src_conn;
 		struct {
 			u_int32_t	limit;
 			u_int32_t	seconds;
 		}		 max_src_conn_rate;
 		struct {
 			u_int8_t	flush;
 			char		tblname[PF_TABLE_NAME_SIZE];
 		}		 overload;
 		u_int32_t	 max_src_nodes;
 		u_int8_t	 src_track;
 		u_int32_t	 statelock;
 		struct {
 			int		number;
 			u_int32_t	seconds;
 		}		 timeout;
 	}			 data;
 	struct node_state_opt	*next;
 	struct node_state_opt	*tail;
 };
 
 struct peer {
 	struct node_host	*host;
 	struct node_port	*port;
 };
 
-struct node_queue {
+static struct node_queue {
 	char			 queue[PF_QNAME_SIZE];
 	char			 parent[PF_QNAME_SIZE];
 	char			 ifname[IFNAMSIZ];
 	int			 scheduler;
 	struct node_queue	*next;
 	struct node_queue	*tail;
 }	*queues = NULL;
 
 struct node_qassign {
 	char		*qname;
 	char		*pqname;
 };
 
-struct filter_opts {
+static struct filter_opts {
 	int			 marker;
 #define FOM_FLAGS	0x01
 #define FOM_ICMP	0x02
 #define FOM_TOS		0x04
 #define FOM_KEEP	0x08
 #define FOM_SRCTRACK	0x10
 #define FOM_SETPRIO	0x0400
 #define FOM_PRIO	0x2000
 	struct node_uid		*uid;
 	struct node_gid		*gid;
 	struct {
 		u_int8_t	 b1;
 		u_int8_t	 b2;
 		u_int16_t	 w;
 		u_int16_t	 w2;
 	} flags;
 	struct node_icmp	*icmpspec;
 	u_int32_t		 tos;
 	u_int32_t		 prob;
 	struct {
 		int			 action;
 		struct node_state_opt	*options;
 	} keep;
 	int			 fragment;
 	int			 allowopts;
 	char			*label;
 	struct node_qassign	 queues;
 	char			*tag;
 	char			*match_tag;
 	u_int8_t		 match_tag_not;
 	u_int			 rtableid;
 	u_int8_t		 prio;
 	u_int8_t		 set_prio[2];
 	struct {
 		struct node_host	*addr;
 		u_int16_t		port;
 	}			 divert;
 } filter_opts;
 
-struct antispoof_opts {
+static struct antispoof_opts {
 	char			*label;
 	u_int			 rtableid;
 } antispoof_opts;
 
-struct scrub_opts {
+static struct scrub_opts {
 	int			 marker;
 #define SOM_MINTTL	0x01
 #define SOM_MAXMSS	0x02
 #define SOM_FRAGCACHE	0x04
 #define SOM_SETTOS	0x08
 	int			 nodf;
 	int			 minttl;
 	int			 maxmss;
 	int			 settos;
 	int			 fragcache;
 	int			 randomid;
 	int			 reassemble_tcp;
 	char			*match_tag;
 	u_int8_t		 match_tag_not;
 	u_int			 rtableid;
 } scrub_opts;
 
-struct queue_opts {
+static struct queue_opts {
 	int			marker;
 #define QOM_BWSPEC	0x01
 #define QOM_SCHEDULER	0x02
 #define QOM_PRIORITY	0x04
 #define QOM_TBRSIZE	0x08
 #define QOM_QLIMIT	0x10
 	struct node_queue_bw	queue_bwspec;
 	struct node_queue_opt	scheduler;
 	int			priority;
 	int			tbrsize;
 	int			qlimit;
 } queue_opts;
 
-struct table_opts {
+static struct table_opts {
 	int			flags;
 	int			init_addr;
 	struct node_tinithead	init_nodes;
 } table_opts;
 
-struct pool_opts {
+static struct pool_opts {
 	int			 marker;
 #define POM_TYPE		0x01
 #define POM_STICKYADDRESS	0x02
 	u_int8_t		 opts;
 	int			 type;
 	int			 staticport;
 	struct pf_poolhashkey	*key;
 
 } pool_opts;
 
-struct codel_opts	 codel_opts;
-struct node_hfsc_opts	 hfsc_opts;
-struct node_fairq_opts	 fairq_opts;
-struct node_state_opt	*keep_state_defaults = NULL;
+static struct codel_opts	 codel_opts;
+static struct node_hfsc_opts	 hfsc_opts;
+static struct node_fairq_opts	 fairq_opts;
+static struct node_state_opt	*keep_state_defaults = NULL;
 
 int		 disallow_table(struct node_host *, const char *);
 int		 disallow_urpf_failed(struct node_host *, const char *);
 int		 disallow_alias(struct node_host *, const char *);
 int		 rule_consistent(struct pf_rule *, int);
 int		 filter_consistent(struct pf_rule *, int);
 int		 nat_consistent(struct pf_rule *);
 int		 rdr_consistent(struct pf_rule *);
 int		 process_tabledef(char *, struct table_opts *);
 void		 expand_label_str(char *, size_t, const char *, const char *);
 void		 expand_label_if(const char *, char *, size_t, const char *);
 void		 expand_label_addr(const char *, char *, size_t, u_int8_t,
 		    struct node_host *);
 void		 expand_label_port(const char *, char *, size_t,
 		    struct node_port *);
 void		 expand_label_proto(const char *, char *, size_t, u_int8_t);
 void		 expand_label_nr(const char *, char *, size_t);
 void		 expand_label(char *, size_t, const char *, u_int8_t,
 		    struct node_host *, struct node_port *, struct node_host *,
 		    struct node_port *, u_int8_t);
 void		 expand_rule(struct pf_rule *, struct node_if *,
 		    struct node_host *, struct node_proto *, struct node_os *,
 		    struct node_host *, struct node_port *, struct node_host *,
 		    struct node_port *, struct node_uid *, struct node_gid *,
 		    struct node_icmp *, const char *);
 int		 expand_altq(struct pf_altq *, struct node_if *,
 		    struct node_queue *, struct node_queue_bw bwspec,
 		    struct node_queue_opt *);
 int		 expand_queue(struct pf_altq *, struct node_if *,
 		    struct node_queue *, struct node_queue_bw,
 		    struct node_queue_opt *);
 int		 expand_skip_interface(struct node_if *);
 
 int	 check_rulestate(int);
 int	 getservice(char *);
 int	 rule_label(struct pf_rule *, char *);
 int	 rt_tableid_max(void);
 
 void	 mv_rules(struct pf_ruleset *, struct pf_ruleset *);
 void	 decide_address_family(struct node_host *, sa_family_t *);
 void	 remove_invalid_hosts(struct node_host **, sa_family_t *);
 int	 invalid_redirect(struct node_host *, sa_family_t);
 u_int16_t parseicmpspec(char *, sa_family_t);
 
-TAILQ_HEAD(loadanchorshead, loadanchors)
+static TAILQ_HEAD(loadanchorshead, loadanchors)
     loadanchorshead = TAILQ_HEAD_INITIALIZER(loadanchorshead);
 
 struct loadanchors {
 	TAILQ_ENTRY(loadanchors)	 entries;
 	char				*anchorname;
 	char				*filename;
 };
 
 typedef struct {
 	union {
 		int64_t			 number;
 		double			 probability;
 		int			 i;
 		char			*string;
 		u_int			 rtableid;
 		struct {
 			u_int8_t	 b1;
 			u_int8_t	 b2;
 			u_int16_t	 w;
 			u_int16_t	 w2;
 		}			 b;
 		struct range {
 			int		 a;
 			int		 b;
 			int		 t;
 		}			 range;
 		struct node_if		*interface;
 		struct node_proto	*proto;
 		struct node_icmp	*icmp;
 		struct node_host	*host;
 		struct node_os		*os;
 		struct node_port	*port;
 		struct node_uid		*uid;
 		struct node_gid		*gid;
 		struct node_state_opt	*state_opt;
 		struct peer		 peer;
 		struct {
 			struct peer	 src, dst;
 			struct node_os	*src_os;
 		}			 fromto;
 		struct {
 			struct node_host	*host;
 			u_int8_t		 rt;
 			u_int8_t		 pool_opts;
 			sa_family_t		 af;
 			struct pf_poolhashkey	*key;
 		}			 route;
 		struct redirection {
 			struct node_host	*host;
 			struct range		 rport;
 		}			*redirection;
 		struct {
 			int			 action;
 			struct node_state_opt	*options;
 		}			 keep_state;
 		struct {
 			u_int8_t	 log;
 			u_int8_t	 logif;
 			u_int8_t	 quick;
 		}			 logquick;
 		struct {
 			int		 neg;
 			char		*name;
 		}			 tagged;
 		struct pf_poolhashkey	*hashkey;
 		struct node_queue	*queue;
 		struct node_queue_opt	 queue_options;
 		struct node_queue_bw	 queue_bwspec;
 		struct node_qassign	 qassign;
 		struct filter_opts	 filter_opts;
 		struct antispoof_opts	 antispoof_opts;
 		struct queue_opts	 queue_opts;
 		struct scrub_opts	 scrub_opts;
 		struct table_opts	 table_opts;
 		struct pool_opts	 pool_opts;
 		struct node_hfsc_opts	 hfsc_opts;
 		struct node_fairq_opts	 fairq_opts;
 		struct codel_opts	 codel_opts;
 	} v;
 	int lineno;
 } YYSTYPE;
 
 #define PPORT_RANGE	1
 #define PPORT_STAR	2
 int	parseport(char *, struct range *r, int);
 
 #define DYNIF_MULTIADDR(addr) ((addr).type == PF_ADDR_DYNIFTL && \
 	(!((addr).iflags & PFI_AFLAG_NOALIAS) ||		 \
 	!isdigit((addr).v.ifname[strlen((addr).v.ifname)-1])))
 
 %}
 
 %token	PASS BLOCK SCRUB RETURN IN OS OUT LOG QUICK ON FROM TO FLAGS
 %token	RETURNRST RETURNICMP RETURNICMP6 PROTO INET INET6 ALL ANY ICMPTYPE
 %token	ICMP6TYPE CODE KEEP MODULATE STATE PORT RDR NAT BINAT ARROW NODF
 %token	MINTTL ERROR ALLOWOPTS FASTROUTE FILENAME ROUTETO DUPTO REPLYTO NO LABEL
 %token	NOROUTE URPFFAILED FRAGMENT USER GROUP MAXMSS MAXIMUM TTL TOS DROP TABLE
 %token	REASSEMBLE FRAGDROP FRAGCROP ANCHOR NATANCHOR RDRANCHOR BINATANCHOR
 %token	SET OPTIMIZATION TIMEOUT LIMIT LOGINTERFACE BLOCKPOLICY RANDOMID
 %token	REQUIREORDER SYNPROXY FINGERPRINTS NOSYNC DEBUG SKIP HOSTID
 %token	ANTISPOOF FOR INCLUDE
 %token	BITMASK RANDOM SOURCEHASH ROUNDROBIN STATICPORT PROBABILITY
 %token	ALTQ CBQ CODEL PRIQ HFSC FAIRQ BANDWIDTH TBRSIZE LINKSHARE REALTIME
 %token	UPPERLIMIT QUEUE PRIORITY QLIMIT HOGS BUCKETS RTABLE TARGET INTERVAL
 %token	LOAD RULESET_OPTIMIZATION PRIO
 %token	STICKYADDRESS MAXSRCSTATES MAXSRCNODES SOURCETRACK GLOBAL RULE
 %token	MAXSRCCONN MAXSRCCONNRATE OVERLOAD FLUSH SLOPPY
 %token	TAGGED TAG IFBOUND FLOATING STATEPOLICY STATEDEFAULTS ROUTE SETTOS
 %token	DIVERTTO DIVERTREPLY
 %token	<v.string>		STRING
 %token	<v.number>		NUMBER
 %token	<v.i>			PORTBINARY
 %type	<v.interface>		interface if_list if_item_not if_item
 %type	<v.number>		number icmptype icmp6type uid gid
 %type	<v.number>		tos not yesno
 %type	<v.probability>		probability
 %type	<v.i>			no dir af fragcache optimizer
 %type	<v.i>			sourcetrack flush unaryop statelock
 %type	<v.b>			action nataction natpasslog scrubaction
 %type	<v.b>			flags flag blockspec prio
 %type	<v.range>		portplain portstar portrange
 %type	<v.hashkey>		hashkey
 %type	<v.proto>		proto proto_list proto_item
 %type	<v.number>		protoval
 %type	<v.icmp>		icmpspec
 %type	<v.icmp>		icmp_list icmp_item
 %type	<v.icmp>		icmp6_list icmp6_item
 %type	<v.number>		reticmpspec reticmp6spec
 %type	<v.fromto>		fromto
 %type	<v.peer>		ipportspec from to
 %type	<v.host>		ipspec toipspec xhost host dynaddr host_list
 %type	<v.host>		redir_host_list redirspec
 %type	<v.host>		route_host route_host_list routespec
 %type	<v.os>			os xos os_list
 %type	<v.port>		portspec port_list port_item
 %type	<v.uid>			uids uid_list uid_item
 %type	<v.gid>			gids gid_list gid_item
 %type	<v.route>		route
 %type	<v.redirection>		redirection redirpool
 %type	<v.string>		label stringall tag anchorname
 %type	<v.string>		string varstring numberstring
 %type	<v.keep_state>		keep
 %type	<v.state_opt>		state_opt_spec state_opt_list state_opt_item
 %type	<v.logquick>		logquick quick log logopts logopt
 %type	<v.interface>		antispoof_ifspc antispoof_iflst antispoof_if
 %type	<v.qassign>		qname
 %type	<v.queue>		qassign qassign_list qassign_item
 %type	<v.queue_options>	scheduler
 %type	<v.number>		cbqflags_list cbqflags_item
 %type	<v.number>		priqflags_list priqflags_item
 %type	<v.hfsc_opts>		hfscopts_list hfscopts_item hfsc_opts
 %type	<v.fairq_opts>		fairqopts_list fairqopts_item fairq_opts
 %type	<v.codel_opts>		codelopts_list codelopts_item codel_opts
 %type	<v.queue_bwspec>	bandwidth
 %type	<v.filter_opts>		filter_opts filter_opt filter_opts_l
 %type	<v.filter_opts>		filter_sets filter_set filter_sets_l
 %type	<v.antispoof_opts>	antispoof_opts antispoof_opt antispoof_opts_l
 %type	<v.queue_opts>		queue_opts queue_opt queue_opts_l
 %type	<v.scrub_opts>		scrub_opts scrub_opt scrub_opts_l
 %type	<v.table_opts>		table_opts table_opt table_opts_l
 %type	<v.pool_opts>		pool_opts pool_opt pool_opts_l
 %type	<v.tagged>		tagged
 %type	<v.rtableid>		rtable
 %%
 
 ruleset		: /* empty */
 		| ruleset include '\n'
 		| ruleset '\n'
 		| ruleset option '\n'
 		| ruleset scrubrule '\n'
 		| ruleset natrule '\n'
 		| ruleset binatrule '\n'
 		| ruleset pfrule '\n'
 		| ruleset anchorrule '\n'
 		| ruleset loadrule '\n'
 		| ruleset altqif '\n'
 		| ruleset queuespec '\n'
 		| ruleset varset '\n'
 		| ruleset antispoof '\n'
 		| ruleset tabledef '\n'
 		| '{' fakeanchor '}' '\n';
 		| ruleset error '\n'		{ file->errors++; }
 		;
 
 include		: INCLUDE STRING		{
 			struct file	*nfile;
 
 			if ((nfile = pushfile($2, 0)) == NULL) {
 				yyerror("failed to include file %s", $2);
 				free($2);
 				YYERROR;
 			}
 			free($2);
 
 			file = nfile;
 			lungetc('\n');
 		}
 		;
 
 /*
  * apply to previouslys specified rule: must be careful to note
  * what that is: pf or nat or binat or rdr
  */
 fakeanchor	: fakeanchor '\n'
 		| fakeanchor anchorrule '\n'
 		| fakeanchor binatrule '\n'
 		| fakeanchor natrule '\n'
 		| fakeanchor pfrule '\n'
 		| fakeanchor error '\n'
 		;
 
 optimizer	: string	{
 			if (!strcmp($1, "none"))
 				$$ = 0;
 			else if (!strcmp($1, "basic"))
 				$$ = PF_OPTIMIZE_BASIC;
 			else if (!strcmp($1, "profile"))
 				$$ = PF_OPTIMIZE_BASIC | PF_OPTIMIZE_PROFILE;
 			else {
 				yyerror("unknown ruleset-optimization %s", $1);
 				YYERROR;
 			}
 		}
 		;
 
 option		: SET OPTIMIZATION STRING		{
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($3);
 				YYERROR;
 			}
 			if (pfctl_set_optimization(pf, $3) != 0) {
 				yyerror("unknown optimization %s", $3);
 				free($3);
 				YYERROR;
 			}
 			free($3);
 		}
 		| SET RULESET_OPTIMIZATION optimizer {
 			if (!(pf->opts & PF_OPT_OPTIMIZE)) {
 				pf->opts |= PF_OPT_OPTIMIZE;
 				pf->optimize = $3;
 			}
 		}
 		| SET TIMEOUT timeout_spec
 		| SET TIMEOUT '{' optnl timeout_list '}'
 		| SET LIMIT limit_spec
 		| SET LIMIT '{' optnl limit_list '}'
 		| SET LOGINTERFACE stringall		{
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($3);
 				YYERROR;
 			}
 			if (pfctl_set_logif(pf, $3) != 0) {
 				yyerror("error setting loginterface %s", $3);
 				free($3);
 				YYERROR;
 			}
 			free($3);
 		}
 		| SET HOSTID number {
 			if ($3 == 0 || $3 > UINT_MAX) {
 				yyerror("hostid must be non-zero");
 				YYERROR;
 			}
 			if (pfctl_set_hostid(pf, $3) != 0) {
 				yyerror("error setting hostid %08x", $3);
 				YYERROR;
 			}
 		}
 		| SET BLOCKPOLICY DROP	{
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf("set block-policy drop\n");
 			if (check_rulestate(PFCTL_STATE_OPTION))
 				YYERROR;
 			blockpolicy = PFRULE_DROP;
 		}
 		| SET BLOCKPOLICY RETURN {
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf("set block-policy return\n");
 			if (check_rulestate(PFCTL_STATE_OPTION))
 				YYERROR;
 			blockpolicy = PFRULE_RETURN;
 		}
 		| SET REQUIREORDER yesno {
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf("set require-order %s\n",
 				    $3 == 1 ? "yes" : "no");
 			require_order = $3;
 		}
 		| SET FINGERPRINTS STRING {
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf("set fingerprints \"%s\"\n", $3);
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($3);
 				YYERROR;
 			}
 			if (!pf->anchor->name[0]) {
 				if (pfctl_file_fingerprints(pf->dev,
 				    pf->opts, $3)) {
 					yyerror("error loading "
 					    "fingerprints %s", $3);
 					free($3);
 					YYERROR;
 				}
 			}
 			free($3);
 		}
 		| SET STATEPOLICY statelock {
 			if (pf->opts & PF_OPT_VERBOSE)
 				switch ($3) {
 				case 0:
 					printf("set state-policy floating\n");
 					break;
 				case PFRULE_IFBOUND:
 					printf("set state-policy if-bound\n");
 					break;
 				}
 			default_statelock = $3;
 		}
 		| SET DEBUG STRING {
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($3);
 				YYERROR;
 			}
 			if (pfctl_set_debug(pf, $3) != 0) {
 				yyerror("error setting debuglevel %s", $3);
 				free($3);
 				YYERROR;
 			}
 			free($3);
 		}
 		| SET SKIP interface {
 			if (expand_skip_interface($3) != 0) {
 				yyerror("error setting skip interface(s)");
 				YYERROR;
 			}
 		}
 		| SET STATEDEFAULTS state_opt_list {
 			if (keep_state_defaults != NULL) {
 				yyerror("cannot redefine state-defaults");
 				YYERROR;
 			}
 			keep_state_defaults = $3;
 		}
 		;
 
 stringall	: STRING	{ $$ = $1; }
 		| ALL		{
 			if (($$ = strdup("all")) == NULL) {
 				err(1, "stringall: strdup");
 			}
 		}
 		;
 
 string		: STRING string				{
 			if (asprintf(&$$, "%s %s", $1, $2) == -1)
 				err(1, "string: asprintf");
 			free($1);
 			free($2);
 		}
 		| STRING
 		;
 
 varstring	: numberstring varstring 		{
 			if (asprintf(&$$, "%s %s", $1, $2) == -1)
 				err(1, "string: asprintf");
 			free($1);
 			free($2);
 		}
 		| numberstring
 		;
 
 numberstring	: NUMBER				{
 			char	*s;
 			if (asprintf(&s, "%lld", (long long)$1) == -1) {
 				yyerror("string: asprintf");
 				YYERROR;
 			}
 			$$ = s;
 		}
 		| STRING
 		;
 
 varset		: STRING '=' varstring	{
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf("%s = \"%s\"\n", $1, $3);
 			if (symset($1, $3, 0) == -1)
 				err(1, "cannot store variable %s", $1);
 			free($1);
 			free($3);
 		}
 		;
 
 anchorname	: STRING			{ $$ = $1; }
 		| /* empty */			{ $$ = NULL; }
 		;
 
 pfa_anchorlist	: /* empty */
 		| pfa_anchorlist '\n'
 		| pfa_anchorlist pfrule '\n'
 		| pfa_anchorlist anchorrule '\n'
 		;
 
 pfa_anchor	: '{'
 		{
 			char ta[PF_ANCHOR_NAME_SIZE];
 			struct pf_ruleset *rs;
 
 			/* steping into a brace anchor */
 			pf->asd++;
 			pf->bn++;
 			pf->brace = 1;
 
 			/* create a holding ruleset in the root */
 			snprintf(ta, PF_ANCHOR_NAME_SIZE, "_%d", pf->bn);
 			rs = pf_find_or_create_ruleset(ta);
 			if (rs == NULL)
 				err(1, "pfa_anchor: pf_find_or_create_ruleset");
 			pf->astack[pf->asd] = rs->anchor;
 			pf->anchor = rs->anchor;
 		} '\n' pfa_anchorlist '}'
 		{
 			pf->alast = pf->anchor;
 			pf->asd--;
 			pf->anchor = pf->astack[pf->asd];
 		}
 		| /* empty */
 		;
 
 anchorrule	: ANCHOR anchorname dir quick interface af proto fromto
 		    filter_opts pfa_anchor
 		{
 			struct pf_rule	r;
 			struct node_proto	*proto;
 
 			if (check_rulestate(PFCTL_STATE_FILTER)) {
 				if ($2)
 					free($2);
 				YYERROR;
 			}
 
 			if ($2 && ($2[0] == '_' || strstr($2, "/_") != NULL)) {
 				free($2);
 				yyerror("anchor names beginning with '_' "
 				    "are reserved for internal use");
 				YYERROR;
 			}
 
 			memset(&r, 0, sizeof(r));
 			if (pf->astack[pf->asd + 1]) {
 				/* move inline rules into relative location */
 				pf_anchor_setup(&r,
 				    &pf->astack[pf->asd]->ruleset,
 				    $2 ? $2 : pf->alast->name);
 		
 				if (r.anchor == NULL)
 					err(1, "anchorrule: unable to "
 					    "create ruleset");
 
 				if (pf->alast != r.anchor) {
 					if (r.anchor->match) {
 						yyerror("inline anchor '%s' "
 						    "already exists",
 						    r.anchor->name);
 						YYERROR;
 					}
 					mv_rules(&pf->alast->ruleset,
 					    &r.anchor->ruleset);
 				}
 				pf_remove_if_empty_ruleset(&pf->alast->ruleset);
 				pf->alast = r.anchor;
 			} else {
 				if (!$2) {
 					yyerror("anchors without explicit "
 					    "rules must specify a name");
 					YYERROR;
 				}
 			}
 			r.direction = $3;
 			r.quick = $4.quick;
 			r.af = $6;
 			r.prob = $9.prob;
 			r.rtableid = $9.rtableid;
 
 			if ($9.tag)
 				if (strlcpy(r.tagname, $9.tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			if ($9.match_tag)
 				if (strlcpy(r.match_tagname, $9.match_tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			r.match_tag_not = $9.match_tag_not;
 			if (rule_label(&r, $9.label))
 				YYERROR;
 			free($9.label);
 			r.flags = $9.flags.b1;
 			r.flagset = $9.flags.b2;
 			if (($9.flags.b1 & $9.flags.b2) != $9.flags.b1) {
 				yyerror("flags always false");
 				YYERROR;
 			}
 			if ($9.flags.b1 || $9.flags.b2 || $8.src_os) {
 				for (proto = $7; proto != NULL &&
 				    proto->proto != IPPROTO_TCP;
 				    proto = proto->next)
 					;	/* nothing */
 				if (proto == NULL && $7 != NULL) {
 					if ($9.flags.b1 || $9.flags.b2)
 						yyerror(
 						    "flags only apply to tcp");
 					if ($8.src_os)
 						yyerror(
 						    "OS fingerprinting only "
 						    "applies to tcp");
 					YYERROR;
 				}
 			}
 
 			r.tos = $9.tos;
 
 			if ($9.keep.action) {
 				yyerror("cannot specify state handling "
 				    "on anchors");
 				YYERROR;
 			}
 
 			if ($9.match_tag)
 				if (strlcpy(r.match_tagname, $9.match_tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			r.match_tag_not = $9.match_tag_not;
 			if ($9.marker & FOM_PRIO) {
 				if ($9.prio == 0)
 					r.prio = PF_PRIO_ZERO;
 				else
 					r.prio = $9.prio;
 			}
 			if ($9.marker & FOM_SETPRIO) {
 				r.set_prio[0] = $9.set_prio[0];
 				r.set_prio[1] = $9.set_prio[1];
 				r.scrub_flags |= PFSTATE_SETPRIO;
 			}
 
 			decide_address_family($8.src.host, &r.af);
 			decide_address_family($8.dst.host, &r.af);
 
 			expand_rule(&r, $5, NULL, $7, $8.src_os,
 			    $8.src.host, $8.src.port, $8.dst.host, $8.dst.port,
 			    $9.uid, $9.gid, $9.icmpspec,
 			    pf->astack[pf->asd + 1] ? pf->alast->name : $2);
 			free($2);
 			pf->astack[pf->asd + 1] = NULL;
 		}
 		| NATANCHOR string interface af proto fromto rtable {
 			struct pf_rule	r;
 
 			if (check_rulestate(PFCTL_STATE_NAT)) {
 				free($2);
 				YYERROR;
 			}
 
 			memset(&r, 0, sizeof(r));
 			r.action = PF_NAT;
 			r.af = $4;
 			r.rtableid = $7;
 
 			decide_address_family($6.src.host, &r.af);
 			decide_address_family($6.dst.host, &r.af);
 
 			expand_rule(&r, $3, NULL, $5, $6.src_os,
 			    $6.src.host, $6.src.port, $6.dst.host, $6.dst.port,
 			    0, 0, 0, $2);
 			free($2);
 		}
 		| RDRANCHOR string interface af proto fromto rtable {
 			struct pf_rule	r;
 
 			if (check_rulestate(PFCTL_STATE_NAT)) {
 				free($2);
 				YYERROR;
 			}
 
 			memset(&r, 0, sizeof(r));
 			r.action = PF_RDR;
 			r.af = $4;
 			r.rtableid = $7;
 
 			decide_address_family($6.src.host, &r.af);
 			decide_address_family($6.dst.host, &r.af);
 
 			if ($6.src.port != NULL) {
 				yyerror("source port parameter not supported"
 				    " in rdr-anchor");
 				YYERROR;
 			}
 			if ($6.dst.port != NULL) {
 				if ($6.dst.port->next != NULL) {
 					yyerror("destination port list "
 					    "expansion not supported in "
 					    "rdr-anchor");
 					YYERROR;
 				} else if ($6.dst.port->op != PF_OP_EQ) {
 					yyerror("destination port operators"
 					    " not supported in rdr-anchor");
 					YYERROR;
 				}
 				r.dst.port[0] = $6.dst.port->port[0];
 				r.dst.port[1] = $6.dst.port->port[1];
 				r.dst.port_op = $6.dst.port->op;
 			}
 
 			expand_rule(&r, $3, NULL, $5, $6.src_os,
 			    $6.src.host, $6.src.port, $6.dst.host, $6.dst.port,
 			    0, 0, 0, $2);
 			free($2);
 		}
 		| BINATANCHOR string interface af proto fromto rtable {
 			struct pf_rule	r;
 
 			if (check_rulestate(PFCTL_STATE_NAT)) {
 				free($2);
 				YYERROR;
 			}
 
 			memset(&r, 0, sizeof(r));
 			r.action = PF_BINAT;
 			r.af = $4;
 			r.rtableid = $7;
 			if ($5 != NULL) {
 				if ($5->next != NULL) {
 					yyerror("proto list expansion"
 					    " not supported in binat-anchor");
 					YYERROR;
 				}
 				r.proto = $5->proto;
 				free($5);
 			}
 
 			if ($6.src.host != NULL || $6.src.port != NULL ||
 			    $6.dst.host != NULL || $6.dst.port != NULL) {
 				yyerror("fromto parameter not supported"
 				    " in binat-anchor");
 				YYERROR;
 			}
 
 			decide_address_family($6.src.host, &r.af);
 			decide_address_family($6.dst.host, &r.af);
 
 			pfctl_add_rule(pf, &r, $2);
 			free($2);
 		}
 		;
 
 loadrule	: LOAD ANCHOR string FROM string	{
 			struct loadanchors	*loadanchor;
 
 			if (strlen(pf->anchor->name) + 1 +
 			    strlen($3) >= MAXPATHLEN) {
 				yyerror("anchorname %s too long, max %u\n",
 				    $3, MAXPATHLEN - 1);
 				free($3);
 				YYERROR;
 			}
 			loadanchor = calloc(1, sizeof(struct loadanchors));
 			if (loadanchor == NULL)
 				err(1, "loadrule: calloc");
 			if ((loadanchor->anchorname = malloc(MAXPATHLEN)) ==
 			    NULL)
 				err(1, "loadrule: malloc");
 			if (pf->anchor->name[0])
 				snprintf(loadanchor->anchorname, MAXPATHLEN,
 				    "%s/%s", pf->anchor->name, $3);
 			else
 				strlcpy(loadanchor->anchorname, $3, MAXPATHLEN);
 			if ((loadanchor->filename = strdup($5)) == NULL)
 				err(1, "loadrule: strdup");
 
 			TAILQ_INSERT_TAIL(&loadanchorshead, loadanchor,
 			    entries);
 
 			free($3);
 			free($5);
 		};
 
 scrubaction	: no SCRUB {
 			$$.b2 = $$.w = 0;
 			if ($1)
 				$$.b1 = PF_NOSCRUB;
 			else
 				$$.b1 = PF_SCRUB;
 		}
 		;
 
 scrubrule	: scrubaction dir logquick interface af proto fromto scrub_opts
 		{
 			struct pf_rule	r;
 
 			if (check_rulestate(PFCTL_STATE_SCRUB))
 				YYERROR;
 
 			memset(&r, 0, sizeof(r));
 
 			r.action = $1.b1;
 			r.direction = $2;
 
 			r.log = $3.log;
 			r.logif = $3.logif;
 			if ($3.quick) {
 				yyerror("scrub rules do not support 'quick'");
 				YYERROR;
 			}
 
 			r.af = $5;
 			if ($8.nodf)
 				r.rule_flag |= PFRULE_NODF;
 			if ($8.randomid)
 				r.rule_flag |= PFRULE_RANDOMID;
 			if ($8.reassemble_tcp) {
 				if (r.direction != PF_INOUT) {
 					yyerror("reassemble tcp rules can not "
 					    "specify direction");
 					YYERROR;
 				}
 				r.rule_flag |= PFRULE_REASSEMBLE_TCP;
 			}
 			if ($8.minttl)
 				r.min_ttl = $8.minttl;
 			if ($8.maxmss)
 				r.max_mss = $8.maxmss;
 			if ($8.marker & SOM_SETTOS) {
 				r.rule_flag |= PFRULE_SET_TOS;
 				r.set_tos = $8.settos;
 			}
 			if ($8.fragcache)
 				r.rule_flag |= $8.fragcache;
 			if ($8.match_tag)
 				if (strlcpy(r.match_tagname, $8.match_tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			r.match_tag_not = $8.match_tag_not;
 			r.rtableid = $8.rtableid;
 
 			expand_rule(&r, $4, NULL, $6, $7.src_os,
 			    $7.src.host, $7.src.port, $7.dst.host, $7.dst.port,
 			    NULL, NULL, NULL, "");
 		}
 		;
 
 scrub_opts	:	{
 				bzero(&scrub_opts, sizeof scrub_opts);
 				scrub_opts.rtableid = -1;
 			}
 		    scrub_opts_l
 			{ $$ = scrub_opts; }
 		| /* empty */ {
 			bzero(&scrub_opts, sizeof scrub_opts);
 			scrub_opts.rtableid = -1;
 			$$ = scrub_opts;
 		}
 		;
 
 scrub_opts_l	: scrub_opts_l scrub_opt
 		| scrub_opt
 		;
 
 scrub_opt	: NODF	{
 			if (scrub_opts.nodf) {
 				yyerror("no-df cannot be respecified");
 				YYERROR;
 			}
 			scrub_opts.nodf = 1;
 		}
 		| MINTTL NUMBER {
 			if (scrub_opts.marker & SOM_MINTTL) {
 				yyerror("min-ttl cannot be respecified");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > 255) {
 				yyerror("illegal min-ttl value %d", $2);
 				YYERROR;
 			}
 			scrub_opts.marker |= SOM_MINTTL;
 			scrub_opts.minttl = $2;
 		}
 		| MAXMSS NUMBER {
 			if (scrub_opts.marker & SOM_MAXMSS) {
 				yyerror("max-mss cannot be respecified");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > 65535) {
 				yyerror("illegal max-mss value %d", $2);
 				YYERROR;
 			}
 			scrub_opts.marker |= SOM_MAXMSS;
 			scrub_opts.maxmss = $2;
 		}
 		| SETTOS tos {
 			if (scrub_opts.marker & SOM_SETTOS) {
 				yyerror("set-tos cannot be respecified");
 				YYERROR;
 			}
 			scrub_opts.marker |= SOM_SETTOS;
 			scrub_opts.settos = $2;
 		}
 		| fragcache {
 			if (scrub_opts.marker & SOM_FRAGCACHE) {
 				yyerror("fragcache cannot be respecified");
 				YYERROR;
 			}
 			scrub_opts.marker |= SOM_FRAGCACHE;
 			scrub_opts.fragcache = $1;
 		}
 		| REASSEMBLE STRING {
 			if (strcasecmp($2, "tcp") != 0) {
 				yyerror("scrub reassemble supports only tcp, "
 				    "not '%s'", $2);
 				free($2);
 				YYERROR;
 			}
 			free($2);
 			if (scrub_opts.reassemble_tcp) {
 				yyerror("reassemble tcp cannot be respecified");
 				YYERROR;
 			}
 			scrub_opts.reassemble_tcp = 1;
 		}
 		| RANDOMID {
 			if (scrub_opts.randomid) {
 				yyerror("random-id cannot be respecified");
 				YYERROR;
 			}
 			scrub_opts.randomid = 1;
 		}
 		| RTABLE NUMBER				{
 			if ($2 < 0 || $2 > rt_tableid_max()) {
 				yyerror("invalid rtable id");
 				YYERROR;
 			}
 			scrub_opts.rtableid = $2;
 		}
 		| not TAGGED string			{
 			scrub_opts.match_tag = $3;
 			scrub_opts.match_tag_not = $1;
 		}
 		;
 
 fragcache	: FRAGMENT REASSEMBLE	{ $$ = 0; /* default */ }
 		| FRAGMENT FRAGCROP	{ $$ = 0; }
 		| FRAGMENT FRAGDROP	{ $$ = 0; }
 		;
 
 antispoof	: ANTISPOOF logquick antispoof_ifspc af antispoof_opts {
 			struct pf_rule		 r;
 			struct node_host	*h = NULL, *hh;
 			struct node_if		*i, *j;
 
 			if (check_rulestate(PFCTL_STATE_FILTER))
 				YYERROR;
 
 			for (i = $3; i; i = i->next) {
 				bzero(&r, sizeof(r));
 
 				r.action = PF_DROP;
 				r.direction = PF_IN;
 				r.log = $2.log;
 				r.logif = $2.logif;
 				r.quick = $2.quick;
 				r.af = $4;
 				if (rule_label(&r, $5.label))
 					YYERROR;
 				r.rtableid = $5.rtableid;
 				j = calloc(1, sizeof(struct node_if));
 				if (j == NULL)
 					err(1, "antispoof: calloc");
 				if (strlcpy(j->ifname, i->ifname,
 				    sizeof(j->ifname)) >= sizeof(j->ifname)) {
 					free(j);
 					yyerror("interface name too long");
 					YYERROR;
 				}
 				j->not = 1;
 				if (i->dynamic) {
 					h = calloc(1, sizeof(*h));
 					if (h == NULL)
 						err(1, "address: calloc");
 					h->addr.type = PF_ADDR_DYNIFTL;
 					set_ipmask(h, 128);
 					if (strlcpy(h->addr.v.ifname, i->ifname,
 					    sizeof(h->addr.v.ifname)) >=
 					    sizeof(h->addr.v.ifname)) {
 						free(h);
 						yyerror(
 						    "interface name too long");
 						YYERROR;
 					}
 					hh = malloc(sizeof(*hh));
 					if (hh == NULL)
 						 err(1, "address: malloc");
 					bcopy(h, hh, sizeof(*hh));
 					h->addr.iflags = PFI_AFLAG_NETWORK;
 				} else {
 					h = ifa_lookup(j->ifname,
 					    PFI_AFLAG_NETWORK);
 					hh = NULL;
 				}
 
 				if (h != NULL)
 					expand_rule(&r, j, NULL, NULL, NULL, h,
 					    NULL, NULL, NULL, NULL, NULL,
 					    NULL, "");
 
 				if ((i->ifa_flags & IFF_LOOPBACK) == 0) {
 					bzero(&r, sizeof(r));
 
 					r.action = PF_DROP;
 					r.direction = PF_IN;
 					r.log = $2.log;
 					r.logif = $2.logif;
 					r.quick = $2.quick;
 					r.af = $4;
 					if (rule_label(&r, $5.label))
 						YYERROR;
 					r.rtableid = $5.rtableid;
 					if (hh != NULL)
 						h = hh;
 					else
 						h = ifa_lookup(i->ifname, 0);
 					if (h != NULL)
 						expand_rule(&r, NULL, NULL,
 						    NULL, NULL, h, NULL, NULL,
 						    NULL, NULL, NULL, NULL, "");
 				} else
 					free(hh);
 			}
 			free($5.label);
 		}
 		;
 
 antispoof_ifspc	: FOR antispoof_if			{ $$ = $2; }
 		| FOR '{' optnl antispoof_iflst '}'	{ $$ = $4; }
 		;
 
 antispoof_iflst	: antispoof_if optnl			{ $$ = $1; }
 		| antispoof_iflst comma antispoof_if optnl {
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 antispoof_if	: if_item				{ $$ = $1; }
 		| '(' if_item ')'			{
 			$2->dynamic = 1;
 			$$ = $2;
 		}
 		;
 
 antispoof_opts	:	{
 				bzero(&antispoof_opts, sizeof antispoof_opts);
 				antispoof_opts.rtableid = -1;
 			}
 		    antispoof_opts_l
 			{ $$ = antispoof_opts; }
 		| /* empty */	{
 			bzero(&antispoof_opts, sizeof antispoof_opts);
 			antispoof_opts.rtableid = -1;
 			$$ = antispoof_opts;
 		}
 		;
 
 antispoof_opts_l	: antispoof_opts_l antispoof_opt
 			| antispoof_opt
 			;
 
 antispoof_opt	: label	{
 			if (antispoof_opts.label) {
 				yyerror("label cannot be redefined");
 				YYERROR;
 			}
 			antispoof_opts.label = $1;
 		}
 		| RTABLE NUMBER				{
 			if ($2 < 0 || $2 > rt_tableid_max()) {
 				yyerror("invalid rtable id");
 				YYERROR;
 			}
 			antispoof_opts.rtableid = $2;
 		}
 		;
 
 not		: '!'		{ $$ = 1; }
 		| /* empty */	{ $$ = 0; }
 		;
 
 tabledef	: TABLE '<' STRING '>' table_opts {
 			struct node_host	 *h, *nh;
 			struct node_tinit	 *ti, *nti;
 
 			if (strlen($3) >= PF_TABLE_NAME_SIZE) {
 				yyerror("table name too long, max %d chars",
 				    PF_TABLE_NAME_SIZE - 1);
 				free($3);
 				YYERROR;
 			}
 			if (pf->loadopt & PFCTL_FLAG_TABLE)
 				if (process_tabledef($3, &$5)) {
 					free($3);
 					YYERROR;
 				}
 			free($3);
 			for (ti = SIMPLEQ_FIRST(&$5.init_nodes);
 			    ti != SIMPLEQ_END(&$5.init_nodes); ti = nti) {
 				if (ti->file)
 					free(ti->file);
 				for (h = ti->host; h != NULL; h = nh) {
 					nh = h->next;
 					free(h);
 				}
 				nti = SIMPLEQ_NEXT(ti, entries);
 				free(ti);
 			}
 		}
 		;
 
 table_opts	:	{
 			bzero(&table_opts, sizeof table_opts);
 			SIMPLEQ_INIT(&table_opts.init_nodes);
 		}
 		    table_opts_l
 			{ $$ = table_opts; }
 		| /* empty */
 			{
 			bzero(&table_opts, sizeof table_opts);
 			SIMPLEQ_INIT(&table_opts.init_nodes);
 			$$ = table_opts;
 		}
 		;
 
 table_opts_l	: table_opts_l table_opt
 		| table_opt
 		;
 
 table_opt	: STRING		{
 			if (!strcmp($1, "const"))
 				table_opts.flags |= PFR_TFLAG_CONST;
 			else if (!strcmp($1, "persist"))
 				table_opts.flags |= PFR_TFLAG_PERSIST;
 			else if (!strcmp($1, "counters"))
 				table_opts.flags |= PFR_TFLAG_COUNTERS;
 			else {
 				yyerror("invalid table option '%s'", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		| '{' optnl '}'		{ table_opts.init_addr = 1; }
 		| '{' optnl host_list '}'	{
 			struct node_host	*n;
 			struct node_tinit	*ti;
 
 			for (n = $3; n != NULL; n = n->next) {
 				switch (n->addr.type) {
 				case PF_ADDR_ADDRMASK:
 					continue; /* ok */
 				case PF_ADDR_RANGE:
 					yyerror("address ranges are not "
 					    "permitted inside tables");
 					break;
 				case PF_ADDR_DYNIFTL:
 					yyerror("dynamic addresses are not "
 					    "permitted inside tables");
 					break;
 				case PF_ADDR_TABLE:
 					yyerror("tables cannot contain tables");
 					break;
 				case PF_ADDR_NOROUTE:
 					yyerror("\"no-route\" is not permitted "
 					    "inside tables");
 					break;
 				case PF_ADDR_URPFFAILED:
 					yyerror("\"urpf-failed\" is not "
 					    "permitted inside tables");
 					break;
 				default:
 					yyerror("unknown address type %d",
 					    n->addr.type);
 				}
 				YYERROR;
 			}
 			if (!(ti = calloc(1, sizeof(*ti))))
 				err(1, "table_opt: calloc");
 			ti->host = $3;
 			SIMPLEQ_INSERT_TAIL(&table_opts.init_nodes, ti,
 			    entries);
 			table_opts.init_addr = 1;
 		}
 		| FILENAME STRING	{
 			struct node_tinit	*ti;
 
 			if (!(ti = calloc(1, sizeof(*ti))))
 				err(1, "table_opt: calloc");
 			ti->file = $2;
 			SIMPLEQ_INSERT_TAIL(&table_opts.init_nodes, ti,
 			    entries);
 			table_opts.init_addr = 1;
 		}
 		;
 
 altqif		: ALTQ interface queue_opts QUEUE qassign {
 			struct pf_altq	a;
 
 			if (check_rulestate(PFCTL_STATE_QUEUE))
 				YYERROR;
 
 			memset(&a, 0, sizeof(a));
 			if ($3.scheduler.qtype == ALTQT_NONE) {
 				yyerror("no scheduler specified!");
 				YYERROR;
 			}
 			a.scheduler = $3.scheduler.qtype;
 			a.qlimit = $3.qlimit;
 			a.tbrsize = $3.tbrsize;
 			if ($5 == NULL && $3.scheduler.qtype != ALTQT_CODEL) {
 				yyerror("no child queues specified");
 				YYERROR;
 			}
 			if (expand_altq(&a, $2, $5, $3.queue_bwspec,
 			    &$3.scheduler))
 				YYERROR;
 		}
 		;
 
 queuespec	: QUEUE STRING interface queue_opts qassign {
 			struct pf_altq	a;
 
 			if (check_rulestate(PFCTL_STATE_QUEUE)) {
 				free($2);
 				YYERROR;
 			}
 
 			memset(&a, 0, sizeof(a));
 
 			if (strlcpy(a.qname, $2, sizeof(a.qname)) >=
 			    sizeof(a.qname)) {
 				yyerror("queue name too long (max "
 				    "%d chars)", PF_QNAME_SIZE-1);
 				free($2);
 				YYERROR;
 			}
 			free($2);
 			if ($4.tbrsize) {
 				yyerror("cannot specify tbrsize for queue");
 				YYERROR;
 			}
 			if ($4.priority > 255) {
 				yyerror("priority out of range: max 255");
 				YYERROR;
 			}
 			a.priority = $4.priority;
 			a.qlimit = $4.qlimit;
 			a.scheduler = $4.scheduler.qtype;
 			if (expand_queue(&a, $3, $5, $4.queue_bwspec,
 			    &$4.scheduler)) {
 				yyerror("errors in queue definition");
 				YYERROR;
 			}
 		}
 		;
 
 queue_opts	:	{
 			bzero(&queue_opts, sizeof queue_opts);
 			queue_opts.priority = DEFAULT_PRIORITY;
 			queue_opts.qlimit = DEFAULT_QLIMIT;
 			queue_opts.scheduler.qtype = ALTQT_NONE;
 			queue_opts.queue_bwspec.bw_percent = 100;
 		}
 		    queue_opts_l
 			{ $$ = queue_opts; }
 		| /* empty */ {
 			bzero(&queue_opts, sizeof queue_opts);
 			queue_opts.priority = DEFAULT_PRIORITY;
 			queue_opts.qlimit = DEFAULT_QLIMIT;
 			queue_opts.scheduler.qtype = ALTQT_NONE;
 			queue_opts.queue_bwspec.bw_percent = 100;
 			$$ = queue_opts;
 		}
 		;
 
 queue_opts_l	: queue_opts_l queue_opt
 		| queue_opt
 		;
 
 queue_opt	: BANDWIDTH bandwidth	{
 			if (queue_opts.marker & QOM_BWSPEC) {
 				yyerror("bandwidth cannot be respecified");
 				YYERROR;
 			}
 			queue_opts.marker |= QOM_BWSPEC;
 			queue_opts.queue_bwspec = $2;
 		}
 		| PRIORITY NUMBER	{
 			if (queue_opts.marker & QOM_PRIORITY) {
 				yyerror("priority cannot be respecified");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > 255) {
 				yyerror("priority out of range: max 255");
 				YYERROR;
 			}
 			queue_opts.marker |= QOM_PRIORITY;
 			queue_opts.priority = $2;
 		}
 		| QLIMIT NUMBER	{
 			if (queue_opts.marker & QOM_QLIMIT) {
 				yyerror("qlimit cannot be respecified");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > 65535) {
 				yyerror("qlimit out of range: max 65535");
 				YYERROR;
 			}
 			queue_opts.marker |= QOM_QLIMIT;
 			queue_opts.qlimit = $2;
 		}
 		| scheduler	{
 			if (queue_opts.marker & QOM_SCHEDULER) {
 				yyerror("scheduler cannot be respecified");
 				YYERROR;
 			}
 			queue_opts.marker |= QOM_SCHEDULER;
 			queue_opts.scheduler = $1;
 		}
 		| TBRSIZE NUMBER	{
 			if (queue_opts.marker & QOM_TBRSIZE) {
 				yyerror("tbrsize cannot be respecified");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > 65535) {
 				yyerror("tbrsize too big: max 65535");
 				YYERROR;
 			}
 			queue_opts.marker |= QOM_TBRSIZE;
 			queue_opts.tbrsize = $2;
 		}
 		;
 
 bandwidth	: STRING {
 			double	 bps;
 			char	*cp;
 
 			$$.bw_percent = 0;
 
 			bps = strtod($1, &cp);
 			if (cp != NULL) {
 				if (strlen(cp) > 1) {
 					char *cu = cp + 1;
 					if (!strcmp(cu, "Bit") ||
 					    !strcmp(cu, "B") ||
 					    !strcmp(cu, "bit") ||
 					    !strcmp(cu, "b")) {
 						*cu = 0;
 					}
 				}
 				if (!strcmp(cp, "b"))
 					; /* nothing */
 				else if (!strcmp(cp, "K"))
 					bps *= 1000;
 				else if (!strcmp(cp, "M"))
 					bps *= 1000 * 1000;
 				else if (!strcmp(cp, "G"))
 					bps *= 1000 * 1000 * 1000;
 				else if (!strcmp(cp, "%")) {
 					if (bps < 0 || bps > 100) {
 						yyerror("bandwidth spec "
 						    "out of range");
 						free($1);
 						YYERROR;
 					}
 					$$.bw_percent = bps;
 					bps = 0;
 				} else {
 					yyerror("unknown unit %s", cp);
 					free($1);
 					YYERROR;
 				}
 			}
 			free($1);
 			$$.bw_absolute = (u_int32_t)bps;
 		}
 		| NUMBER {
 			if ($1 < 0 || $1 > UINT_MAX) {
 				yyerror("bandwidth number too big");
 				YYERROR;
 			}
 			$$.bw_percent = 0;
 			$$.bw_absolute = $1;
 		}
 		;
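
/*
 * Worked examples for the bandwidth rule, traced by hand through the
 * action above and assuming the lexer delivers these words as STRING
 * tokens (unit multipliers are decimal, i.e. powers of 1000):
 *
 *	"100Kb"  -> bw_absolute = 100000,     bw_percent = 0
 *	"1.5Mb"  -> bw_absolute = 1500000,    bw_percent = 0
 *	"2Gbit"  -> bw_absolute = 2000000000, bw_percent = 0
 *	"40%"    -> bw_absolute = 0,          bw_percent = 40
 *
 * A trailing "B" is stripped exactly like "b", so "10MB" is parsed as
 * 10 megabits per second, not 10 megabytes.
 */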
 
 scheduler	: CBQ				{
 			$$.qtype = ALTQT_CBQ;
 			$$.data.cbq_opts.flags = 0;
 		}
 		| CBQ '(' cbqflags_list ')'	{
 			$$.qtype = ALTQT_CBQ;
 			$$.data.cbq_opts.flags = $3;
 		}
 		| PRIQ				{
 			$$.qtype = ALTQT_PRIQ;
 			$$.data.priq_opts.flags = 0;
 		}
 		| PRIQ '(' priqflags_list ')'	{
 			$$.qtype = ALTQT_PRIQ;
 			$$.data.priq_opts.flags = $3;
 		}
 		| HFSC				{
 			$$.qtype = ALTQT_HFSC;
 			bzero(&$$.data.hfsc_opts,
 			    sizeof(struct node_hfsc_opts));
 		}
 		| HFSC '(' hfsc_opts ')'	{
 			$$.qtype = ALTQT_HFSC;
 			$$.data.hfsc_opts = $3;
 		}
		| FAIRQ				{
			$$.qtype = ALTQT_FAIRQ;
			bzero(&$$.data.fairq_opts,
			    sizeof(struct node_fairq_opts));
		}
		| FAIRQ '(' fairq_opts ')'	{
			$$.qtype = ALTQT_FAIRQ;
			$$.data.fairq_opts = $3;
		}
		| CODEL				{
			$$.qtype = ALTQT_CODEL;
			bzero(&$$.data.codel_opts,
			    sizeof(struct codel_opts));
		}
 		| CODEL '(' codel_opts ')'	{
 			$$.qtype = ALTQT_CODEL;
 			$$.data.codel_opts = $3;
 		}
 		;
 
 cbqflags_list	: cbqflags_item				{ $$ |= $1; }
 		| cbqflags_list comma cbqflags_item	{ $$ |= $3; }
 		;
 
 cbqflags_item	: STRING	{
 			if (!strcmp($1, "default"))
 				$$ = CBQCLF_DEFCLASS;
 			else if (!strcmp($1, "borrow"))
 				$$ = CBQCLF_BORROW;
 			else if (!strcmp($1, "red"))
 				$$ = CBQCLF_RED;
 			else if (!strcmp($1, "ecn"))
 				$$ = CBQCLF_RED|CBQCLF_ECN;
 			else if (!strcmp($1, "rio"))
 				$$ = CBQCLF_RIO;
 			else if (!strcmp($1, "codel"))
 				$$ = CBQCLF_CODEL;
 			else {
 				yyerror("unknown cbq flag \"%s\"", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 priqflags_list	: priqflags_item			{ $$ |= $1; }
 		| priqflags_list comma priqflags_item	{ $$ |= $3; }
 		;
 
 priqflags_item	: STRING	{
 			if (!strcmp($1, "default"))
 				$$ = PRCF_DEFAULTCLASS;
 			else if (!strcmp($1, "red"))
 				$$ = PRCF_RED;
 			else if (!strcmp($1, "ecn"))
 				$$ = PRCF_RED|PRCF_ECN;
 			else if (!strcmp($1, "rio"))
 				$$ = PRCF_RIO;
 			else if (!strcmp($1, "codel"))
 				$$ = PRCF_CODEL;
 			else {
 				yyerror("unknown priq flag \"%s\"", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 hfsc_opts	:	{
 				bzero(&hfsc_opts,
 				    sizeof(struct node_hfsc_opts));
 			}
 		    hfscopts_list				{
 			$$ = hfsc_opts;
 		}
 		;
 
 hfscopts_list	: hfscopts_item
 		| hfscopts_list comma hfscopts_item
 		;
 
 hfscopts_item	: LINKSHARE bandwidth				{
 			if (hfsc_opts.linkshare.used) {
 				yyerror("linkshare already specified");
 				YYERROR;
 			}
 			hfsc_opts.linkshare.m2 = $2;
 			hfsc_opts.linkshare.used = 1;
 		}
 		| LINKSHARE '(' bandwidth comma NUMBER comma bandwidth ')'
 		    {
 			if ($5 < 0 || $5 > INT_MAX) {
 				yyerror("timing in curve out of range");
 				YYERROR;
 			}
 			if (hfsc_opts.linkshare.used) {
 				yyerror("linkshare already specified");
 				YYERROR;
 			}
 			hfsc_opts.linkshare.m1 = $3;
 			hfsc_opts.linkshare.d = $5;
 			hfsc_opts.linkshare.m2 = $7;
 			hfsc_opts.linkshare.used = 1;
 		}
 		| REALTIME bandwidth				{
 			if (hfsc_opts.realtime.used) {
 				yyerror("realtime already specified");
 				YYERROR;
 			}
 			hfsc_opts.realtime.m2 = $2;
 			hfsc_opts.realtime.used = 1;
 		}
 		| REALTIME '(' bandwidth comma NUMBER comma bandwidth ')'
 		    {
 			if ($5 < 0 || $5 > INT_MAX) {
 				yyerror("timing in curve out of range");
 				YYERROR;
 			}
 			if (hfsc_opts.realtime.used) {
 				yyerror("realtime already specified");
 				YYERROR;
 			}
 			hfsc_opts.realtime.m1 = $3;
 			hfsc_opts.realtime.d = $5;
 			hfsc_opts.realtime.m2 = $7;
 			hfsc_opts.realtime.used = 1;
 		}
 		| UPPERLIMIT bandwidth				{
 			if (hfsc_opts.upperlimit.used) {
 				yyerror("upperlimit already specified");
 				YYERROR;
 			}
 			hfsc_opts.upperlimit.m2 = $2;
 			hfsc_opts.upperlimit.used = 1;
 		}
 		| UPPERLIMIT '(' bandwidth comma NUMBER comma bandwidth ')'
 		    {
 			if ($5 < 0 || $5 > INT_MAX) {
 				yyerror("timing in curve out of range");
 				YYERROR;
 			}
 			if (hfsc_opts.upperlimit.used) {
 				yyerror("upperlimit already specified");
 				YYERROR;
 			}
 			hfsc_opts.upperlimit.m1 = $3;
 			hfsc_opts.upperlimit.d = $5;
 			hfsc_opts.upperlimit.m2 = $7;
 			hfsc_opts.upperlimit.used = 1;
 		}
 		| STRING	{
 			if (!strcmp($1, "default"))
 				hfsc_opts.flags |= HFCF_DEFAULTCLASS;
 			else if (!strcmp($1, "red"))
 				hfsc_opts.flags |= HFCF_RED;
 			else if (!strcmp($1, "ecn"))
 				hfsc_opts.flags |= HFCF_RED|HFCF_ECN;
 			else if (!strcmp($1, "rio"))
 				hfsc_opts.flags |= HFCF_RIO;
 			else if (!strcmp($1, "codel"))
 				hfsc_opts.flags |= HFCF_CODEL;
 			else {
 				yyerror("unknown hfsc flag \"%s\"", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 fairq_opts	:	{
 				bzero(&fairq_opts,
 				    sizeof(struct node_fairq_opts));
 			}
 		    fairqopts_list				{
 			$$ = fairq_opts;
 		}
 		;
 
 fairqopts_list	: fairqopts_item
 		| fairqopts_list comma fairqopts_item
 		;
 
 fairqopts_item	: LINKSHARE bandwidth				{
 			if (fairq_opts.linkshare.used) {
 				yyerror("linkshare already specified");
 				YYERROR;
 			}
 			fairq_opts.linkshare.m2 = $2;
 			fairq_opts.linkshare.used = 1;
 		}
 		| LINKSHARE '(' bandwidth number bandwidth ')'	{
 			if (fairq_opts.linkshare.used) {
 				yyerror("linkshare already specified");
 				YYERROR;
 			}
 			fairq_opts.linkshare.m1 = $3;
 			fairq_opts.linkshare.d = $4;
 			fairq_opts.linkshare.m2 = $5;
 			fairq_opts.linkshare.used = 1;
 		}
 		| HOGS bandwidth {
 			fairq_opts.hogs_bw = $2;
 		}
 		| BUCKETS number {
 			fairq_opts.nbuckets = $2;
 		}
 		| STRING	{
 			if (!strcmp($1, "default"))
 				fairq_opts.flags |= FARF_DEFAULTCLASS;
 			else if (!strcmp($1, "red"))
 				fairq_opts.flags |= FARF_RED;
 			else if (!strcmp($1, "ecn"))
 				fairq_opts.flags |= FARF_RED|FARF_ECN;
 			else if (!strcmp($1, "rio"))
 				fairq_opts.flags |= FARF_RIO;
 			else if (!strcmp($1, "codel"))
 				fairq_opts.flags |= FARF_CODEL;
 			else {
 				yyerror("unknown fairq flag \"%s\"", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 codel_opts	:	{
 				bzero(&codel_opts,
 				    sizeof(struct codel_opts));
 			}
 		    codelopts_list				{
 			$$ = codel_opts;
 		}
 		;
 
 codelopts_list	: codelopts_item
 		| codelopts_list comma codelopts_item
 		;
 
 codelopts_item	: INTERVAL number				{
 			if (codel_opts.interval) {
 				yyerror("interval already specified");
 				YYERROR;
 			}
 			codel_opts.interval = $2;
 		}
 		| TARGET number					{
 			if (codel_opts.target) {
 				yyerror("target already specified");
 				YYERROR;
 			}
 			codel_opts.target = $2;
 		}
 		| STRING					{
 			if (!strcmp($1, "ecn"))
 				codel_opts.ecn = 1;
 			else {
 				yyerror("unknown codel option \"%s\"", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 qassign		: /* empty */		{ $$ = NULL; }
 		| qassign_item		{ $$ = $1; }
 		| '{' optnl qassign_list '}'	{ $$ = $3; }
 		;
 
 qassign_list	: qassign_item optnl		{ $$ = $1; }
 		| qassign_list comma qassign_item optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 qassign_item	: STRING			{
 			$$ = calloc(1, sizeof(struct node_queue));
 			if ($$ == NULL)
 				err(1, "qassign_item: calloc");
 			if (strlcpy($$->queue, $1, sizeof($$->queue)) >=
 			    sizeof($$->queue)) {
				yyerror("queue name '%s' too long (max "
				    "%zu chars)", $1, sizeof($$->queue) - 1);
 				free($1);
 				free($$);
 				YYERROR;
 			}
 			free($1);
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 pfrule		: action dir logquick interface route af proto fromto
 		    filter_opts
 		{
 			struct pf_rule		 r;
 			struct node_state_opt	*o;
 			struct node_proto	*proto;
 			int			 srctrack = 0;
 			int			 statelock = 0;
 			int			 adaptive = 0;
 			int			 defaults = 0;
 
 			if (check_rulestate(PFCTL_STATE_FILTER))
 				YYERROR;
 
 			memset(&r, 0, sizeof(r));
 
 			r.action = $1.b1;
 			switch ($1.b2) {
 			case PFRULE_RETURNRST:
 				r.rule_flag |= PFRULE_RETURNRST;
 				r.return_ttl = $1.w;
 				break;
 			case PFRULE_RETURNICMP:
 				r.rule_flag |= PFRULE_RETURNICMP;
 				r.return_icmp = $1.w;
 				r.return_icmp6 = $1.w2;
 				break;
 			case PFRULE_RETURN:
 				r.rule_flag |= PFRULE_RETURN;
 				r.return_icmp = $1.w;
 				r.return_icmp6 = $1.w2;
 				break;
 			}
 			r.direction = $2;
 			r.log = $3.log;
 			r.logif = $3.logif;
 			r.quick = $3.quick;
 			r.prob = $9.prob;
 			r.rtableid = $9.rtableid;
 
 			if ($9.marker & FOM_PRIO) {
 				if ($9.prio == 0)
 					r.prio = PF_PRIO_ZERO;
 				else
 					r.prio = $9.prio;
 			}
 			if ($9.marker & FOM_SETPRIO) {
 				r.set_prio[0] = $9.set_prio[0];
 				r.set_prio[1] = $9.set_prio[1];
 				r.scrub_flags |= PFSTATE_SETPRIO;
 			}
 
 			r.af = $6;
 			if ($9.tag)
 				if (strlcpy(r.tagname, $9.tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			if ($9.match_tag)
 				if (strlcpy(r.match_tagname, $9.match_tag,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			r.match_tag_not = $9.match_tag_not;
 			if (rule_label(&r, $9.label))
 				YYERROR;
 			free($9.label);
 			r.flags = $9.flags.b1;
 			r.flagset = $9.flags.b2;
 			if (($9.flags.b1 & $9.flags.b2) != $9.flags.b1) {
 				yyerror("flags always false");
 				YYERROR;
 			}
 			if ($9.flags.b1 || $9.flags.b2 || $8.src_os) {
 				for (proto = $7; proto != NULL &&
 				    proto->proto != IPPROTO_TCP;
 				    proto = proto->next)
 					;	/* nothing */
 				if (proto == NULL && $7 != NULL) {
 					if ($9.flags.b1 || $9.flags.b2)
 						yyerror(
 						    "flags only apply to tcp");
 					if ($8.src_os)
 						yyerror(
						    "OS fingerprinting only "
						    "applies to tcp");
 					YYERROR;
 				}
 #if 0
 				if (($9.flags.b1 & parse_flags("S")) == 0 &&
 				    $8.src_os) {
 					yyerror("OS fingerprinting requires "
 					    "the SYN TCP flag (flags S/SA)");
 					YYERROR;
 				}
 #endif
 			}
 
 			r.tos = $9.tos;
 			r.keep_state = $9.keep.action;
 			o = $9.keep.options;
 
 			/* 'keep state' by default on pass rules. */
 			if (!r.keep_state && !r.action &&
 			    !($9.marker & FOM_KEEP)) {
 				r.keep_state = PF_STATE_NORMAL;
 				o = keep_state_defaults;
 				defaults = 1;
 			}
 
 			while (o) {
 				struct node_state_opt	*p = o;
 
 				switch (o->type) {
 				case PF_STATE_OPT_MAX:
 					if (r.max_states) {
 						yyerror("state option 'max' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					r.max_states = o->data.max_states;
 					break;
 				case PF_STATE_OPT_NOSYNC:
 					if (r.rule_flag & PFRULE_NOSYNC) {
 						yyerror("state option 'sync' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					r.rule_flag |= PFRULE_NOSYNC;
 					break;
 				case PF_STATE_OPT_SRCTRACK:
 					if (srctrack) {
 						yyerror("state option "
 						    "'source-track' "
 						    "multiple definitions");
 						YYERROR;
 					}
					srctrack = o->data.src_track;
 					r.rule_flag |= PFRULE_SRCTRACK;
 					break;
 				case PF_STATE_OPT_MAX_SRC_STATES:
 					if (r.max_src_states) {
 						yyerror("state option "
 						    "'max-src-states' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					if (o->data.max_src_states == 0) {
 						yyerror("'max-src-states' must "
 						    "be > 0");
 						YYERROR;
 					}
 					r.max_src_states =
 					    o->data.max_src_states;
 					r.rule_flag |= PFRULE_SRCTRACK;
 					break;
 				case PF_STATE_OPT_OVERLOAD:
 					if (r.overload_tblname[0]) {
 						yyerror("multiple 'overload' "
 						    "table definitions");
 						YYERROR;
 					}
 					if (strlcpy(r.overload_tblname,
 					    o->data.overload.tblname,
 					    PF_TABLE_NAME_SIZE) >=
 					    PF_TABLE_NAME_SIZE) {
 						yyerror("state option: "
 						    "strlcpy");
 						YYERROR;
 					}
 					r.flush = o->data.overload.flush;
 					break;
 				case PF_STATE_OPT_MAX_SRC_CONN:
 					if (r.max_src_conn) {
 						yyerror("state option "
 						    "'max-src-conn' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					if (o->data.max_src_conn == 0) {
 						yyerror("'max-src-conn' "
 						    "must be > 0");
 						YYERROR;
 					}
 					r.max_src_conn =
 					    o->data.max_src_conn;
 					r.rule_flag |= PFRULE_SRCTRACK |
 					    PFRULE_RULESRCTRACK;
 					break;
 				case PF_STATE_OPT_MAX_SRC_CONN_RATE:
 					if (r.max_src_conn_rate.limit) {
 						yyerror("state option "
 						    "'max-src-conn-rate' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					if (!o->data.max_src_conn_rate.limit ||
 					    !o->data.max_src_conn_rate.seconds) {
 						yyerror("'max-src-conn-rate' "
 						    "values must be > 0");
 						YYERROR;
 					}
 					if (o->data.max_src_conn_rate.limit >
 					    PF_THRESHOLD_MAX) {
 						yyerror("'max-src-conn-rate' "
						    "maximum rate must be <= %u",
 						    PF_THRESHOLD_MAX);
 						YYERROR;
 					}
 					r.max_src_conn_rate.limit =
 					    o->data.max_src_conn_rate.limit;
 					r.max_src_conn_rate.seconds =
 					    o->data.max_src_conn_rate.seconds;
 					r.rule_flag |= PFRULE_SRCTRACK |
 					    PFRULE_RULESRCTRACK;
 					break;
 				case PF_STATE_OPT_MAX_SRC_NODES:
 					if (r.max_src_nodes) {
 						yyerror("state option "
 						    "'max-src-nodes' "
 						    "multiple definitions");
 						YYERROR;
 					}
 					if (o->data.max_src_nodes == 0) {
 						yyerror("'max-src-nodes' must "
 						    "be > 0");
 						YYERROR;
 					}
 					r.max_src_nodes =
 					    o->data.max_src_nodes;
 					r.rule_flag |= PFRULE_SRCTRACK |
 					    PFRULE_RULESRCTRACK;
 					break;
 				case PF_STATE_OPT_STATELOCK:
 					if (statelock) {
 						yyerror("state locking option: "
 						    "multiple definitions");
 						YYERROR;
 					}
 					statelock = 1;
 					r.rule_flag |= o->data.statelock;
 					break;
 				case PF_STATE_OPT_SLOPPY:
 					if (r.rule_flag & PFRULE_STATESLOPPY) {
 						yyerror("state sloppy option: "
 						    "multiple definitions");
 						YYERROR;
 					}
 					r.rule_flag |= PFRULE_STATESLOPPY;
 					break;
 				case PF_STATE_OPT_TIMEOUT:
 					if (o->data.timeout.number ==
 					    PFTM_ADAPTIVE_START ||
 					    o->data.timeout.number ==
 					    PFTM_ADAPTIVE_END)
 						adaptive = 1;
 					if (r.timeout[o->data.timeout.number]) {
 						yyerror("state timeout %s "
 						    "multiple definitions",
 						    pf_timeouts[o->data.
 						    timeout.number].name);
 						YYERROR;
 					}
 					r.timeout[o->data.timeout.number] =
 					    o->data.timeout.seconds;
 				}
 				o = o->next;
 				if (!defaults)
 					free(p);
 			}
 
 			/* 'flags S/SA' by default on stateful rules */
 			if (!r.action && !r.flags && !r.flagset &&
 			    !$9.fragment && !($9.marker & FOM_FLAGS) &&
 			    r.keep_state) {
 				r.flags = parse_flags("S");
				r.flagset = parse_flags("SA");
 			}
 			if (!adaptive && r.max_states) {
 				r.timeout[PFTM_ADAPTIVE_START] =
 				    (r.max_states / 10) * 6;
 				r.timeout[PFTM_ADAPTIVE_END] =
 				    (r.max_states / 10) * 12;
 			}
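			/*
			 * Worked example of the default adaptive thresholds
			 * set above: "max 1000" yields adaptive.start =
			 * (1000 / 10) * 6 = 600 and adaptive.end =
			 * (1000 / 10) * 12 = 1200.  The division is integer
			 * division, so "max 15" yields 6 and 12.
			 */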
 			if (r.rule_flag & PFRULE_SRCTRACK) {
 				if (srctrack == PF_SRCTRACK_GLOBAL &&
 				    r.max_src_nodes) {
 					yyerror("'max-src-nodes' is "
 					    "incompatible with "
 					    "'source-track global'");
 					YYERROR;
 				}
 				if (srctrack == PF_SRCTRACK_GLOBAL &&
 				    r.max_src_conn) {
 					yyerror("'max-src-conn' is "
 					    "incompatible with "
 					    "'source-track global'");
 					YYERROR;
 				}
 				if (srctrack == PF_SRCTRACK_GLOBAL &&
 				    r.max_src_conn_rate.seconds) {
 					yyerror("'max-src-conn-rate' is "
 					    "incompatible with "
 					    "'source-track global'");
 					YYERROR;
 				}
 				if (r.timeout[PFTM_SRC_NODE] <
 				    r.max_src_conn_rate.seconds)
 					r.timeout[PFTM_SRC_NODE] =
 					    r.max_src_conn_rate.seconds;
 				r.rule_flag |= PFRULE_SRCTRACK;
 				if (srctrack == PF_SRCTRACK_RULE)
 					r.rule_flag |= PFRULE_RULESRCTRACK;
 			}
 			if (r.keep_state && !statelock)
 				r.rule_flag |= default_statelock;
 
 			if ($9.fragment)
 				r.rule_flag |= PFRULE_FRAGMENT;
 			r.allow_opts = $9.allowopts;
 
 			decide_address_family($8.src.host, &r.af);
 			decide_address_family($8.dst.host, &r.af);
 
 			if ($5.rt) {
 				if (!r.direction) {
 					yyerror("direction must be explicit "
 					    "with rules that specify routing");
 					YYERROR;
 				}
 				r.rt = $5.rt;
 				r.rpool.opts = $5.pool_opts;
 				if ($5.key != NULL)
 					memcpy(&r.rpool.key, $5.key,
 					    sizeof(struct pf_poolhashkey));
 			}
 			if (r.rt && r.rt != PF_FASTROUTE) {
 				decide_address_family($5.host, &r.af);
 				remove_invalid_hosts(&$5.host, &r.af);
 				if ($5.host == NULL) {
 					yyerror("no routing address with "
 					    "matching address family found.");
 					YYERROR;
 				}
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) ==
 				    PF_POOL_NONE && ($5.host->next != NULL ||
 				    $5.host->addr.type == PF_ADDR_TABLE ||
 				    DYNIF_MULTIADDR($5.host->addr)))
 					r.rpool.opts |= PF_POOL_ROUNDROBIN;
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 				    PF_POOL_ROUNDROBIN &&
 				    disallow_table($5.host, "tables are only "
 				    "supported in round-robin routing pools"))
 					YYERROR;
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 				    PF_POOL_ROUNDROBIN &&
 				    disallow_alias($5.host, "interface (%s) "
 				    "is only supported in round-robin "
 				    "routing pools"))
 					YYERROR;
 				if ($5.host->next != NULL) {
 					if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 					    PF_POOL_ROUNDROBIN) {
 						yyerror("r.rpool.opts must "
 						    "be PF_POOL_ROUNDROBIN");
 						YYERROR;
 					}
 				}
 			}
 			if ($9.queues.qname != NULL) {
 				if (strlcpy(r.qname, $9.queues.qname,
 				    sizeof(r.qname)) >= sizeof(r.qname)) {
					yyerror("rule qname too long (max "
					    "%zu chars)", sizeof(r.qname) - 1);
 					YYERROR;
 				}
 				free($9.queues.qname);
 			}
 			if ($9.queues.pqname != NULL) {
 				if (strlcpy(r.pqname, $9.queues.pqname,
 				    sizeof(r.pqname)) >= sizeof(r.pqname)) {
					yyerror("rule pqname too long (max "
					    "%zu chars)", sizeof(r.pqname) - 1);
 					YYERROR;
 				}
 				free($9.queues.pqname);
 			}
 #ifdef __FreeBSD__
 			r.divert.port = $9.divert.port;
 #else
 			if ((r.divert.port = $9.divert.port)) {
 				if (r.direction == PF_OUT) {
 					if ($9.divert.addr) {
 						yyerror("address specified "
 						    "for outgoing divert");
 						YYERROR;
 					}
 					bzero(&r.divert.addr,
 					    sizeof(r.divert.addr));
 				} else {
 					if (!$9.divert.addr) {
 						yyerror("no address specified "
 						    "for incoming divert");
 						YYERROR;
 					}
 					if ($9.divert.addr->af != r.af) {
 						yyerror("address family "
 						    "mismatch for divert");
 						YYERROR;
 					}
 					r.divert.addr =
 					    $9.divert.addr->addr.v.a.addr;
 				}
 			}
 #endif
 
 			expand_rule(&r, $4, $5.host, $7, $8.src_os,
 			    $8.src.host, $8.src.port, $8.dst.host, $8.dst.port,
 			    $9.uid, $9.gid, $9.icmpspec, "");
 		}
 		;
 
 filter_opts	:	{
 				bzero(&filter_opts, sizeof filter_opts);
 				filter_opts.rtableid = -1;
 			}
 		    filter_opts_l
 			{ $$ = filter_opts; }
 		| /* empty */	{
 			bzero(&filter_opts, sizeof filter_opts);
 			filter_opts.rtableid = -1;
 			$$ = filter_opts;
 		}
 		;
 
 filter_opts_l	: filter_opts_l filter_opt
 		| filter_opt
 		;
 
 filter_opt	: USER uids {
 			if (filter_opts.uid)
 				$2->tail->next = filter_opts.uid;
 			filter_opts.uid = $2;
 		}
 		| GROUP gids {
 			if (filter_opts.gid)
 				$2->tail->next = filter_opts.gid;
 			filter_opts.gid = $2;
 		}
 		| flags {
 			if (filter_opts.marker & FOM_FLAGS) {
 				yyerror("flags cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_FLAGS;
 			filter_opts.flags.b1 |= $1.b1;
 			filter_opts.flags.b2 |= $1.b2;
 			filter_opts.flags.w |= $1.w;
 			filter_opts.flags.w2 |= $1.w2;
 		}
 		| icmpspec {
 			if (filter_opts.marker & FOM_ICMP) {
 				yyerror("icmp-type cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_ICMP;
 			filter_opts.icmpspec = $1;
 		}
 		| PRIO NUMBER {
 			if (filter_opts.marker & FOM_PRIO) {
 				yyerror("prio cannot be redefined");
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > PF_PRIO_MAX) {
 				yyerror("prio must be 0 - %u", PF_PRIO_MAX);
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_PRIO;
 			filter_opts.prio = $2;
 		}
 		| TOS tos {
 			if (filter_opts.marker & FOM_TOS) {
 				yyerror("tos cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_TOS;
 			filter_opts.tos = $2;
 		}
 		| keep {
 			if (filter_opts.marker & FOM_KEEP) {
 				yyerror("modulate or keep cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_KEEP;
 			filter_opts.keep.action = $1.action;
 			filter_opts.keep.options = $1.options;
 		}
 		| FRAGMENT {
 			filter_opts.fragment = 1;
 		}
 		| ALLOWOPTS {
 			filter_opts.allowopts = 1;
 		}
 		| label	{
 			if (filter_opts.label) {
 				yyerror("label cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.label = $1;
 		}
 		| qname	{
 			if (filter_opts.queues.qname) {
 				yyerror("queue cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.queues = $1;
 		}
 		| TAG string				{
 			filter_opts.tag = $2;
 		}
 		| not TAGGED string			{
 			filter_opts.match_tag = $3;
 			filter_opts.match_tag_not = $1;
 		}
 		| PROBABILITY probability		{
 			double	p;
 
 			p = floor($2 * UINT_MAX + 0.5);
 			if (p < 0.0 || p > UINT_MAX) {
 				yyerror("invalid probability: %lf", p);
 				YYERROR;
 			}
 			filter_opts.prob = (u_int32_t)p;
 			if (filter_opts.prob == 0)
 				filter_opts.prob = 1;
 		}
 		| RTABLE NUMBER				{
 			if ($2 < 0 || $2 > rt_tableid_max()) {
 				yyerror("invalid rtable id");
 				YYERROR;
 			}
 			filter_opts.rtableid = $2;
 		}
 		| DIVERTTO portplain {
 #ifdef __FreeBSD__
 			filter_opts.divert.port = $2.a;
 			if (!filter_opts.divert.port) {
 				yyerror("invalid divert port: %u", ntohs($2.a));
 				YYERROR;
 			}
 #endif
 		}
 		| DIVERTTO STRING PORT portplain {
 #ifndef __FreeBSD__
 			if ((filter_opts.divert.addr = host($2)) == NULL) {
 				yyerror("could not parse divert address: %s",
 				    $2);
 				free($2);
 				YYERROR;
 			}
 #else
 			if ($2)
 #endif
 			free($2);
 			filter_opts.divert.port = $4.a;
 			if (!filter_opts.divert.port) {
 				yyerror("invalid divert port: %u", ntohs($4.a));
 				YYERROR;
 			}
 		}
 		| DIVERTREPLY {
 #ifdef __FreeBSD__
 			yyerror("divert-reply has no meaning in FreeBSD pf(4)");
 			YYERROR;
 #else
 			filter_opts.divert.port = 1;	/* some random value */
 #endif
 		}
 		| filter_sets
 		;
 
 filter_sets	: SET '(' filter_sets_l ')'	{ $$ = filter_opts; }
 		| SET filter_set		{ $$ = filter_opts; }
 		;
 
 filter_sets_l	: filter_sets_l comma filter_set
 		| filter_set
 		;
 
 filter_set	: prio {
 			if (filter_opts.marker & FOM_SETPRIO) {
 				yyerror("prio cannot be redefined");
 				YYERROR;
 			}
 			filter_opts.marker |= FOM_SETPRIO;
 			filter_opts.set_prio[0] = $1.b1;
 			filter_opts.set_prio[1] = $1.b2;
		}
		;

 prio		: PRIO NUMBER {
 			if ($2 < 0 || $2 > PF_PRIO_MAX) {
 				yyerror("prio must be 0 - %u", PF_PRIO_MAX);
 				YYERROR;
 			}
 			$$.b1 = $$.b2 = $2;
 		}
 		| PRIO '(' NUMBER comma NUMBER ')' {
 			if ($3 < 0 || $3 > PF_PRIO_MAX ||
 			    $5 < 0 || $5 > PF_PRIO_MAX) {
 				yyerror("prio must be 0 - %u", PF_PRIO_MAX);
 				YYERROR;
 			}
 			$$.b1 = $3;
 			$$.b2 = $5;
 		}
 		;
 
 probability	: STRING				{
 			char	*e;
 			double	 p = strtod($1, &e);
 
 			if (*e == '%') {
 				p *= 0.01;
 				e++;
 			}
 			if (*e) {
 				yyerror("invalid probability: %s", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 			$$ = p;
 		}
 		| NUMBER				{
 			$$ = (double)$1;
 		}
 		;
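
/*
 * Worked example of probability handling (this rule plus the
 * PROBABILITY filter_opt above), assuming a 32-bit unsigned int so
 * that UINT_MAX is 4294967295: "probability 50%" gives p = 0.5, which
 * is scaled to floor(0.5 * UINT_MAX + 0.5) = 2147483648.  A value that
 * rounds to 0 is stored as 1, because a prob of 0 in the rule means
 * "no probability check" and the rule would otherwise always match.
 */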
 
 
 action		: PASS			{ $$.b1 = PF_PASS; $$.b2 = $$.w = 0; }
 		| BLOCK blockspec	{ $$ = $2; $$.b1 = PF_DROP; }
 		;
 
 blockspec	: /* empty */		{
 			$$.b2 = blockpolicy;
 			$$.w = returnicmpdefault;
 			$$.w2 = returnicmp6default;
 		}
 		| DROP			{
 			$$.b2 = PFRULE_DROP;
 			$$.w = 0;
 			$$.w2 = 0;
 		}
 		| RETURNRST		{
 			$$.b2 = PFRULE_RETURNRST;
 			$$.w = 0;
 			$$.w2 = 0;
 		}
 		| RETURNRST '(' TTL NUMBER ')'	{
 			if ($4 < 0 || $4 > 255) {
 				yyerror("illegal ttl value %d", $4);
 				YYERROR;
 			}
 			$$.b2 = PFRULE_RETURNRST;
 			$$.w = $4;
 			$$.w2 = 0;
 		}
 		| RETURNICMP		{
 			$$.b2 = PFRULE_RETURNICMP;
 			$$.w = returnicmpdefault;
 			$$.w2 = returnicmp6default;
 		}
 		| RETURNICMP6		{
 			$$.b2 = PFRULE_RETURNICMP;
 			$$.w = returnicmpdefault;
 			$$.w2 = returnicmp6default;
 		}
 		| RETURNICMP '(' reticmpspec ')'	{
 			$$.b2 = PFRULE_RETURNICMP;
 			$$.w = $3;
			$$.w2 = returnicmp6default;
 		}
 		| RETURNICMP6 '(' reticmp6spec ')'	{
 			$$.b2 = PFRULE_RETURNICMP;
 			$$.w = returnicmpdefault;
 			$$.w2 = $3;
 		}
 		| RETURNICMP '(' reticmpspec comma reticmp6spec ')' {
 			$$.b2 = PFRULE_RETURNICMP;
 			$$.w = $3;
 			$$.w2 = $5;
 		}
 		| RETURN {
 			$$.b2 = PFRULE_RETURN;
 			$$.w = returnicmpdefault;
 			$$.w2 = returnicmp6default;
 		}
 		;
 
 reticmpspec	: STRING			{
 			if (!($$ = parseicmpspec($1, AF_INET))) {
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		| NUMBER			{
 			u_int8_t		icmptype;
 
 			if ($1 < 0 || $1 > 255) {
				yyerror("invalid icmp code %lld", (long long)$1);
 				YYERROR;
 			}
 			icmptype = returnicmpdefault >> 8;
 			$$ = (icmptype << 8 | $1);
 		}
 		;
 
 reticmp6spec	: STRING			{
 			if (!($$ = parseicmpspec($1, AF_INET6))) {
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		| NUMBER			{
 			u_int8_t		icmptype;
 
 			if ($1 < 0 || $1 > 255) {
				yyerror("invalid icmp code %lld", (long long)$1);
 				YYERROR;
 			}
 			icmptype = returnicmp6default >> 8;
 			$$ = (icmptype << 8 | $1);
 		}
 		;
 
 dir		: /* empty */			{ $$ = PF_INOUT; }
 		| IN				{ $$ = PF_IN; }
 		| OUT				{ $$ = PF_OUT; }
 		;
 
 quick		: /* empty */			{ $$.quick = 0; }
 		| QUICK				{ $$.quick = 1; }
 		;
 
 logquick	: /* empty */	{ $$.log = 0; $$.quick = 0; $$.logif = 0; }
 		| log		{ $$ = $1; $$.quick = 0; }
 		| QUICK		{ $$.quick = 1; $$.log = 0; $$.logif = 0; }
 		| log QUICK	{ $$ = $1; $$.quick = 1; }
 		| QUICK log	{ $$ = $2; $$.quick = 1; }
 		;
 
 log		: LOG			{ $$.log = PF_LOG; $$.logif = 0; }
 		| LOG '(' logopts ')'	{
 			$$.log = PF_LOG | $3.log;
 			$$.logif = $3.logif;
 		}
 		;
 
 logopts		: logopt			{ $$ = $1; }
 		| logopts comma logopt		{
 			$$.log = $1.log | $3.log;
 			$$.logif = $3.logif;
 			if ($$.logif == 0)
 				$$.logif = $1.logif;
 		}
 		;
 
 logopt		: ALL		{ $$.log = PF_LOG_ALL; $$.logif = 0; }
 		| USER		{ $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
 		| GROUP		{ $$.log = PF_LOG_SOCKET_LOOKUP; $$.logif = 0; }
 		| TO string	{
 			const char	*errstr;
 			u_int		 i;
 
 			$$.log = 0;
 			if (strncmp($2, "pflog", 5)) {
 				yyerror("%s: should be a pflog interface", $2);
 				free($2);
 				YYERROR;
 			}
 			i = strtonum($2 + 5, 0, 255, &errstr);
 			if (errstr) {
 				yyerror("%s: %s", $2, errstr);
 				free($2);
 				YYERROR;
 			}
 			free($2);
 			$$.logif = i;
 		}
 		;
 
 interface	: /* empty */			{ $$ = NULL; }
 		| ON if_item_not		{ $$ = $2; }
 		| ON '{' optnl if_list '}'	{ $$ = $4; }
 		;
 
 if_list		: if_item_not optnl		{ $$ = $1; }
 		| if_list comma if_item_not optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 if_item_not	: not if_item			{ $$ = $2; $$->not = $1; }
 		;
 
 if_item		: STRING			{
 			struct node_host	*n;
 
 			$$ = calloc(1, sizeof(struct node_if));
 			if ($$ == NULL)
 				err(1, "if_item: calloc");
 			if (strlcpy($$->ifname, $1, sizeof($$->ifname)) >=
 			    sizeof($$->ifname)) {
 				free($1);
 				free($$);
 				yyerror("interface name too long");
 				YYERROR;
 			}
 
 			if ((n = ifa_exists($1)) != NULL)
 				$$->ifa_flags = n->ifa_flags;
 
 			free($1);
 			$$->not = 0;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 af		: /* empty */			{ $$ = 0; }
 		| INET				{ $$ = AF_INET; }
 		| INET6				{ $$ = AF_INET6; }
 		;
 
 proto		: /* empty */				{ $$ = NULL; }
 		| PROTO proto_item			{ $$ = $2; }
 		| PROTO '{' optnl proto_list '}'	{ $$ = $4; }
 		;
 
 proto_list	: proto_item optnl		{ $$ = $1; }
 		| proto_list comma proto_item optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 proto_item	: protoval			{
 			u_int8_t	pr;
 
 			pr = (u_int8_t)$1;
 			if (pr == 0) {
 				yyerror("proto 0 cannot be used");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_proto));
 			if ($$ == NULL)
 				err(1, "proto_item: calloc");
 			$$->proto = pr;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 protoval	: STRING			{
 			struct protoent	*p;
 
 			p = getprotobyname($1);
 			if (p == NULL) {
 				yyerror("unknown protocol %s", $1);
 				free($1);
 				YYERROR;
 			}
 			$$ = p->p_proto;
 			free($1);
 		}
 		| NUMBER			{
 			if ($1 < 0 || $1 > 255) {
 				yyerror("protocol outside range");
 				YYERROR;
 			}
 		}
 		;
 
 fromto		: ALL				{
 			$$.src.host = NULL;
 			$$.src.port = NULL;
 			$$.dst.host = NULL;
 			$$.dst.port = NULL;
 			$$.src_os = NULL;
 		}
 		| from os to			{
 			$$.src = $1;
 			$$.src_os = $2;
 			$$.dst = $3;
 		}
 		;
 
 os		: /* empty */			{ $$ = NULL; }
 		| OS xos			{ $$ = $2; }
 		| OS '{' optnl os_list '}'	{ $$ = $4; }
 		;
 
 xos		: STRING {
 			$$ = calloc(1, sizeof(struct node_os));
 			if ($$ == NULL)
 				err(1, "os: calloc");
 			$$->os = $1;
 			$$->tail = $$;
 		}
 		;
 
 os_list		: xos optnl 			{ $$ = $1; }
 		| os_list comma xos optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 from		: /* empty */			{
 			$$.host = NULL;
 			$$.port = NULL;
 		}
 		| FROM ipportspec		{
 			$$ = $2;
 		}
 		;
 
 to		: /* empty */			{
 			$$.host = NULL;
 			$$.port = NULL;
 		}
 		| TO ipportspec		{
 			if (disallow_urpf_failed($2.host, "\"urpf-failed\" is "
 			    "not permitted in a destination address"))
 				YYERROR;
 			$$ = $2;
 		}
 		;
 
 ipportspec	: ipspec			{
 			$$.host = $1;
 			$$.port = NULL;
 		}
 		| ipspec PORT portspec		{
 			$$.host = $1;
 			$$.port = $3;
 		}
 		| PORT portspec			{
 			$$.host = NULL;
 			$$.port = $2;
 		}
 		;
 
 optnl		: '\n' optnl
 		|
 		;
 
 ipspec		: ANY				{ $$ = NULL; }
 		| xhost				{ $$ = $1; }
 		| '{' optnl host_list '}'	{ $$ = $3; }
 		;
 
 toipspec	: TO ipspec			{ $$ = $2; }
 		| /* empty */			{ $$ = NULL; }
 		;
 
 host_list	: ipspec optnl			{ $$ = $1; }
 		| host_list comma ipspec optnl	{
 			if ($3 == NULL)
 				$$ = $1;
 			else if ($1 == NULL)
 				$$ = $3;
 			else {
 				$1->tail->next = $3;
 				$1->tail = $3->tail;
 				$$ = $1;
 			}
 		}
 		;
 
 xhost		: not host			{
 			struct node_host	*n;
 
 			for (n = $2; n != NULL; n = n->next)
 				n->not = $1;
 			$$ = $2;
 		}
 		| not NOROUTE			{
 			$$ = calloc(1, sizeof(struct node_host));
 			if ($$ == NULL)
 				err(1, "xhost: calloc");
 			$$->addr.type = PF_ADDR_NOROUTE;
 			$$->next = NULL;
 			$$->not = $1;
 			$$->tail = $$;
 		}
 		| not URPFFAILED		{
 			$$ = calloc(1, sizeof(struct node_host));
 			if ($$ == NULL)
 				err(1, "xhost: calloc");
 			$$->addr.type = PF_ADDR_URPFFAILED;
 			$$->next = NULL;
 			$$->not = $1;
 			$$->tail = $$;
 		}
 		;
 
 host		: STRING			{
 			if (($$ = host($1)) == NULL)	{
 				/* error. "any" is handled elsewhere */
 				free($1);
 				yyerror("could not parse host specification");
 				YYERROR;
 			}
 			free($1);
 
 		}
 		| STRING '-' STRING		{
 			struct node_host *b, *e;
 
 			if ((b = host($1)) == NULL || (e = host($3)) == NULL) {
 				free($1);
 				free($3);
 				yyerror("could not parse host specification");
 				YYERROR;
 			}
 			if (b->af != e->af ||
 			    b->addr.type != PF_ADDR_ADDRMASK ||
 			    e->addr.type != PF_ADDR_ADDRMASK ||
 			    unmask(&b->addr.v.a.mask, b->af) !=
 			    (b->af == AF_INET ? 32 : 128) ||
 			    unmask(&e->addr.v.a.mask, e->af) !=
 			    (e->af == AF_INET ? 32 : 128) ||
 			    b->next != NULL || b->not ||
 			    e->next != NULL || e->not) {
 				free(b);
 				free(e);
 				free($1);
 				free($3);
 				yyerror("invalid address range");
 				YYERROR;
 			}
 			memcpy(&b->addr.v.a.mask, &e->addr.v.a.addr,
 			    sizeof(b->addr.v.a.mask));
 			b->addr.type = PF_ADDR_RANGE;
 			$$ = b;
 			free(e);
 			free($1);
 			free($3);
 		}
 		| STRING '/' NUMBER		{
 			char	*buf;
 
 			if (asprintf(&buf, "%s/%lld", $1, (long long)$3) == -1)
 				err(1, "host: asprintf");
 			free($1);
 			if (($$ = host(buf)) == NULL)	{
 				/* error. "any" is handled elsewhere */
 				free(buf);
 				yyerror("could not parse host specification");
 				YYERROR;
 			}
 			free(buf);
 		}
 		| NUMBER '/' NUMBER		{
 			char	*buf;
 
			/* i.e. for 10/8 parsing */
			if (asprintf(&buf, "%lld/%lld", (long long)$1,
			    (long long)$3) == -1)
 				err(1, "host: asprintf");
 			if (($$ = host(buf)) == NULL)	{
 				/* error. "any" is handled elsewhere */
 				free(buf);
 				yyerror("could not parse host specification");
 				YYERROR;
 			}
 			free(buf);
 		}
 		| dynaddr
 		| dynaddr '/' NUMBER		{
 			struct node_host	*n;
 
 			if ($3 < 0 || $3 > 128) {
 				yyerror("bit number too big");
 				YYERROR;
 			}
 			$$ = $1;
 			for (n = $1; n != NULL; n = n->next)
 				set_ipmask(n, $3);
 		}
 		| '<' STRING '>'	{
 			if (strlen($2) >= PF_TABLE_NAME_SIZE) {
 				yyerror("table name '%s' too long", $2);
 				free($2);
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_host));
 			if ($$ == NULL)
 				err(1, "host: calloc");
 			$$->addr.type = PF_ADDR_TABLE;
 			if (strlcpy($$->addr.v.tblname, $2,
 			    sizeof($$->addr.v.tblname)) >=
 			    sizeof($$->addr.v.tblname))
 				errx(1, "host: strlcpy");
 			free($2);
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 number		: NUMBER
 		| STRING		{
 			u_long	ulval;
 
 			if (atoul($1, &ulval) == -1) {
 				yyerror("%s is not a number", $1);
 				free($1);
 				YYERROR;
 			} else
 				$$ = ulval;
 			free($1);
 		}
 		;
 
 dynaddr		: '(' STRING ')'		{
 			int	 flags = 0;
 			char	*p, *op;
 
 			op = $2;
 			if (!isalpha(op[0])) {
 				yyerror("invalid interface name '%s'", op);
 				free(op);
 				YYERROR;
 			}
 			while ((p = strrchr($2, ':')) != NULL) {
 				if (!strcmp(p+1, "network"))
 					flags |= PFI_AFLAG_NETWORK;
 				else if (!strcmp(p+1, "broadcast"))
 					flags |= PFI_AFLAG_BROADCAST;
 				else if (!strcmp(p+1, "peer"))
 					flags |= PFI_AFLAG_PEER;
 				else if (!strcmp(p+1, "0"))
 					flags |= PFI_AFLAG_NOALIAS;
 				else {
 					yyerror("interface %s has bad modifier",
 					    $2);
 					free(op);
 					YYERROR;
 				}
 				*p = '\0';
 			}
 			if (flags & (flags - 1) & PFI_AFLAG_MODEMASK) {
 				free(op);
 				yyerror("illegal combination of "
 				    "interface modifiers");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_host));
 			if ($$ == NULL)
 				err(1, "address: calloc");
 			$$->af = 0;
 			set_ipmask($$, 128);
 			$$->addr.type = PF_ADDR_DYNIFTL;
 			$$->addr.iflags = flags;
 			if (strlcpy($$->addr.v.ifname, $2,
 			    sizeof($$->addr.v.ifname)) >=
 			    sizeof($$->addr.v.ifname)) {
 				free(op);
 				free($$);
 				yyerror("interface name too long");
 				YYERROR;
 			}
 			free(op);
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 portspec	: port_item			{ $$ = $1; }
 		| '{' optnl port_list '}'	{ $$ = $3; }
 		;
 
 port_list	: port_item optnl		{ $$ = $1; }
 		| port_list comma port_item optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 port_item	: portrange			{
 			$$ = calloc(1, sizeof(struct node_port));
 			if ($$ == NULL)
 				err(1, "port_item: calloc");
 			$$->port[0] = $1.a;
 			$$->port[1] = $1.b;
 			if ($1.t)
 				$$->op = PF_OP_RRG;
 			else
 				$$->op = PF_OP_EQ;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| unaryop portrange	{
 			if ($2.t) {
				yyerror("':' cannot be used with another "
				    "port operator");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_port));
 			if ($$ == NULL)
 				err(1, "port_item: calloc");
 			$$->port[0] = $2.a;
 			$$->port[1] = $2.b;
 			$$->op = $1;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| portrange PORTBINARY portrange	{
 			if ($1.t || $3.t) {
				yyerror("':' cannot be used with another "
				    "port operator");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_port));
 			if ($$ == NULL)
 				err(1, "port_item: calloc");
 			$$->port[0] = $1.a;
 			$$->port[1] = $3.a;
 			$$->op = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 portplain	: numberstring			{
 			if (parseport($1, &$$, 0) == -1) {
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 portrange	: numberstring			{
 			if (parseport($1, &$$, PPORT_RANGE) == -1) {
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 uids		: uid_item			{ $$ = $1; }
 		| '{' optnl uid_list '}'	{ $$ = $3; }
 		;
 
 uid_list	: uid_item optnl		{ $$ = $1; }
 		| uid_list comma uid_item optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 uid_item	: uid				{
 			$$ = calloc(1, sizeof(struct node_uid));
 			if ($$ == NULL)
 				err(1, "uid_item: calloc");
 			$$->uid[0] = $1;
 			$$->uid[1] = $1;
 			$$->op = PF_OP_EQ;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| unaryop uid			{
 			if ($2 == UID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) {
 				yyerror("user unknown requires operator = or "
 				    "!=");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_uid));
 			if ($$ == NULL)
 				err(1, "uid_item: calloc");
 			$$->uid[0] = $2;
 			$$->uid[1] = $2;
 			$$->op = $1;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| uid PORTBINARY uid		{
 			if ($1 == UID_MAX || $3 == UID_MAX) {
 				yyerror("user unknown requires operator = or "
 				    "!=");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_uid));
 			if ($$ == NULL)
 				err(1, "uid_item: calloc");
 			$$->uid[0] = $1;
 			$$->uid[1] = $3;
 			$$->op = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 uid		: STRING			{
 			if (!strcmp($1, "unknown"))
 				$$ = UID_MAX;
 			else {
 				struct passwd	*pw;
 
 				if ((pw = getpwnam($1)) == NULL) {
 					yyerror("unknown user %s", $1);
 					free($1);
 					YYERROR;
 				}
 				$$ = pw->pw_uid;
 			}
 			free($1);
 		}
 		| NUMBER			{
 			if ($1 < 0 || $1 >= UID_MAX) {
				yyerror("illegal uid value %lld", (long long)$1);
 				YYERROR;
 			}
 			$$ = $1;
 		}
 		;
 
 gids		: gid_item			{ $$ = $1; }
 		| '{' optnl gid_list '}'	{ $$ = $3; }
 		;
 
 gid_list	: gid_item optnl		{ $$ = $1; }
 		| gid_list comma gid_item optnl	{
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 gid_item	: gid				{
 			$$ = calloc(1, sizeof(struct node_gid));
 			if ($$ == NULL)
 				err(1, "gid_item: calloc");
 			$$->gid[0] = $1;
 			$$->gid[1] = $1;
 			$$->op = PF_OP_EQ;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| unaryop gid			{
 			if ($2 == GID_MAX && $1 != PF_OP_EQ && $1 != PF_OP_NE) {
 				yyerror("group unknown requires operator = or "
 				    "!=");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_gid));
 			if ($$ == NULL)
 				err(1, "gid_item: calloc");
 			$$->gid[0] = $2;
 			$$->gid[1] = $2;
 			$$->op = $1;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| gid PORTBINARY gid		{
 			if ($1 == GID_MAX || $3 == GID_MAX) {
 				yyerror("group unknown requires operator = or "
 				    "!=");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_gid));
 			if ($$ == NULL)
 				err(1, "gid_item: calloc");
 			$$->gid[0] = $1;
 			$$->gid[1] = $3;
 			$$->op = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 gid		: STRING			{
 			if (!strcmp($1, "unknown"))
 				$$ = GID_MAX;
 			else {
 				struct group	*grp;
 
 				if ((grp = getgrnam($1)) == NULL) {
 					yyerror("unknown group %s", $1);
 					free($1);
 					YYERROR;
 				}
 				$$ = grp->gr_gid;
 			}
 			free($1);
 		}
 		| NUMBER			{
 			if ($1 < 0 || $1 >= GID_MAX) {
				yyerror("illegal gid value %lld", (long long)$1);
 				YYERROR;
 			}
 			$$ = $1;
 		}
 		;
 
 flag		: STRING			{
 			int	f;
 
 			if ((f = parse_flags($1)) < 0) {
 				yyerror("bad flags %s", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 			$$.b1 = f;
 		}
 		;
 
 flags		: FLAGS flag '/' flag	{ $$.b1 = $2.b1; $$.b2 = $4.b1; }
 		| FLAGS '/' flag	{ $$.b1 = 0; $$.b2 = $3.b1; }
 		| FLAGS ANY		{ $$.b1 = 0; $$.b2 = 0; }
 		;
 
 icmpspec	: ICMPTYPE icmp_item			{ $$ = $2; }
 		| ICMPTYPE '{' optnl icmp_list '}'	{ $$ = $4; }
 		| ICMP6TYPE icmp6_item			{ $$ = $2; }
 		| ICMP6TYPE '{' optnl icmp6_list '}'	{ $$ = $4; }
 		;
 
 icmp_list	: icmp_item optnl		{ $$ = $1; }
 		| icmp_list comma icmp_item optnl {
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 icmp6_list	: icmp6_item optnl		{ $$ = $1; }
 		| icmp6_list comma icmp6_item optnl {
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 icmp_item	: icmptype		{
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = 0;
 			$$->proto = IPPROTO_ICMP;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| icmptype CODE STRING	{
 			const struct icmpcodeent	*p;
 
 			if ((p = geticmpcodebyname($1-1, $3, AF_INET)) == NULL) {
 				yyerror("unknown icmp-code %s", $3);
 				free($3);
 				YYERROR;
 			}
 
 			free($3);
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = p->code + 1;
 			$$->proto = IPPROTO_ICMP;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| icmptype CODE NUMBER	{
 			if ($3 < 0 || $3 > 255) {
				yyerror("illegal icmp-code %lld", (long long)$3);
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = $3 + 1;
 			$$->proto = IPPROTO_ICMP;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 icmp6_item	: icmp6type		{
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = 0;
 			$$->proto = IPPROTO_ICMPV6;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| icmp6type CODE STRING	{
 			const struct icmpcodeent	*p;
 
 			if ((p = geticmpcodebyname($1-1, $3, AF_INET6)) == NULL) {
 				yyerror("unknown icmp6-code %s", $3);
 				free($3);
 				YYERROR;
 			}
 			free($3);
 
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = p->code + 1;
 			$$->proto = IPPROTO_ICMPV6;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| icmp6type CODE NUMBER	{
 			if ($3 < 0 || $3 > 255) {
				yyerror("illegal icmp6-code %lld", (long long)$3);
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_icmp));
 			if ($$ == NULL)
 				err(1, "icmp_item: calloc");
 			$$->type = $1;
 			$$->code = $3 + 1;
 			$$->proto = IPPROTO_ICMPV6;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 icmptype	: STRING			{
 			const struct icmptypeent	*p;
 
 			if ((p = geticmptypebyname($1, AF_INET)) == NULL) {
 				yyerror("unknown icmp-type %s", $1);
 				free($1);
 				YYERROR;
 			}
 			$$ = p->type + 1;
 			free($1);
 		}
 		| NUMBER			{
 			if ($1 < 0 || $1 > 255) {
				yyerror("illegal icmp-type %lld", (long long)$1);
 				YYERROR;
 			}
 			$$ = $1 + 1;
 		}
 		;
 
 icmp6type	: STRING			{
 			const struct icmptypeent	*p;
 
 			if ((p = geticmptypebyname($1, AF_INET6)) ==
 			    NULL) {
 				yyerror("unknown icmp6-type %s", $1);
 				free($1);
 				YYERROR;
 			}
 			$$ = p->type + 1;
 			free($1);
 		}
 		| NUMBER			{
 			if ($1 < 0 || $1 > 255) {
				yyerror("illegal icmp6-type %lld", (long long)$1);
 				YYERROR;
 			}
 			$$ = $1 + 1;
 		}
 		;
 
 tos	: STRING			{
 			if (!strcmp($1, "lowdelay"))
 				$$ = IPTOS_LOWDELAY;
 			else if (!strcmp($1, "throughput"))
 				$$ = IPTOS_THROUGHPUT;
 			else if (!strcmp($1, "reliability"))
 				$$ = IPTOS_RELIABILITY;
 			else if ($1[0] == '0' && $1[1] == 'x')
 				$$ = strtoul($1, NULL, 16);
 			else
 				$$ = 256;		/* flag bad argument */
 			if ($$ < 0 || $$ > 255) {
 				yyerror("illegal tos value %s", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		| NUMBER			{
 			$$ = $1;
 			if ($$ < 0 || $$ > 255) {
				yyerror("illegal tos value %lld", (long long)$1);
 				YYERROR;
 			}
 		}
 		;
 
 sourcetrack	: SOURCETRACK		{ $$ = PF_SRCTRACK; }
 		| SOURCETRACK GLOBAL	{ $$ = PF_SRCTRACK_GLOBAL; }
 		| SOURCETRACK RULE	{ $$ = PF_SRCTRACK_RULE; }
 		;
 
 statelock	: IFBOUND {
 			$$ = PFRULE_IFBOUND;
 		}
 		| FLOATING {
 			$$ = 0;
 		}
 		;
 
 keep		: NO STATE			{
 			$$.action = 0;
 			$$.options = NULL;
 		}
 		| KEEP STATE state_opt_spec	{
 			$$.action = PF_STATE_NORMAL;
 			$$.options = $3;
 		}
 		| MODULATE STATE state_opt_spec {
 			$$.action = PF_STATE_MODULATE;
 			$$.options = $3;
 		}
 		| SYNPROXY STATE state_opt_spec {
 			$$.action = PF_STATE_SYNPROXY;
 			$$.options = $3;
 		}
 		;
 
 flush		: /* empty */			{ $$ = 0; }
 		| FLUSH				{ $$ = PF_FLUSH; }
 		| FLUSH GLOBAL			{
 			$$ = PF_FLUSH | PF_FLUSH_GLOBAL;
 		}
 		;
 
 state_opt_spec	: '(' state_opt_list ')'	{ $$ = $2; }
 		| /* empty */			{ $$ = NULL; }
 		;
 
 state_opt_list	: state_opt_item		{ $$ = $1; }
 		| state_opt_list comma state_opt_item {
 			$1->tail->next = $3;
 			$1->tail = $3;
 			$$ = $1;
 		}
 		;
 
 state_opt_item	: MAXIMUM NUMBER		{
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_MAX;
 			$$->data.max_states = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| NOSYNC				{
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_NOSYNC;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| MAXSRCSTATES NUMBER			{
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_MAX_SRC_STATES;
 			$$->data.max_src_states = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| MAXSRCCONN NUMBER			{
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_MAX_SRC_CONN;
 			$$->data.max_src_conn = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| MAXSRCCONNRATE NUMBER '/' NUMBER	{
 			if ($2 < 0 || $2 > UINT_MAX ||
 			    $4 < 0 || $4 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_MAX_SRC_CONN_RATE;
 			$$->data.max_src_conn_rate.limit = $2;
 			$$->data.max_src_conn_rate.seconds = $4;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| OVERLOAD '<' STRING '>' flush		{
 			if (strlen($3) >= PF_TABLE_NAME_SIZE) {
 				yyerror("table name '%s' too long", $3);
 				free($3);
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			if (strlcpy($$->data.overload.tblname, $3,
 			    PF_TABLE_NAME_SIZE) >= PF_TABLE_NAME_SIZE)
 				errx(1, "state_opt_item: strlcpy");
 			free($3);
 			$$->type = PF_STATE_OPT_OVERLOAD;
 			$$->data.overload.flush = $5;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| MAXSRCNODES NUMBER			{
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_MAX_SRC_NODES;
 			$$->data.max_src_nodes = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| sourcetrack {
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_SRCTRACK;
 			$$->data.src_track = $1;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| statelock {
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_STATELOCK;
 			$$->data.statelock = $1;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| SLOPPY {
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_SLOPPY;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| STRING NUMBER			{
 			int	i;
 
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			for (i = 0; pf_timeouts[i].name &&
 			    strcmp(pf_timeouts[i].name, $1); ++i)
 				;	/* nothing */
 			if (!pf_timeouts[i].name) {
 				yyerror("illegal timeout name %s", $1);
 				free($1);
 				YYERROR;
 			}
 			if (strchr(pf_timeouts[i].name, '.') == NULL) {
 				yyerror("illegal state timeout %s", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 			$$ = calloc(1, sizeof(struct node_state_opt));
 			if ($$ == NULL)
 				err(1, "state_opt_item: calloc");
 			$$->type = PF_STATE_OPT_TIMEOUT;
 			$$->data.timeout.number = pf_timeouts[i].timeout;
 			$$->data.timeout.seconds = $2;
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		;
 
 label		: LABEL STRING			{
 			$$ = $2;
 		}
 		;
 
 qname		: QUEUE STRING				{
 			$$.qname = $2;
 			$$.pqname = NULL;
 		}
 		| QUEUE '(' STRING ')'			{
 			$$.qname = $3;
 			$$.pqname = NULL;
 		}
 		| QUEUE '(' STRING comma STRING ')'	{
 			$$.qname = $3;
 			$$.pqname = $5;
 		}
 		;
 
 no		: /* empty */			{ $$ = 0; }
 		| NO				{ $$ = 1; }
 		;
 
 portstar	: numberstring			{
 			if (parseport($1, &$$, PPORT_RANGE|PPORT_STAR) == -1) {
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 redirspec	: host				{ $$ = $1; }
 		| '{' optnl redir_host_list '}'	{ $$ = $3; }
 		;
 
 redir_host_list	: host optnl			{ $$ = $1; }
 		| redir_host_list comma host optnl {
 			$1->tail->next = $3;
 			$1->tail = $3->tail;
 			$$ = $1;
 		}
 		;
 
 redirpool	: /* empty */			{ $$ = NULL; }
 		| ARROW redirspec		{
 			$$ = calloc(1, sizeof(struct redirection));
 			if ($$ == NULL)
 				err(1, "redirection: calloc");
 			$$->host = $2;
 			$$->rport.a = $$->rport.b = $$->rport.t = 0;
 		}
 		| ARROW redirspec PORT portstar	{
 			$$ = calloc(1, sizeof(struct redirection));
 			if ($$ == NULL)
 				err(1, "redirection: calloc");
 			$$->host = $2;
 			$$->rport = $4;
 		}
 		;
 
 hashkey		: /* empty */
 		{
 			$$ = calloc(1, sizeof(struct pf_poolhashkey));
 			if ($$ == NULL)
 				err(1, "hashkey: calloc");
 			$$->key32[0] = arc4random();
 			$$->key32[1] = arc4random();
 			$$->key32[2] = arc4random();
 			$$->key32[3] = arc4random();
 		}
 		| string
 		{
 			if (!strncmp($1, "0x", 2)) {
 				if (strlen($1) != 34) {
 					free($1);
 					yyerror("hex key must be 128 bits "
 						"(32 hex digits) long");
 					YYERROR;
 				}
 				$$ = calloc(1, sizeof(struct pf_poolhashkey));
 				if ($$ == NULL)
 					err(1, "hashkey: calloc");
 
 				if (sscanf($1, "0x%8x%8x%8x%8x",
 				    &$$->key32[0], &$$->key32[1],
 				    &$$->key32[2], &$$->key32[3]) != 4) {
 					free($$);
 					free($1);
 					yyerror("invalid hex key");
 					YYERROR;
 				}
 			} else {
 				MD5_CTX	context;
 
 				$$ = calloc(1, sizeof(struct pf_poolhashkey));
 				if ($$ == NULL)
 					err(1, "hashkey: calloc");
 				MD5Init(&context);
 				MD5Update(&context, (unsigned char *)$1,
 				    strlen($1));
 				MD5Final((unsigned char *)$$, &context);
 				HTONL($$->key32[0]);
 				HTONL($$->key32[1]);
 				HTONL($$->key32[2]);
 				HTONL($$->key32[3]);
 			}
 			free($1);
 		}
 		;
 
 pool_opts	:	{ bzero(&pool_opts, sizeof pool_opts); }
 		    pool_opts_l
 			{ $$ = pool_opts; }
 		| /* empty */	{
 			bzero(&pool_opts, sizeof pool_opts);
 			$$ = pool_opts;
 		}
 		;
 
 pool_opts_l	: pool_opts_l pool_opt
 		| pool_opt
 		;
 
 pool_opt	: BITMASK	{
 			if (pool_opts.type) {
 				yyerror("pool type cannot be redefined");
 				YYERROR;
 			}
			pool_opts.type = PF_POOL_BITMASK;
 		}
 		| RANDOM	{
 			if (pool_opts.type) {
 				yyerror("pool type cannot be redefined");
 				YYERROR;
 			}
 			pool_opts.type = PF_POOL_RANDOM;
 		}
 		| SOURCEHASH hashkey {
 			if (pool_opts.type) {
 				yyerror("pool type cannot be redefined");
 				YYERROR;
 			}
 			pool_opts.type = PF_POOL_SRCHASH;
 			pool_opts.key = $2;
 		}
 		| ROUNDROBIN	{
 			if (pool_opts.type) {
 				yyerror("pool type cannot be redefined");
 				YYERROR;
 			}
 			pool_opts.type = PF_POOL_ROUNDROBIN;
 		}
 		| STATICPORT	{
 			if (pool_opts.staticport) {
 				yyerror("static-port cannot be redefined");
 				YYERROR;
 			}
 			pool_opts.staticport = 1;
 		}
 		| STICKYADDRESS	{
			if (pool_opts.marker & POM_STICKYADDRESS) {
 				yyerror("sticky-address cannot be redefined");
 				YYERROR;
 			}
 			pool_opts.marker |= POM_STICKYADDRESS;
 			pool_opts.opts |= PF_POOL_STICKYADDR;
 		}
 		;
 
 redirection	: /* empty */			{ $$ = NULL; }
 		| ARROW host			{
 			$$ = calloc(1, sizeof(struct redirection));
 			if ($$ == NULL)
 				err(1, "redirection: calloc");
 			$$->host = $2;
 			$$->rport.a = $$->rport.b = $$->rport.t = 0;
 		}
 		| ARROW host PORT portstar	{
 			$$ = calloc(1, sizeof(struct redirection));
 			if ($$ == NULL)
 				err(1, "redirection: calloc");
 			$$->host = $2;
 			$$->rport = $4;
 		}
 		;
 
 natpasslog	: /* empty */	{ $$.b1 = $$.b2 = 0; $$.w2 = 0; }
 		| PASS		{ $$.b1 = 1; $$.b2 = 0; $$.w2 = 0; }
 		| PASS log	{ $$.b1 = 1; $$.b2 = $2.log; $$.w2 = $2.logif; }
 		| log		{ $$.b1 = 0; $$.b2 = $1.log; $$.w2 = $1.logif; }
 		;
 
 nataction	: no NAT natpasslog {
 			if ($1 && $3.b1) {
 				yyerror("\"pass\" not valid with \"no\"");
 				YYERROR;
 			}
 			if ($1)
 				$$.b1 = PF_NONAT;
 			else
 				$$.b1 = PF_NAT;
 			$$.b2 = $3.b1;
 			$$.w = $3.b2;
 			$$.w2 = $3.w2;
 		}
 		| no RDR natpasslog {
 			if ($1 && $3.b1) {
 				yyerror("\"pass\" not valid with \"no\"");
 				YYERROR;
 			}
 			if ($1)
 				$$.b1 = PF_NORDR;
 			else
 				$$.b1 = PF_RDR;
 			$$.b2 = $3.b1;
 			$$.w = $3.b2;
 			$$.w2 = $3.w2;
 		}
 		;
 
 natrule		: nataction interface af proto fromto tag tagged rtable
 		    redirpool pool_opts
 		{
 			struct pf_rule	r;
 
 			if (check_rulestate(PFCTL_STATE_NAT))
 				YYERROR;
 
 			memset(&r, 0, sizeof(r));
 
 			r.action = $1.b1;
 			r.natpass = $1.b2;
 			r.log = $1.w;
 			r.logif = $1.w2;
 			r.af = $3;
 
 			if (!r.af) {
 				if ($5.src.host && $5.src.host->af &&
 				    !$5.src.host->ifindex)
 					r.af = $5.src.host->af;
 				else if ($5.dst.host && $5.dst.host->af &&
 				    !$5.dst.host->ifindex)
 					r.af = $5.dst.host->af;
 			}
 
 			if ($6 != NULL)
 				if (strlcpy(r.tagname, $6, PF_TAG_NAME_SIZE) >=
 				    PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 
 			if ($7.name)
 				if (strlcpy(r.match_tagname, $7.name,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			r.match_tag_not = $7.neg;
 			r.rtableid = $8;
 
 			if (r.action == PF_NONAT || r.action == PF_NORDR) {
 				if ($9 != NULL) {
 					yyerror("translation rule with 'no' "
 					    "does not need '->'");
 					YYERROR;
 				}
 			} else {
 				if ($9 == NULL || $9->host == NULL) {
 					yyerror("translation rule requires '-> "
 					    "address'");
 					YYERROR;
 				}
				if (!r.af && !$9->host->ifindex)
 					r.af = $9->host->af;
 
 				remove_invalid_hosts(&$9->host, &r.af);
 				if (invalid_redirect($9->host, r.af))
 					YYERROR;
 				if (check_netmask($9->host, r.af))
 					YYERROR;
 
 				r.rpool.proxy_port[0] = ntohs($9->rport.a);
 
 				switch (r.action) {
 				case PF_RDR:
 					if (!$9->rport.b && $9->rport.t &&
 					    $5.dst.port != NULL) {
 						r.rpool.proxy_port[1] =
 						    ntohs($9->rport.a) +
 						    (ntohs(
 						    $5.dst.port->port[1]) -
 						    ntohs(
 						    $5.dst.port->port[0]));
 					} else
 						r.rpool.proxy_port[1] =
 						    ntohs($9->rport.b);
 					break;
 				case PF_NAT:
 					r.rpool.proxy_port[1] =
 					    ntohs($9->rport.b);
 					if (!r.rpool.proxy_port[0] &&
 					    !r.rpool.proxy_port[1]) {
 						r.rpool.proxy_port[0] =
 						    PF_NAT_PROXY_PORT_LOW;
 						r.rpool.proxy_port[1] =
 						    PF_NAT_PROXY_PORT_HIGH;
 					} else if (!r.rpool.proxy_port[1])
 						r.rpool.proxy_port[1] =
 						    r.rpool.proxy_port[0];
 					break;
 				default:
 					break;
 				}
 
 				r.rpool.opts = $10.type;
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) ==
 				    PF_POOL_NONE && ($9->host->next != NULL ||
 				    $9->host->addr.type == PF_ADDR_TABLE ||
 				    DYNIF_MULTIADDR($9->host->addr)))
 					r.rpool.opts = PF_POOL_ROUNDROBIN;
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 				    PF_POOL_ROUNDROBIN &&
 				    disallow_table($9->host, "tables are only "
 				    "supported in round-robin redirection "
 				    "pools"))
 					YYERROR;
 				if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 				    PF_POOL_ROUNDROBIN &&
 				    disallow_alias($9->host, "interface (%s) "
 				    "is only supported in round-robin "
 				    "redirection pools"))
 					YYERROR;
 				if ($9->host->next != NULL) {
 					if ((r.rpool.opts & PF_POOL_TYPEMASK) !=
 					    PF_POOL_ROUNDROBIN) {
 						yyerror("only round-robin "
 						    "valid for multiple "
 						    "redirection addresses");
 						YYERROR;
 					}
 				}
 			}
 
 			if ($10.key != NULL)
 				memcpy(&r.rpool.key, $10.key,
 				    sizeof(struct pf_poolhashkey));
 
 			if ($10.opts)
 				r.rpool.opts |= $10.opts;
 
 			if ($10.staticport) {
 				if (r.action != PF_NAT) {
 					yyerror("the 'static-port' option is "
 					    "only valid with nat rules");
 					YYERROR;
 				}
 				if (r.rpool.proxy_port[0] !=
 				    PF_NAT_PROXY_PORT_LOW &&
 				    r.rpool.proxy_port[1] !=
 				    PF_NAT_PROXY_PORT_HIGH) {
 					yyerror("the 'static-port' option can't"
 					    " be used when specifying a port"
 					    " range");
 					YYERROR;
 				}
 				r.rpool.proxy_port[0] = 0;
 				r.rpool.proxy_port[1] = 0;
 			}
 
 			expand_rule(&r, $2, $9 == NULL ? NULL : $9->host, $4,
 			    $5.src_os, $5.src.host, $5.src.port, $5.dst.host,
 			    $5.dst.port, 0, 0, 0, "");
 			free($9);
 		}
 		;
 
 binatrule	: no BINAT natpasslog interface af proto FROM host toipspec tag
 		    tagged rtable redirection
 		{
 			struct pf_rule		binat;
 			struct pf_pooladdr	*pa;
 
 			if (check_rulestate(PFCTL_STATE_NAT))
 				YYERROR;
 			if (disallow_urpf_failed($9, "\"urpf-failed\" is not "
 			    "permitted as a binat destination"))
 				YYERROR;
 
 			memset(&binat, 0, sizeof(binat));
 
 			if ($1 && $3.b1) {
 				yyerror("\"pass\" not valid with \"no\"");
 				YYERROR;
 			}
 			if ($1)
 				binat.action = PF_NOBINAT;
 			else
 				binat.action = PF_BINAT;
 			binat.natpass = $3.b1;
 			binat.log = $3.b2;
 			binat.logif = $3.w2;
 			binat.af = $5;
 			if (!binat.af && $8 != NULL && $8->af)
 				binat.af = $8->af;
 			if (!binat.af && $9 != NULL && $9->af)
 				binat.af = $9->af;
 
 			if (!binat.af && $13 != NULL && $13->host)
 				binat.af = $13->host->af;
 			if (!binat.af) {
 				yyerror("address family (inet/inet6) "
 				    "undefined");
 				YYERROR;
 			}
 
 			if ($4 != NULL) {
 				memcpy(binat.ifname, $4->ifname,
 				    sizeof(binat.ifname));
 				binat.ifnot = $4->not;
 				free($4);
 			}
 
 			if ($10 != NULL)
 				if (strlcpy(binat.tagname, $10,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			if ($11.name)
 				if (strlcpy(binat.match_tagname, $11.name,
 				    PF_TAG_NAME_SIZE) >= PF_TAG_NAME_SIZE) {
 					yyerror("tag too long, max %u chars",
 					    PF_TAG_NAME_SIZE - 1);
 					YYERROR;
 				}
 			binat.match_tag_not = $11.neg;
 			binat.rtableid = $12;
 
 			if ($6 != NULL) {
 				binat.proto = $6->proto;
 				free($6);
 			}
 
 			if ($8 != NULL && disallow_table($8, "invalid use of "
 			    "table <%s> as the source address of a binat rule"))
 				YYERROR;
 			if ($8 != NULL && disallow_alias($8, "invalid use of "
 			    "interface (%s) as the source address of a binat "
 			    "rule"))
 				YYERROR;
 			if ($13 != NULL && $13->host != NULL && disallow_table(
 			    $13->host, "invalid use of table <%s> as the "
 			    "redirect address of a binat rule"))
 				YYERROR;
 			if ($13 != NULL && $13->host != NULL && disallow_alias(
 			    $13->host, "invalid use of interface (%s) as the "
 			    "redirect address of a binat rule"))
 				YYERROR;
 
 			if ($8 != NULL) {
 				if ($8->next) {
 					yyerror("multiple binat ip addresses");
 					YYERROR;
 				}
 				if ($8->addr.type == PF_ADDR_DYNIFTL)
 					$8->af = binat.af;
 				if ($8->af != binat.af) {
 					yyerror("binat ip versions must match");
 					YYERROR;
 				}
 				if (check_netmask($8, binat.af))
 					YYERROR;
 				memcpy(&binat.src.addr, &$8->addr,
 				    sizeof(binat.src.addr));
 				free($8);
 			}
 			if ($9 != NULL) {
 				if ($9->next) {
 					yyerror("multiple binat ip addresses");
 					YYERROR;
 				}
 				if ($9->af != binat.af && $9->af) {
 					yyerror("binat ip versions must match");
 					YYERROR;
 				}
 				if (check_netmask($9, binat.af))
 					YYERROR;
 				memcpy(&binat.dst.addr, &$9->addr,
 				    sizeof(binat.dst.addr));
 				binat.dst.neg = $9->not;
 				free($9);
 			}
 
 			if (binat.action == PF_NOBINAT) {
 				if ($13 != NULL) {
 					yyerror("'no binat' rule does not need"
 					    " '->'");
 					YYERROR;
 				}
 			} else {
 				if ($13 == NULL || $13->host == NULL) {
 					yyerror("'binat' rule requires"
 					    " '-> address'");
 					YYERROR;
 				}
 
 				remove_invalid_hosts(&$13->host, &binat.af);
 				if (invalid_redirect($13->host, binat.af))
 					YYERROR;
 				if ($13->host->next != NULL) {
 					yyerror("binat rule must redirect to "
 					    "a single address");
 					YYERROR;
 				}
 				if (check_netmask($13->host, binat.af))
 					YYERROR;
 
 				if (!PF_AZERO(&binat.src.addr.v.a.mask,
 				    binat.af) &&
 				    !PF_AEQ(&binat.src.addr.v.a.mask,
 				    &$13->host->addr.v.a.mask, binat.af)) {
 					yyerror("'binat' source mask and "
 					    "redirect mask must be the same");
 					YYERROR;
 				}
 
 				TAILQ_INIT(&binat.rpool.list);
 				pa = calloc(1, sizeof(struct pf_pooladdr));
 				if (pa == NULL)
 					err(1, "binat: calloc");
 				pa->addr = $13->host->addr;
 				pa->ifname[0] = 0;
 				TAILQ_INSERT_TAIL(&binat.rpool.list,
 				    pa, entries);
 
 				free($13);
 			}
 
 			pfctl_add_rule(pf, &binat, "");
 		}
 		;
 
 tag		: /* empty */		{ $$ = NULL; }
 		| TAG STRING		{ $$ = $2; }
 		;
 
 tagged		: /* empty */		{ $$.neg = 0; $$.name = NULL; }
 		| not TAGGED string	{ $$.neg = $1; $$.name = $3; }
 		;
 
 rtable		: /* empty */		{ $$ = -1; }
 		| RTABLE NUMBER		{
 			if ($2 < 0 || $2 > rt_tableid_max()) {
 				yyerror("invalid rtable id");
 				YYERROR;
 			}
 			$$ = $2;
 		}
 		;
 
 route_host	: STRING			{
 			$$ = calloc(1, sizeof(struct node_host));
 			if ($$ == NULL)
 				err(1, "route_host: calloc");
 			$$->ifname = $1;
 			set_ipmask($$, 128);
 			$$->next = NULL;
 			$$->tail = $$;
 		}
 		| '(' STRING host ')'		{
 			$$ = $3;
 			$$->ifname = $2;
 		}
 		;
 
 route_host_list	: route_host optnl			{ $$ = $1; }
 		| route_host_list comma route_host optnl {
 			if ($1->af == 0)
 				$1->af = $3->af;
 			if ($1->af != $3->af) {
 				yyerror("all pool addresses must be in the "
 				    "same address family");
 				YYERROR;
 			}
 			$1->tail->next = $3;
 			$1->tail = $3->tail;
 			$$ = $1;
 		}
 		;
 
 routespec	: route_host			{ $$ = $1; }
 		| '{' optnl route_host_list '}'	{ $$ = $3; }
 		;
 
 route		: /* empty */			{
 			$$.host = NULL;
 			$$.rt = 0;
 			$$.pool_opts = 0;
 		}
 		| FASTROUTE {
 			$$.host = NULL;
 			$$.rt = PF_FASTROUTE;
 			$$.pool_opts = 0;
 		}
 		| ROUTETO routespec pool_opts {
 			$$.host = $2;
 			$$.rt = PF_ROUTETO;
 			$$.pool_opts = $3.type | $3.opts;
 			if ($3.key != NULL)
 				$$.key = $3.key;
 		}
 		| REPLYTO routespec pool_opts {
 			$$.host = $2;
 			$$.rt = PF_REPLYTO;
 			$$.pool_opts = $3.type | $3.opts;
 			if ($3.key != NULL)
 				$$.key = $3.key;
 		}
 		| DUPTO routespec pool_opts {
 			$$.host = $2;
 			$$.rt = PF_DUPTO;
 			$$.pool_opts = $3.type | $3.opts;
 			if ($3.key != NULL)
 				$$.key = $3.key;
 		}
 		;
 
 timeout_spec	: STRING NUMBER
 		{
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($1);
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			if (pfctl_set_timeout(pf, $1, $2, 0) != 0) {
 				yyerror("unknown timeout %s", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
+		| INTERVAL NUMBER		{
+			if (check_rulestate(PFCTL_STATE_OPTION))
+				YYERROR;
+			if ($2 < 0 || $2 > UINT_MAX) {
+				yyerror("only positive values permitted");
+				YYERROR;
+			}
+			if (pfctl_set_timeout(pf, "interval", $2, 0) != 0)
+				YYERROR;
+		}
 		;
 
 timeout_list	: timeout_list comma timeout_spec optnl
 		| timeout_spec optnl
 		;
 
 limit_spec	: STRING NUMBER
 		{
 			if (check_rulestate(PFCTL_STATE_OPTION)) {
 				free($1);
 				YYERROR;
 			}
 			if ($2 < 0 || $2 > UINT_MAX) {
 				yyerror("only positive values permitted");
 				YYERROR;
 			}
 			if (pfctl_set_limit(pf, $1, $2) != 0) {
 				yyerror("unable to set limit %s %u", $1,
 				    (u_int)$2);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 limit_list	: limit_list comma limit_spec optnl
 		| limit_spec optnl
 		;
 
 comma		: ','
 		| /* empty */
 		;
 
 yesno		: NO			{ $$ = 0; }
 		| STRING		{
 			if (!strcmp($1, "yes"))
 				$$ = 1;
 			else {
 				yyerror("invalid value '%s', expected 'yes' "
 				    "or 'no'", $1);
 				free($1);
 				YYERROR;
 			}
 			free($1);
 		}
 		;
 
 unaryop		: '='		{ $$ = PF_OP_EQ; }
 		| '!' '='	{ $$ = PF_OP_NE; }
 		| '<' '='	{ $$ = PF_OP_LE; }
 		| '<'		{ $$ = PF_OP_LT; }
 		| '>' '='	{ $$ = PF_OP_GE; }
 		| '>'		{ $$ = PF_OP_GT; }
 		;
 
 %%
 
 int
 yyerror(const char *fmt, ...)
 {
 	va_list		 ap;
 
 	file->errors++;
 	va_start(ap, fmt);
 	fprintf(stderr, "%s:%d: ", file->name, yylval.lineno);
 	vfprintf(stderr, fmt, ap);
 	fprintf(stderr, "\n");
 	va_end(ap);
 	return (0);
 }
 
 int
 disallow_table(struct node_host *h, const char *fmt)
 {
 	for (; h != NULL; h = h->next)
 		if (h->addr.type == PF_ADDR_TABLE) {
 			yyerror(fmt, h->addr.v.tblname);
 			return (1);
 		}
 	return (0);
 }
 
 int
 disallow_urpf_failed(struct node_host *h, const char *fmt)
 {
 	for (; h != NULL; h = h->next)
 		if (h->addr.type == PF_ADDR_URPFFAILED) {
 			yyerror(fmt);
 			return (1);
 		}
 	return (0);
 }
 
 int
 disallow_alias(struct node_host *h, const char *fmt)
 {
 	for (; h != NULL; h = h->next)
 		if (DYNIF_MULTIADDR(h->addr)) {
 			yyerror(fmt, h->addr.v.tblname);
 			return (1);
 		}
 	return (0);
 }
 
 int
 rule_consistent(struct pf_rule *r, int anchor_call)
 {
 	int	problems = 0;
 
 	switch (r->action) {
 	case PF_PASS:
 	case PF_DROP:
 	case PF_SCRUB:
 	case PF_NOSCRUB:
 		problems = filter_consistent(r, anchor_call);
 		break;
 	case PF_NAT:
 	case PF_NONAT:
 		problems = nat_consistent(r);
 		break;
 	case PF_RDR:
 	case PF_NORDR:
 		problems = rdr_consistent(r);
 		break;
 	case PF_BINAT:
 	case PF_NOBINAT:
 	default:
 		break;
 	}
 	return (problems);
 }
 
 int
 filter_consistent(struct pf_rule *r, int anchor_call)
 {
 	int	problems = 0;
 
 	if (r->proto != IPPROTO_TCP && r->proto != IPPROTO_UDP &&
 	    (r->src.port_op || r->dst.port_op)) {
 		yyerror("port only applies to tcp/udp");
 		problems++;
 	}
 	if (r->proto != IPPROTO_ICMP && r->proto != IPPROTO_ICMPV6 &&
 	    (r->type || r->code)) {
 		yyerror("icmp-type/code only applies to icmp");
 		problems++;
 	}
 	if (!r->af && (r->type || r->code)) {
 		yyerror("must indicate address family with icmp-type/code");
 		problems++;
 	}
 	if (r->overload_tblname[0] &&
 	    r->max_src_conn == 0 && r->max_src_conn_rate.seconds == 0) {
 		yyerror("'overload' requires 'max-src-conn' "
 		    "or 'max-src-conn-rate'");
 		problems++;
 	}
 	if ((r->proto == IPPROTO_ICMP && r->af == AF_INET6) ||
 	    (r->proto == IPPROTO_ICMPV6 && r->af == AF_INET)) {
 		yyerror("proto %s doesn't match address family %s",
 		    r->proto == IPPROTO_ICMP ? "icmp" : "icmp6",
 		    r->af == AF_INET ? "inet" : "inet6");
 		problems++;
 	}
 	if (r->allow_opts && r->action != PF_PASS) {
 		yyerror("allow-opts can only be specified for pass rules");
 		problems++;
 	}
 	if (r->rule_flag & PFRULE_FRAGMENT && (r->src.port_op ||
 	    r->dst.port_op || r->flagset || r->type || r->code)) {
 		yyerror("fragments can be filtered only on IP header fields");
 		problems++;
 	}
 	if (r->rule_flag & PFRULE_RETURNRST && r->proto != IPPROTO_TCP) {
 		yyerror("return-rst can only be applied to TCP rules");
 		problems++;
 	}
 	if (r->max_src_nodes && !(r->rule_flag & PFRULE_RULESRCTRACK)) {
 		yyerror("max-src-nodes requires 'source-track rule'");
 		problems++;
 	}
 	if (r->action == PF_DROP && r->keep_state) {
 		yyerror("keep state on block rules doesn't make sense");
 		problems++;
 	}
 	if (r->rule_flag & PFRULE_STATESLOPPY &&
 	    (r->keep_state == PF_STATE_MODULATE ||
 	    r->keep_state == PF_STATE_SYNPROXY)) {
 		yyerror("sloppy state matching cannot be used with "
 		    "synproxy state or modulate state");
 		problems++;
 	}
 	return (-problems);
 }
 
 int
 nat_consistent(struct pf_rule *r)
 {
 	return (0);	/* yeah! */
 }
 
 int
 rdr_consistent(struct pf_rule *r)
 {
 	int			 problems = 0;
 
 	if (r->proto != IPPROTO_TCP && r->proto != IPPROTO_UDP) {
 		if (r->src.port_op) {
 			yyerror("src port only applies to tcp/udp");
 			problems++;
 		}
 		if (r->dst.port_op) {
 			yyerror("dst port only applies to tcp/udp");
 			problems++;
 		}
 		if (r->rpool.proxy_port[0]) {
 			yyerror("rpool port only applies to tcp/udp");
 			problems++;
 		}
 	}
 	if (r->dst.port_op &&
 	    r->dst.port_op != PF_OP_EQ && r->dst.port_op != PF_OP_RRG) {
 		yyerror("invalid port operator for rdr destination port");
 		problems++;
 	}
 	return (-problems);
 }
 
 int
 process_tabledef(char *name, struct table_opts *opts)
 {
 	struct pfr_buffer	 ab;
 	struct node_tinit	*ti;
 
 	bzero(&ab, sizeof(ab));
 	ab.pfrb_type = PFRB_ADDRS;
 	SIMPLEQ_FOREACH(ti, &opts->init_nodes, entries) {
 		if (ti->file)
 			if (pfr_buf_load(&ab, ti->file, 0, append_addr)) {
 				if (errno)
 					yyerror("cannot load \"%s\": %s",
 					    ti->file, strerror(errno));
 				else
 					yyerror("file \"%s\" contains bad data",
 					    ti->file);
 				goto _error;
 			}
 		if (ti->host)
 			if (append_addr_host(&ab, ti->host, 0, 0)) {
 				yyerror("cannot create address buffer: %s",
 				    strerror(errno));
 				goto _error;
 			}
 	}
 	if (pf->opts & PF_OPT_VERBOSE)
 		print_tabledef(name, opts->flags, opts->init_addr,
 		    &opts->init_nodes);
 	if (!(pf->opts & PF_OPT_NOACTION) &&
 	    pfctl_define_table(name, opts->flags, opts->init_addr,
 	    pf->anchor->name, &ab, pf->anchor->ruleset.tticket)) {
 		yyerror("cannot define table %s: %s", name,
 		    pfr_strerror(errno));
 		goto _error;
 	}
 	pf->tdirty = 1;
 	pfr_buf_clear(&ab);
 	return (0);
 _error:
 	pfr_buf_clear(&ab);
 	return (-1);
 }
 
 struct keywords {
 	const char	*k_name;
 	int		 k_val;
 };
 
 /* macro gore, but you should've seen the prior indentation nightmare... */
 
 #define FREE_LIST(T,r) \
 	do { \
 		T *p, *node = r; \
 		while (node != NULL) { \
 			p = node; \
 			node = node->next; \
 			free(p); \
 		} \
 	} while (0)
 
 #define LOOP_THROUGH(T,n,r,C) \
 	do { \
 		T *n; \
 		if (r == NULL) { \
 			r = calloc(1, sizeof(T)); \
 			if (r == NULL) \
 				err(1, "LOOP: calloc"); \
 			r->next = NULL; \
 		} \
 		n = r; \
 		while (n != NULL) { \
 			do { \
 				C; \
 			} while (0); \
 			n = n->next; \
 		} \
 	} while (0)
 
 void
 expand_label_str(char *label, size_t len, const char *srch, const char *repl)
 {
 	char *tmp;
 	char *p, *q;
 
 	if ((tmp = calloc(1, len)) == NULL)
 		err(1, "expand_label_str: calloc");
 	p = q = label;
 	while ((q = strstr(p, srch)) != NULL) {
 		*q = '\0';
 		if ((strlcat(tmp, p, len) >= len) ||
 		    (strlcat(tmp, repl, len) >= len))
 			errx(1, "expand_label: label too long");
 		q += strlen(srch);
 		p = q;
 	}
 	if (strlcat(tmp, p, len) >= len)
 		errx(1, "expand_label: label too long");
 	strlcpy(label, tmp, len);	/* always fits */
 	free(tmp);
 }
 
 void
 expand_label_if(const char *name, char *label, size_t len, const char *ifname)
 {
 	if (strstr(label, name) != NULL) {
 		if (!*ifname)
 			expand_label_str(label, len, name, "any");
 		else
 			expand_label_str(label, len, name, ifname);
 	}
 }
 
 void
 expand_label_addr(const char *name, char *label, size_t len, sa_family_t af,
     struct node_host *h)
 {
 	char tmp[64], tmp_not[66];
 
 	if (strstr(label, name) != NULL) {
 		switch (h->addr.type) {
 		case PF_ADDR_DYNIFTL:
 			snprintf(tmp, sizeof(tmp), "(%s)", h->addr.v.ifname);
 			break;
 		case PF_ADDR_TABLE:
 			snprintf(tmp, sizeof(tmp), "<%s>", h->addr.v.tblname);
 			break;
 		case PF_ADDR_NOROUTE:
 			snprintf(tmp, sizeof(tmp), "no-route");
 			break;
 		case PF_ADDR_URPFFAILED:
 			snprintf(tmp, sizeof(tmp), "urpf-failed");
 			break;
 		case PF_ADDR_ADDRMASK:
 			if (!af || (PF_AZERO(&h->addr.v.a.addr, af) &&
 			    PF_AZERO(&h->addr.v.a.mask, af)))
 				snprintf(tmp, sizeof(tmp), "any");
 			else {
 				char	a[48];
 				int	bits;
 
 				if (inet_ntop(af, &h->addr.v.a.addr, a,
 				    sizeof(a)) == NULL)
 					snprintf(tmp, sizeof(tmp), "?");
 				else {
 					bits = unmask(&h->addr.v.a.mask, af);
 					if ((af == AF_INET && bits < 32) ||
 					    (af == AF_INET6 && bits < 128))
 						snprintf(tmp, sizeof(tmp),
 						    "%s/%d", a, bits);
 					else
 						snprintf(tmp, sizeof(tmp),
 						    "%s", a);
 				}
 			}
 			break;
 		default:
 			snprintf(tmp, sizeof(tmp), "?");
 			break;
 		}
 
 		if (h->not) {
 			snprintf(tmp_not, sizeof(tmp_not), "! %s", tmp);
 			expand_label_str(label, len, name, tmp_not);
 		} else
 			expand_label_str(label, len, name, tmp);
 	}
 }
 
 void
 expand_label_port(const char *name, char *label, size_t len,
     struct node_port *port)
 {
 	char	 a1[6], a2[6], op[13] = "";
 
 	if (strstr(label, name) != NULL) {
 		snprintf(a1, sizeof(a1), "%u", ntohs(port->port[0]));
 		snprintf(a2, sizeof(a2), "%u", ntohs(port->port[1]));
 		if (!port->op)
 			;
 		else if (port->op == PF_OP_IRG)
 			snprintf(op, sizeof(op), "%s><%s", a1, a2);
 		else if (port->op == PF_OP_XRG)
 			snprintf(op, sizeof(op), "%s<>%s", a1, a2);
 		else if (port->op == PF_OP_EQ)
 			snprintf(op, sizeof(op), "%s", a1);
 		else if (port->op == PF_OP_NE)
 			snprintf(op, sizeof(op), "!=%s", a1);
 		else if (port->op == PF_OP_LT)
 			snprintf(op, sizeof(op), "<%s", a1);
 		else if (port->op == PF_OP_LE)
 			snprintf(op, sizeof(op), "<=%s", a1);
 		else if (port->op == PF_OP_GT)
 			snprintf(op, sizeof(op), ">%s", a1);
 		else if (port->op == PF_OP_GE)
 			snprintf(op, sizeof(op), ">=%s", a1);
 		expand_label_str(label, len, name, op);
 	}
 }
 
 void
 expand_label_proto(const char *name, char *label, size_t len, u_int8_t proto)
 {
 	struct protoent *pe;
 	char n[4];
 
 	if (strstr(label, name) != NULL) {
 		pe = getprotobynumber(proto);
 		if (pe != NULL)
 			expand_label_str(label, len, name, pe->p_name);
 		else {
 			snprintf(n, sizeof(n), "%u", proto);
 			expand_label_str(label, len, name, n);
 		}
 	}
 }
 
 void
 expand_label_nr(const char *name, char *label, size_t len)
 {
 	char n[11];
 
 	if (strstr(label, name) != NULL) {
 		snprintf(n, sizeof(n), "%u", pf->anchor->match);
 		expand_label_str(label, len, name, n);
 	}
 }
 
 void
 expand_label(char *label, size_t len, const char *ifname, sa_family_t af,
     struct node_host *src_host, struct node_port *src_port,
     struct node_host *dst_host, struct node_port *dst_port,
     u_int8_t proto)
 {
 	expand_label_if("$if", label, len, ifname);
 	expand_label_addr("$srcaddr", label, len, af, src_host);
 	expand_label_addr("$dstaddr", label, len, af, dst_host);
 	expand_label_port("$srcport", label, len, src_port);
 	expand_label_port("$dstport", label, len, dst_port);
 	expand_label_proto("$proto", label, len, proto);
 	expand_label_nr("$nr", label, len);
 }
 
 int
 expand_altq(struct pf_altq *a, struct node_if *interfaces,
     struct node_queue *nqueues, struct node_queue_bw bwspec,
     struct node_queue_opt *opts)
 {
 	struct pf_altq		 pa, pb;
 	char			 qname[PF_QNAME_SIZE];
 	struct node_queue	*n;
 	struct node_queue_bw	 bw;
 	int			 errs = 0;
 
 	if ((pf->loadopt & PFCTL_FLAG_ALTQ) == 0) {
 		FREE_LIST(struct node_if, interfaces);
 		if (nqueues)
 			FREE_LIST(struct node_queue, nqueues);
 		return (0);
 	}
 
 	LOOP_THROUGH(struct node_if, interface, interfaces,
 		memcpy(&pa, a, sizeof(struct pf_altq));
 		if (strlcpy(pa.ifname, interface->ifname,
 		    sizeof(pa.ifname)) >= sizeof(pa.ifname))
 			errx(1, "expand_altq: strlcpy");
 
 		if (interface->not) {
 			yyerror("altq on ! <interface> is not supported");
 			errs++;
 		} else {
 			if (eval_pfaltq(pf, &pa, &bwspec, opts))
 				errs++;
 			else
 				if (pfctl_add_altq(pf, &pa))
 					errs++;
 
 			if (pf->opts & PF_OPT_VERBOSE) {
 				print_altq(&pf->paltq->altq, 0,
 				    &bwspec, opts);
 				if (nqueues && nqueues->tail) {
 					printf("queue { ");
 					LOOP_THROUGH(struct node_queue, queue,
 					    nqueues,
 						printf("%s ",
 						    queue->queue);
 					);
 					printf("}");
 				}
 				printf("\n");
 			}
 
 			if (pa.scheduler == ALTQT_CBQ ||
 			    pa.scheduler == ALTQT_HFSC) {
 				/* now create a root queue */
 				memset(&pb, 0, sizeof(struct pf_altq));
 				if (strlcpy(qname, "root_", sizeof(qname)) >=
 				    sizeof(qname))
 					errx(1, "expand_altq: strlcpy");
 				if (strlcat(qname, interface->ifname,
 				    sizeof(qname)) >= sizeof(qname))
 					errx(1, "expand_altq: strlcat");
 				if (strlcpy(pb.qname, qname,
 				    sizeof(pb.qname)) >= sizeof(pb.qname))
 					errx(1, "expand_altq: strlcpy");
 				if (strlcpy(pb.ifname, interface->ifname,
 				    sizeof(pb.ifname)) >= sizeof(pb.ifname))
 					errx(1, "expand_altq: strlcpy");
 				pb.qlimit = pa.qlimit;
 				pb.scheduler = pa.scheduler;
 				bw.bw_absolute = pa.ifbandwidth;
 				bw.bw_percent = 0;
 				if (eval_pfqueue(pf, &pb, &bw, opts))
 					errs++;
 				else
 					if (pfctl_add_altq(pf, &pb))
 						errs++;
 			}
 
 			LOOP_THROUGH(struct node_queue, queue, nqueues,
 				n = calloc(1, sizeof(struct node_queue));
 				if (n == NULL)
 					err(1, "expand_altq: calloc");
 				if (pa.scheduler == ALTQT_CBQ ||
 				    pa.scheduler == ALTQT_HFSC)
 					if (strlcpy(n->parent, qname,
 					    sizeof(n->parent)) >=
 					    sizeof(n->parent))
 						errx(1, "expand_altq: strlcpy");
 				if (strlcpy(n->queue, queue->queue,
 				    sizeof(n->queue)) >= sizeof(n->queue))
 					errx(1, "expand_altq: strlcpy");
 				if (strlcpy(n->ifname, interface->ifname,
 				    sizeof(n->ifname)) >= sizeof(n->ifname))
 					errx(1, "expand_altq: strlcpy");
 				n->scheduler = pa.scheduler;
 				n->next = NULL;
 				n->tail = n;
 				if (queues == NULL)
 					queues = n;
 				else {
 					queues->tail->next = n;
 					queues->tail = n;
 				}
 			);
 		}
 	);
 	FREE_LIST(struct node_if, interfaces);
 	if (nqueues)
 		FREE_LIST(struct node_queue, nqueues);
 
 	return (errs);
 }
 
 int
 expand_queue(struct pf_altq *a, struct node_if *interfaces,
     struct node_queue *nqueues, struct node_queue_bw bwspec,
     struct node_queue_opt *opts)
 {
 	struct node_queue	*n, *nq;
 	struct pf_altq		 pa;
 	u_int8_t		 found = 0;
 	u_int8_t		 errs = 0;
 
 	if ((pf->loadopt & PFCTL_FLAG_ALTQ) == 0) {
 		FREE_LIST(struct node_queue, nqueues);
 		return (0);
 	}
 
 	if (queues == NULL) {
 		yyerror("queue %s has no parent", a->qname);
 		FREE_LIST(struct node_queue, nqueues);
 		return (1);
 	}
 
 	LOOP_THROUGH(struct node_if, interface, interfaces,
 		LOOP_THROUGH(struct node_queue, tqueue, queues,
 			if (!strncmp(a->qname, tqueue->queue, PF_QNAME_SIZE) &&
 			    (interface->ifname[0] == 0 ||
 			    (!interface->not && !strncmp(interface->ifname,
 			    tqueue->ifname, IFNAMSIZ)) ||
 			    (interface->not && strncmp(interface->ifname,
 			    tqueue->ifname, IFNAMSIZ)))) {
 				/* found ourself in queues */
 				found++;
 
 				memcpy(&pa, a, sizeof(struct pf_altq));
 
 				if (pa.scheduler != ALTQT_NONE &&
 				    pa.scheduler != tqueue->scheduler) {
 					yyerror("exactly one scheduler type "
 					    "per interface allowed");
 					return (1);
 				}
 				pa.scheduler = tqueue->scheduler;
 
 				/* scheduler dependent error checking */
 				switch (pa.scheduler) {
 				case ALTQT_PRIQ:
 					if (nqueues != NULL) {
 						yyerror("priq queues cannot "
 						    "have child queues");
 						return (1);
 					}
 					if (bwspec.bw_absolute > 0 ||
 					    bwspec.bw_percent < 100) {
 						yyerror("priq doesn't take "
 						    "bandwidth");
 						return (1);
 					}
 					break;
 				default:
 					break;
 				}
 
 				if (strlcpy(pa.ifname, tqueue->ifname,
 				    sizeof(pa.ifname)) >= sizeof(pa.ifname))
 					errx(1, "expand_queue: strlcpy");
 				if (strlcpy(pa.parent, tqueue->parent,
 				    sizeof(pa.parent)) >= sizeof(pa.parent))
 					errx(1, "expand_queue: strlcpy");
 
 				if (eval_pfqueue(pf, &pa, &bwspec, opts))
 					errs++;
 				else
 					if (pfctl_add_altq(pf, &pa))
 						errs++;
 
 				for (nq = nqueues; nq != NULL; nq = nq->next) {
 					if (!strcmp(a->qname, nq->queue)) {
 						yyerror("queue cannot have "
 						    "itself as child");
 						errs++;
 						continue;
 					}
 					n = calloc(1,
 					    sizeof(struct node_queue));
 					if (n == NULL)
 						err(1, "expand_queue: calloc");
 					if (strlcpy(n->parent, a->qname,
 					    sizeof(n->parent)) >=
 					    sizeof(n->parent))
 						errx(1, "expand_queue strlcpy");
 					if (strlcpy(n->queue, nq->queue,
 					    sizeof(n->queue)) >=
 					    sizeof(n->queue))
 						errx(1, "expand_queue strlcpy");
 					if (strlcpy(n->ifname, tqueue->ifname,
 					    sizeof(n->ifname)) >=
 					    sizeof(n->ifname))
 						errx(1, "expand_queue strlcpy");
 					n->scheduler = tqueue->scheduler;
 					n->next = NULL;
 					n->tail = n;
 					if (queues == NULL)
 						queues = n;
 					else {
 						queues->tail->next = n;
 						queues->tail = n;
 					}
 				}
 				if ((pf->opts & PF_OPT_VERBOSE) && (
 				    (found == 1 && interface->ifname[0] == 0) ||
 				    (found > 0 && interface->ifname[0] != 0))) {
 					print_queue(&pf->paltq->altq, 0,
 					    &bwspec, interface->ifname[0] != 0,
 					    opts);
 					if (nqueues && nqueues->tail) {
 						printf("{ ");
 						LOOP_THROUGH(struct node_queue,
 						    queue, nqueues,
 							printf("%s ",
 							    queue->queue);
 						);
 						printf("}");
 					}
 					printf("\n");
 				}
 			}
 		);
 	);
 
 	FREE_LIST(struct node_queue, nqueues);
 	FREE_LIST(struct node_if, interfaces);
 
 	if (!found) {
 		yyerror("queue %s has no parent", a->qname);
 		errs++;
 	}
 
 	if (errs)
 		return (1);
 	else
 		return (0);
 }
 
 void
 expand_rule(struct pf_rule *r,
     struct node_if *interfaces, struct node_host *rpool_hosts,
     struct node_proto *protos, struct node_os *src_oses,
     struct node_host *src_hosts, struct node_port *src_ports,
     struct node_host *dst_hosts, struct node_port *dst_ports,
     struct node_uid *uids, struct node_gid *gids, struct node_icmp *icmp_types,
     const char *anchor_call)
 {
 	sa_family_t		 af = r->af;
 	int			 added = 0, error = 0;
 	char			 ifname[IF_NAMESIZE];
 	char			 label[PF_RULE_LABEL_SIZE];
 	char			 tagname[PF_TAG_NAME_SIZE];
 	char			 match_tagname[PF_TAG_NAME_SIZE];
 	struct pf_pooladdr	*pa;
 	struct node_host	*h;
 	u_int8_t		 flags, flagset, keep_state;
 
 	if (strlcpy(label, r->label, sizeof(label)) >= sizeof(label))
 		errx(1, "expand_rule: strlcpy");
 	if (strlcpy(tagname, r->tagname, sizeof(tagname)) >= sizeof(tagname))
 		errx(1, "expand_rule: strlcpy");
 	if (strlcpy(match_tagname, r->match_tagname, sizeof(match_tagname)) >=
 	    sizeof(match_tagname))
 		errx(1, "expand_rule: strlcpy");
 	flags = r->flags;
 	flagset = r->flagset;
 	keep_state = r->keep_state;
 
 	LOOP_THROUGH(struct node_if, interface, interfaces,
 	LOOP_THROUGH(struct node_proto, proto, protos,
 	LOOP_THROUGH(struct node_icmp, icmp_type, icmp_types,
 	LOOP_THROUGH(struct node_host, src_host, src_hosts,
 	LOOP_THROUGH(struct node_port, src_port, src_ports,
 	LOOP_THROUGH(struct node_os, src_os, src_oses,
 	LOOP_THROUGH(struct node_host, dst_host, dst_hosts,
 	LOOP_THROUGH(struct node_port, dst_port, dst_ports,
 	LOOP_THROUGH(struct node_uid, uid, uids,
 	LOOP_THROUGH(struct node_gid, gid, gids,
 
 		r->af = af;
 		/* for link-local IPv6 address, interface must match up */
 		if ((r->af && src_host->af && r->af != src_host->af) ||
 		    (r->af && dst_host->af && r->af != dst_host->af) ||
 		    (src_host->af && dst_host->af &&
 		    src_host->af != dst_host->af) ||
 		    (src_host->ifindex && dst_host->ifindex &&
 		    src_host->ifindex != dst_host->ifindex) ||
 		    (src_host->ifindex && *interface->ifname &&
 		    src_host->ifindex != if_nametoindex(interface->ifname)) ||
 		    (dst_host->ifindex && *interface->ifname &&
 		    dst_host->ifindex != if_nametoindex(interface->ifname)))
 			continue;
 		if (!r->af && src_host->af)
 			r->af = src_host->af;
 		else if (!r->af && dst_host->af)
 			r->af = dst_host->af;
 
 		if (*interface->ifname)
 			strlcpy(r->ifname, interface->ifname,
 			    sizeof(r->ifname));
 		else if (if_indextoname(src_host->ifindex, ifname))
 			strlcpy(r->ifname, ifname, sizeof(r->ifname));
 		else if (if_indextoname(dst_host->ifindex, ifname))
 			strlcpy(r->ifname, ifname, sizeof(r->ifname));
 		else
 			memset(r->ifname, '\0', sizeof(r->ifname));
 
 		if (strlcpy(r->label, label, sizeof(r->label)) >=
 		    sizeof(r->label))
 			errx(1, "expand_rule: strlcpy");
 		if (strlcpy(r->tagname, tagname, sizeof(r->tagname)) >=
 		    sizeof(r->tagname))
 			errx(1, "expand_rule: strlcpy");
 		if (strlcpy(r->match_tagname, match_tagname,
 		    sizeof(r->match_tagname)) >= sizeof(r->match_tagname))
 			errx(1, "expand_rule: strlcpy");
 		expand_label(r->label, PF_RULE_LABEL_SIZE, r->ifname, r->af,
 		    src_host, src_port, dst_host, dst_port, proto->proto);
 		expand_label(r->tagname, PF_TAG_NAME_SIZE, r->ifname, r->af,
 		    src_host, src_port, dst_host, dst_port, proto->proto);
 		expand_label(r->match_tagname, PF_TAG_NAME_SIZE, r->ifname,
 		    r->af, src_host, src_port, dst_host, dst_port,
 		    proto->proto);
 
 		error += check_netmask(src_host, r->af);
 		error += check_netmask(dst_host, r->af);
 
 		r->ifnot = interface->not;
 		r->proto = proto->proto;
 		r->src.addr = src_host->addr;
 		r->src.neg = src_host->not;
 		r->src.port[0] = src_port->port[0];
 		r->src.port[1] = src_port->port[1];
 		r->src.port_op = src_port->op;
 		r->dst.addr = dst_host->addr;
 		r->dst.neg = dst_host->not;
 		r->dst.port[0] = dst_port->port[0];
 		r->dst.port[1] = dst_port->port[1];
 		r->dst.port_op = dst_port->op;
 		r->uid.op = uid->op;
 		r->uid.uid[0] = uid->uid[0];
 		r->uid.uid[1] = uid->uid[1];
 		r->gid.op = gid->op;
 		r->gid.gid[0] = gid->gid[0];
 		r->gid.gid[1] = gid->gid[1];
 		r->type = icmp_type->type;
 		r->code = icmp_type->code;
 
 		if ((keep_state == PF_STATE_MODULATE ||
 		    keep_state == PF_STATE_SYNPROXY) &&
 		    r->proto && r->proto != IPPROTO_TCP)
 			r->keep_state = PF_STATE_NORMAL;
 		else
 			r->keep_state = keep_state;
 
 		if (r->proto && r->proto != IPPROTO_TCP) {
 			r->flags = 0;
 			r->flagset = 0;
 		} else {
 			r->flags = flags;
 			r->flagset = flagset;
 		}
 		if (icmp_type->proto && r->proto != icmp_type->proto) {
 			yyerror("icmp-type mismatch");
 			error++;
 		}
 
 		if (src_os && src_os->os) {
 			r->os_fingerprint = pfctl_get_fingerprint(src_os->os);
 			if ((pf->opts & PF_OPT_VERBOSE2) &&
 			    r->os_fingerprint == PF_OSFP_NOMATCH)
 				fprintf(stderr,
 				    "warning: unknown '%s' OS fingerprint\n",
 				    src_os->os);
 		} else {
 			r->os_fingerprint = PF_OSFP_ANY;
 		}
 
 		TAILQ_INIT(&r->rpool.list);
 		for (h = rpool_hosts; h != NULL; h = h->next) {
 			pa = calloc(1, sizeof(struct pf_pooladdr));
 			if (pa == NULL)
 				err(1, "expand_rule: calloc");
 			pa->addr = h->addr;
 			if (h->ifname != NULL) {
 				if (strlcpy(pa->ifname, h->ifname,
 				    sizeof(pa->ifname)) >=
 				    sizeof(pa->ifname))
 					errx(1, "expand_rule: strlcpy");
 			} else
 				pa->ifname[0] = 0;
 			TAILQ_INSERT_TAIL(&r->rpool.list, pa, entries);
 		}
 
 		if (rule_consistent(r, anchor_call[0]) < 0 || error)
 			yyerror("skipping rule due to errors");
 		else {
 			r->nr = pf->astack[pf->asd]->match++;
 			pfctl_add_rule(pf, r, anchor_call);
 			added++;
 		}
 
 	))))))))));
 
 	FREE_LIST(struct node_if, interfaces);
 	FREE_LIST(struct node_proto, protos);
 	FREE_LIST(struct node_host, src_hosts);
 	FREE_LIST(struct node_port, src_ports);
 	FREE_LIST(struct node_os, src_oses);
 	FREE_LIST(struct node_host, dst_hosts);
 	FREE_LIST(struct node_port, dst_ports);
 	FREE_LIST(struct node_uid, uids);
 	FREE_LIST(struct node_gid, gids);
 	FREE_LIST(struct node_icmp, icmp_types);
 	FREE_LIST(struct node_host, rpool_hosts);
 
 	if (!added)
 		yyerror("rule expands to no valid combination");
 }
 
 int
 expand_skip_interface(struct node_if *interfaces)
 {
 	int	errs = 0;
 
 	if (!interfaces || (!interfaces->next && !interfaces->not &&
 	    !strcmp(interfaces->ifname, "none"))) {
 		if (pf->opts & PF_OPT_VERBOSE)
 			printf("set skip on none\n");
 		errs = pfctl_set_interface_flags(pf, "", PFI_IFLAG_SKIP, 0);
 		return (errs);
 	}
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set skip on {");
 	LOOP_THROUGH(struct node_if, interface, interfaces,
 		if (pf->opts & PF_OPT_VERBOSE)
 			printf(" %s", interface->ifname);
 		if (interface->not) {
 			yyerror("skip on ! <interface> is not supported");
 			errs++;
 		} else
 			errs += pfctl_set_interface_flags(pf,
 			    interface->ifname, PFI_IFLAG_SKIP, 1);
 	);
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf(" }\n");
 
 	FREE_LIST(struct node_if, interfaces);
 
 	if (errs)
 		return (1);
 	else
 		return (0);
 }
 
 #undef FREE_LIST
 #undef LOOP_THROUGH
 
 int
 check_rulestate(int desired_state)
 {
 	if (require_order && (rulestate > desired_state)) {
 		yyerror("Rules must be in order: options, normalization, "
 		    "queueing, translation, filtering");
 		return (1);
 	}
 	rulestate = desired_state;
 	return (0);
 }
 
 int
 kw_cmp(const void *k, const void *e)
 {
 	return (strcmp(k, ((const struct keywords *)e)->k_name));
 }
 
 int
 lookup(char *s)
 {
 	/* this has to be sorted always */
 	static const struct keywords keywords[] = {
 		{ "all",		ALL},
 		{ "allow-opts",		ALLOWOPTS},
 		{ "altq",		ALTQ},
 		{ "anchor",		ANCHOR},
 		{ "antispoof",		ANTISPOOF},
 		{ "any",		ANY},
 		{ "bandwidth",		BANDWIDTH},
 		{ "binat",		BINAT},
 		{ "binat-anchor",	BINATANCHOR},
 		{ "bitmask",		BITMASK},
 		{ "block",		BLOCK},
 		{ "block-policy",	BLOCKPOLICY},
 		{ "buckets",		BUCKETS},
 		{ "cbq",		CBQ},
 		{ "code",		CODE},
 		{ "codelq",		CODEL},
 		{ "crop",		FRAGCROP},
 		{ "debug",		DEBUG},
 		{ "divert-reply",	DIVERTREPLY},
 		{ "divert-to",		DIVERTTO},
 		{ "drop",		DROP},
 		{ "drop-ovl",		FRAGDROP},
 		{ "dup-to",		DUPTO},
 		{ "fairq",		FAIRQ},
 		{ "fastroute",		FASTROUTE},
 		{ "file",		FILENAME},
 		{ "fingerprints",	FINGERPRINTS},
 		{ "flags",		FLAGS},
 		{ "floating",		FLOATING},
 		{ "flush",		FLUSH},
 		{ "for",		FOR},
 		{ "fragment",		FRAGMENT},
 		{ "from",		FROM},
 		{ "global",		GLOBAL},
 		{ "group",		GROUP},
 		{ "hfsc",		HFSC},
 		{ "hogs",		HOGS},
 		{ "hostid",		HOSTID},
 		{ "icmp-type",		ICMPTYPE},
 		{ "icmp6-type",		ICMP6TYPE},
 		{ "if-bound",		IFBOUND},
 		{ "in",			IN},
 		{ "include",		INCLUDE},
 		{ "inet",		INET},
 		{ "inet6",		INET6},
 		{ "interval",		INTERVAL},
 		{ "keep",		KEEP},
 		{ "label",		LABEL},
 		{ "limit",		LIMIT},
 		{ "linkshare",		LINKSHARE},
 		{ "load",		LOAD},
 		{ "log",		LOG},
 		{ "loginterface",	LOGINTERFACE},
 		{ "max",		MAXIMUM},
 		{ "max-mss",		MAXMSS},
 		{ "max-src-conn",	MAXSRCCONN},
 		{ "max-src-conn-rate",	MAXSRCCONNRATE},
 		{ "max-src-nodes",	MAXSRCNODES},
 		{ "max-src-states",	MAXSRCSTATES},
 		{ "min-ttl",		MINTTL},
 		{ "modulate",		MODULATE},
 		{ "nat",		NAT},
 		{ "nat-anchor",		NATANCHOR},
 		{ "no",			NO},
 		{ "no-df",		NODF},
 		{ "no-route",		NOROUTE},
 		{ "no-sync",		NOSYNC},
 		{ "on",			ON},
 		{ "optimization",	OPTIMIZATION},
 		{ "os",			OS},
 		{ "out",		OUT},
 		{ "overload",		OVERLOAD},
 		{ "pass",		PASS},
 		{ "port",		PORT},
 		{ "prio",		PRIO},
 		{ "priority",		PRIORITY},
 		{ "priq",		PRIQ},
 		{ "probability",	PROBABILITY},
 		{ "proto",		PROTO},
 		{ "qlimit",		QLIMIT},
 		{ "queue",		QUEUE},
 		{ "quick",		QUICK},
 		{ "random",		RANDOM},
 		{ "random-id",		RANDOMID},
 		{ "rdr",		RDR},
 		{ "rdr-anchor",		RDRANCHOR},
 		{ "realtime",		REALTIME},
 		{ "reassemble",		REASSEMBLE},
 		{ "reply-to",		REPLYTO},
 		{ "require-order",	REQUIREORDER},
 		{ "return",		RETURN},
 		{ "return-icmp",	RETURNICMP},
 		{ "return-icmp6",	RETURNICMP6},
 		{ "return-rst",		RETURNRST},
 		{ "round-robin",	ROUNDROBIN},
 		{ "route",		ROUTE},
 		{ "route-to",		ROUTETO},
 		{ "rtable",		RTABLE},
 		{ "rule",		RULE},
 		{ "ruleset-optimization",	RULESET_OPTIMIZATION},
 		{ "scrub",		SCRUB},
 		{ "set",		SET},
 		{ "set-tos",		SETTOS},
 		{ "skip",		SKIP},
 		{ "sloppy",		SLOPPY},
 		{ "source-hash",	SOURCEHASH},
 		{ "source-track",	SOURCETRACK},
 		{ "state",		STATE},
 		{ "state-defaults",	STATEDEFAULTS},
 		{ "state-policy",	STATEPOLICY},
 		{ "static-port",	STATICPORT},
 		{ "sticky-address",	STICKYADDRESS},
 		{ "synproxy",		SYNPROXY},
 		{ "table",		TABLE},
 		{ "tag",		TAG},
 		{ "tagged",		TAGGED},
 		{ "target",		TARGET},
 		{ "tbrsize",		TBRSIZE},
 		{ "timeout",		TIMEOUT},
 		{ "to",			TO},
 		{ "tos",		TOS},
 		{ "ttl",		TTL},
 		{ "upperlimit",		UPPERLIMIT},
 		{ "urpf-failed",	URPFFAILED},
 		{ "user",		USER},
 	};
 	const struct keywords	*p;
 
 	p = bsearch(s, keywords, sizeof(keywords)/sizeof(keywords[0]),
 	    sizeof(keywords[0]), kw_cmp);
 
 	if (p) {
 		if (debug > 1)
 			fprintf(stderr, "%s: %d\n", s, p->k_val);
 		return (p->k_val);
 	} else {
 		if (debug > 1)
 			fprintf(stderr, "string: %s\n", s);
 		return (STRING);
 	}
 }
 
 #define MAXPUSHBACK	128
 
-char	*parsebuf;
-int	 parseindex;
-char	 pushback_buffer[MAXPUSHBACK];
-int	 pushback_index = 0;
+static char	*parsebuf;
+static int	 parseindex;
+static char	 pushback_buffer[MAXPUSHBACK];
+static int	 pushback_index = 0;
 
 int
 lgetc(int quotec)
 {
 	int		c, next;
 
 	if (parsebuf) {
 		/* Read character from the parsebuffer instead of input. */
 		if (parseindex >= 0) {
 			c = parsebuf[parseindex++];
 			if (c != '\0')
 				return (c);
 			parsebuf = NULL;
 		} else
 			parseindex++;
 	}
 
 	if (pushback_index)
 		return (pushback_buffer[--pushback_index]);
 
 	if (quotec) {
 		if ((c = getc(file->stream)) == EOF) {
 			yyerror("reached end of file while parsing quoted string");
 			if (popfile() == EOF)
 				return (EOF);
 			return (quotec);
 		}
 		return (c);
 	}
 
 	while ((c = getc(file->stream)) == '\\') {
 		next = getc(file->stream);
 		if (next != '\n') {
 			c = next;
 			break;
 		}
 		yylval.lineno = file->lineno;
 		file->lineno++;
 	}
 
 	while (c == EOF) {
 		if (popfile() == EOF)
 			return (EOF);
 		c = getc(file->stream);
 	}
 	return (c);
 }
 
 int
 lungetc(int c)
 {
 	if (c == EOF)
 		return (EOF);
 	if (parsebuf) {
 		parseindex--;
 		if (parseindex >= 0)
 			return (c);
 	}
 	if (pushback_index < MAXPUSHBACK-1)
 		return (pushback_buffer[pushback_index++] = c);
 	else
 		return (EOF);
 }
 
 int
 findeol(void)
 {
 	int	c;
 
 	parsebuf = NULL;
 
 	/* skip to either EOF or the first real EOL */
 	while (1) {
 		if (pushback_index)
 			c = pushback_buffer[--pushback_index];
 		else
 			c = lgetc(0);
 		if (c == '\n') {
 			file->lineno++;
 			break;
 		}
 		if (c == EOF)
 			break;
 	}
 	return (ERROR);
 }
 
 int
 yylex(void)
 {
 	char	 buf[8096];
 	char	*p, *val;
 	int	 quotec, next, c;
 	int	 token;
 
 top:
 	p = buf;
 	while ((c = lgetc(0)) == ' ' || c == '\t')
 		; /* nothing */
 
 	yylval.lineno = file->lineno;
 	if (c == '#')
 		while ((c = lgetc(0)) != '\n' && c != EOF)
 			; /* nothing */
 	if (c == '$' && parsebuf == NULL) {
 		while (1) {
 			if ((c = lgetc(0)) == EOF)
 				return (0);
 
 			if (p + 1 >= buf + sizeof(buf) - 1) {
 				yyerror("string too long");
 				return (findeol());
 			}
 			if (isalnum(c) || c == '_') {
 				*p++ = (char)c;
 				continue;
 			}
 			*p = '\0';
 			lungetc(c);
 			break;
 		}
 		val = symget(buf);
 		if (val == NULL) {
 			yyerror("macro '%s' not defined", buf);
 			return (findeol());
 		}
 		parsebuf = val;
 		parseindex = 0;
 		goto top;
 	}
 
 	switch (c) {
 	case '\'':
 	case '"':
 		quotec = c;
 		while (1) {
 			if ((c = lgetc(quotec)) == EOF)
 				return (0);
 			if (c == '\n') {
 				file->lineno++;
 				continue;
 			} else if (c == '\\') {
 				if ((next = lgetc(quotec)) == EOF)
 					return (0);
 				if (next == quotec || c == ' ' || c == '\t')
 					c = next;
 				else if (next == '\n')
 					continue;
 				else
 					lungetc(next);
 			} else if (c == quotec) {
 				*p = '\0';
 				break;
 			}
 			if (p + 1 >= buf + sizeof(buf) - 1) {
 				yyerror("string too long");
 				return (findeol());
 			}
 			*p++ = (char)c;
 		}
 		yylval.v.string = strdup(buf);
 		if (yylval.v.string == NULL)
 			err(1, "yylex: strdup");
 		return (STRING);
 	case '<':
 		next = lgetc(0);
 		if (next == '>') {
 			yylval.v.i = PF_OP_XRG;
 			return (PORTBINARY);
 		}
 		lungetc(next);
 		break;
 	case '>':
 		next = lgetc(0);
 		if (next == '<') {
 			yylval.v.i = PF_OP_IRG;
 			return (PORTBINARY);
 		}
 		lungetc(next);
 		break;
 	case '-':
 		next = lgetc(0);
 		if (next == '>')
 			return (ARROW);
 		lungetc(next);
 		break;
 	}
 
 #define allowed_to_end_number(x) \
 	(isspace(x) || x == ')' || x ==',' || x == '/' || x == '}' || x == '=')
 
 	if (c == '-' || isdigit(c)) {
 		do {
 			*p++ = c;
 			if ((unsigned)(p-buf) >= sizeof(buf)) {
 				yyerror("string too long");
 				return (findeol());
 			}
 		} while ((c = lgetc(0)) != EOF && isdigit(c));
 		lungetc(c);
 		if (p == buf + 1 && buf[0] == '-')
 			goto nodigits;
 		if (c == EOF || allowed_to_end_number(c)) {
 			const char *errstr = NULL;
 
 			*p = '\0';
 			yylval.v.number = strtonum(buf, LLONG_MIN,
 			    LLONG_MAX, &errstr);
 			if (errstr) {
 				yyerror("\"%s\" invalid number: %s",
 				    buf, errstr);
 				return (findeol());
 			}
 			return (NUMBER);
 		} else {
 nodigits:
 			while (p > buf + 1)
 				lungetc(*--p);
 			c = *--p;
 			if (c == '-')
 				return (c);
 		}
 	}
 
 #define allowed_in_string(x) \
 	(isalnum(x) || (ispunct(x) && x != '(' && x != ')' && \
 	x != '{' && x != '}' && x != '<' && x != '>' && \
 	x != '!' && x != '=' && x != '/' && x != '#' && \
 	x != ','))
 
 	if (isalnum(c) || c == ':' || c == '_') {
 		do {
 			*p++ = c;
 			if ((unsigned)(p-buf) >= sizeof(buf)) {
 				yyerror("string too long");
 				return (findeol());
 			}
 		} while ((c = lgetc(0)) != EOF && (allowed_in_string(c)));
 		lungetc(c);
 		*p = '\0';
 		if ((token = lookup(buf)) == STRING)
 			if ((yylval.v.string = strdup(buf)) == NULL)
 				err(1, "yylex: strdup");
 		return (token);
 	}
 	if (c == '\n') {
 		yylval.lineno = file->lineno;
 		file->lineno++;
 	}
 	if (c == EOF)
 		return (0);
 	return (c);
 }
 
 int
 check_file_secrecy(int fd, const char *fname)
 {
 	struct stat	st;
 
 	if (fstat(fd, &st)) {
 		warn("cannot stat %s", fname);
 		return (-1);
 	}
 	if (st.st_uid != 0 && st.st_uid != getuid()) {
 		warnx("%s: owner not root or current user", fname);
 		return (-1);
 	}
 	if (st.st_mode & (S_IRWXG | S_IRWXO)) {
 		warnx("%s: group/world readable/writeable", fname);
 		return (-1);
 	}
 	return (0);
 }
 
 struct file *
 pushfile(const char *name, int secret)
 {
 	struct file	*nfile;
 
 	if ((nfile = calloc(1, sizeof(struct file))) == NULL ||
 	    (nfile->name = strdup(name)) == NULL) {
 		warn("malloc");
 		return (NULL);
 	}
 	if (TAILQ_FIRST(&files) == NULL && strcmp(nfile->name, "-") == 0) {
 		nfile->stream = stdin;
 		free(nfile->name);
 		if ((nfile->name = strdup("stdin")) == NULL) {
 			warn("strdup");
 			free(nfile);
 			return (NULL);
 		}
 	} else if ((nfile->stream = fopen(nfile->name, "r")) == NULL) {
 		warn("%s", nfile->name);
 		free(nfile->name);
 		free(nfile);
 		return (NULL);
 	} else if (secret &&
 	    check_file_secrecy(fileno(nfile->stream), nfile->name)) {
 		fclose(nfile->stream);
 		free(nfile->name);
 		free(nfile);
 		return (NULL);
 	}
 	nfile->lineno = 1;
 	TAILQ_INSERT_TAIL(&files, nfile, entry);
 	return (nfile);
 }
 
 int
 popfile(void)
 {
 	struct file	*prev;
 
 	if ((prev = TAILQ_PREV(file, files, entry)) != NULL) {
 		prev->errors += file->errors;
 		TAILQ_REMOVE(&files, file, entry);
 		fclose(file->stream);
 		free(file->name);
 		free(file);
 		file = prev;
 		return (0);
 	}
 	return (EOF);
 }
 
 int
 parse_config(char *filename, struct pfctl *xpf)
 {
 	int		 errors = 0;
 	struct sym	*sym;
 
 	pf = xpf;
 	errors = 0;
 	rulestate = PFCTL_STATE_NONE;
 	returnicmpdefault = (ICMP_UNREACH << 8) | ICMP_UNREACH_PORT;
 	returnicmp6default =
 	    (ICMP6_DST_UNREACH << 8) | ICMP6_DST_UNREACH_NOPORT;
 	blockpolicy = PFRULE_DROP;
 	require_order = 1;
 
 	if ((file = pushfile(filename, 0)) == NULL) {
 		warn("cannot open the main config file!");
 		return (-1);
 	}
 
 	yyparse();
 	errors = file->errors;
 	popfile();
 
 	/* Free macros and check which have not been used. */
 	while ((sym = TAILQ_FIRST(&symhead))) {
 		if ((pf->opts & PF_OPT_VERBOSE2) && !sym->used)
 			fprintf(stderr, "warning: macro '%s' not "
 			    "used\n", sym->nam);
 		free(sym->nam);
 		free(sym->val);
 		TAILQ_REMOVE(&symhead, sym, entry);
 		free(sym);
 	}
 
 	return (errors ? -1 : 0);
 }
 
 int
 symset(const char *nam, const char *val, int persist)
 {
 	struct sym	*sym;
 
 	for (sym = TAILQ_FIRST(&symhead); sym && strcmp(nam, sym->nam);
 	    sym = TAILQ_NEXT(sym, entry))
 		;	/* nothing */
 
 	if (sym != NULL) {
 		if (sym->persist == 1)
 			return (0);
 		else {
 			free(sym->nam);
 			free(sym->val);
 			TAILQ_REMOVE(&symhead, sym, entry);
 			free(sym);
 		}
 	}
 	if ((sym = calloc(1, sizeof(*sym))) == NULL)
 		return (-1);
 
 	sym->nam = strdup(nam);
 	if (sym->nam == NULL) {
 		free(sym);
 		return (-1);
 	}
 	sym->val = strdup(val);
 	if (sym->val == NULL) {
 		free(sym->nam);
 		free(sym);
 		return (-1);
 	}
 	sym->used = 0;
 	sym->persist = persist;
 	TAILQ_INSERT_TAIL(&symhead, sym, entry);
 	return (0);
 }
 
 int
 pfctl_cmdline_symset(char *s)
 {
 	char	*sym, *val;
 	int	 ret;
 
 	if ((val = strrchr(s, '=')) == NULL)
 		return (-1);
 
 	if ((sym = malloc(strlen(s) - strlen(val) + 1)) == NULL)
 		err(1, "pfctl_cmdline_symset: malloc");
 
 	strlcpy(sym, s, strlen(s) - strlen(val) + 1);
 
 	ret = symset(sym, val + 1, 1);
 	free(sym);
 
 	return (ret);
 }
 
 char *
 symget(const char *nam)
 {
 	struct sym	*sym;
 
 	TAILQ_FOREACH(sym, &symhead, entry)
 		if (strcmp(nam, sym->nam) == 0) {
 			sym->used = 1;
 			return (sym->val);
 		}
 	return (NULL);
 }
 
 void
 mv_rules(struct pf_ruleset *src, struct pf_ruleset *dst)
 {
 	int i;
 	struct pf_rule *r;
 
 	for (i = 0; i < PF_RULESET_MAX; ++i) {
 		while ((r = TAILQ_FIRST(src->rules[i].active.ptr))
 		    != NULL) {
 			TAILQ_REMOVE(src->rules[i].active.ptr, r, entries);
 			TAILQ_INSERT_TAIL(dst->rules[i].active.ptr, r, entries);
 			dst->anchor->match++;
 		}
 		src->anchor->match = 0;
 		while ((r = TAILQ_FIRST(src->rules[i].inactive.ptr))
 		    != NULL) {
 			TAILQ_REMOVE(src->rules[i].inactive.ptr, r, entries);
 			TAILQ_INSERT_TAIL(dst->rules[i].inactive.ptr,
 				r, entries);
 		}
 	}
 }
 
 void
 decide_address_family(struct node_host *n, sa_family_t *af)
 {
 	if (*af != 0 || n == NULL)
 		return;
 	*af = n->af;
 	while ((n = n->next) != NULL) {
 		if (n->af != *af) {
 			*af = 0;
 			return;
 		}
 	}
 }
 
 void
 remove_invalid_hosts(struct node_host **nh, sa_family_t *af)
 {
 	struct node_host	*n = *nh, *prev = NULL;
 
 	while (n != NULL) {
 		if (*af && n->af && n->af != *af) {
 			/* unlink and free n */
 			struct node_host *next = n->next;
 
 			/* adjust tail pointer */
 			if (n == (*nh)->tail)
 				(*nh)->tail = prev;
 			/* adjust previous node's next pointer */
 			if (prev == NULL)
 				*nh = next;
 			else
 				prev->next = next;
 			/* free node */
 			if (n->ifname != NULL)
 				free(n->ifname);
 			free(n);
 			n = next;
 		} else {
 			if (n->af && !*af)
 				*af = n->af;
 			prev = n;
 			n = n->next;
 		}
 	}
 }
 
 int
 invalid_redirect(struct node_host *nh, sa_family_t af)
 {
 	if (!af) {
 		struct node_host *n;
 
 		/* tables and dyniftl are ok without an address family */
 		for (n = nh; n != NULL; n = n->next) {
 			if (n->addr.type != PF_ADDR_TABLE &&
 			    n->addr.type != PF_ADDR_DYNIFTL) {
 				yyerror("address family not given and "
 				    "translation address expands to multiple "
 				    "address families");
 				return (1);
 			}
 		}
 	}
 	if (nh == NULL) {
 		yyerror("no translation address with matching address family "
 		    "found.");
 		return (1);
 	}
 	return (0);
 }
 
 int
 atoul(char *s, u_long *ulvalp)
 {
 	u_long	 ulval;
 	char	*ep;
 
 	errno = 0;
 	ulval = strtoul(s, &ep, 0);
 	if (s[0] == '\0' || *ep != '\0')
 		return (-1);
 	if (errno == ERANGE && ulval == ULONG_MAX)
 		return (-1);
 	*ulvalp = ulval;
 	return (0);
 }
 
 int
 getservice(char *n)
 {
 	struct servent	*s;
 	u_long		 ulval;
 
 	if (atoul(n, &ulval) == 0) {
 		if (ulval > 65535) {
 			yyerror("illegal port value %lu", ulval);
 			return (-1);
 		}
 		return (htons(ulval));
 	} else {
 		s = getservbyname(n, "tcp");
 		if (s == NULL)
 			s = getservbyname(n, "udp");
 		if (s == NULL) {
 			yyerror("unknown port %s", n);
 			return (-1);
 		}
 		return (s->s_port);
 	}
 }
 
 int
 rule_label(struct pf_rule *r, char *s)
 {
 	if (s) {
 		if (strlcpy(r->label, s, sizeof(r->label)) >=
 		    sizeof(r->label)) {
 			yyerror("rule label too long (max %d chars)",
 			    sizeof(r->label)-1);
 			return (-1);
 		}
 	}
 	return (0);
 }
 
 u_int16_t
 parseicmpspec(char *w, sa_family_t af)
 {
 	const struct icmpcodeent	*p;
 	u_long				 ulval;
 	u_int8_t			 icmptype;
 
 	if (af == AF_INET)
 		icmptype = returnicmpdefault >> 8;
 	else
 		icmptype = returnicmp6default >> 8;
 
 	if (atoul(w, &ulval) == -1) {
 		if ((p = geticmpcodebyname(icmptype, w, af)) == NULL) {
 			yyerror("unknown icmp code %s", w);
 			return (0);
 		}
 		ulval = p->code;
 	}
 	if (ulval > 255) {
 		yyerror("invalid icmp code %lu", ulval);
 		return (0);
 	}
 	return (icmptype << 8 | ulval);
 }
 
 int
 parseport(char *port, struct range *r, int extensions)
 {
 	char	*p = strchr(port, ':');
 
 	if (p == NULL) {
 		if ((r->a = getservice(port)) == -1)
 			return (-1);
 		r->b = 0;
 		r->t = PF_OP_NONE;
 		return (0);
 	}
 	if ((extensions & PPORT_STAR) && !strcmp(p+1, "*")) {
 		*p = 0;
 		if ((r->a = getservice(port)) == -1)
 			return (-1);
 		r->b = 0;
 		r->t = PF_OP_IRG;
 		return (0);
 	}
 	if ((extensions & PPORT_RANGE)) {
 		*p++ = 0;
 		if ((r->a = getservice(port)) == -1 ||
 		    (r->b = getservice(p)) == -1)
 			return (-1);
 		if (r->a == r->b) {
 			r->b = 0;
 			r->t = PF_OP_NONE;
 		} else
 			r->t = PF_OP_RRG;
 		return (0);
 	}
 	return (-1);
 }
 
 int
 pfctl_load_anchors(int dev, struct pfctl *pf, struct pfr_buffer *trans)
 {
 	struct loadanchors	*la;
 
 	TAILQ_FOREACH(la, &loadanchorshead, entries) {
 		if (pf->opts & PF_OPT_VERBOSE)
 			fprintf(stderr, "\nLoading anchor %s from %s\n",
 			    la->anchorname, la->filename);
 		if (pfctl_rules(dev, la->filename, pf->opts, pf->optimize,
 		    la->anchorname, trans) == -1)
 			return (-1);
 	}
 
 	return (0);
 }
 
 int
 rt_tableid_max(void)
 {
 #ifdef __FreeBSD__
 	int fibs;
 	size_t l = sizeof(fibs);
 
         if (sysctlbyname("net.fibs", &fibs, &l, NULL, 0) == -1)
 		fibs = 16;	/* XXX RT_MAXFIBS, at least limit it some. */
 	/*
 	 * As the OpenBSD code only compares > and not >= we need to adjust
 	 * here given we only accept values of 0..n and want to avoid #ifdefs
 	 * in the grammar.
 	 */
 	return (fibs - 1);
 #else
 	return (RT_TABLEID_MAX);
 #endif
 }
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl.c	(revision 303775)
@@ -1,2388 +1,2388 @@
 /*	$OpenBSD: pfctl.c,v 1.278 2008/08/31 20:18:17 jmc Exp $ */
 
 /*
  * Copyright (c) 2001 Daniel Hartmeier
  * Copyright (c) 2002,2003 Henning Brauer
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  *
  *    - Redistributions of source code must retain the above copyright
  *      notice, this list of conditions and the following disclaimer.
  *    - Redistributions in binary form must reproduce the above
  *      copyright notice, this list of conditions and the following
  *      disclaimer in the documentation and/or other materials provided
  *      with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
  * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
  * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  *
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 #include <sys/stat.h>
 #include <sys/endian.h>
 
 #include <net/if.h>
 #include <netinet/in.h>
 #include <net/pfvar.h>
 #include <arpa/inet.h>
 #include <net/altq/altq.h>
 #include <sys/sysctl.h>
 
 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <limits.h>
 #include <netdb.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 void	 usage(void);
 int	 pfctl_enable(int, int);
 int	 pfctl_disable(int, int);
 int	 pfctl_clear_stats(int, int);
 int	 pfctl_clear_interface_flags(int, int);
 int	 pfctl_clear_rules(int, int, char *);
 int	 pfctl_clear_nat(int, int, char *);
 int	 pfctl_clear_altq(int, int);
 int	 pfctl_clear_src_nodes(int, int);
 int	 pfctl_clear_states(int, const char *, int);
 void	 pfctl_addrprefix(char *, struct pf_addr *);
 int	 pfctl_kill_src_nodes(int, const char *, int);
 int	 pfctl_net_kill_states(int, const char *, int);
 int	 pfctl_label_kill_states(int, const char *, int);
 int	 pfctl_id_kill_states(int, const char *, int);
 void	 pfctl_init_options(struct pfctl *);
 int	 pfctl_load_options(struct pfctl *);
 int	 pfctl_load_limit(struct pfctl *, unsigned int, unsigned int);
 int	 pfctl_load_timeout(struct pfctl *, unsigned int, unsigned int);
 int	 pfctl_load_debug(struct pfctl *, unsigned int);
 int	 pfctl_load_logif(struct pfctl *, char *);
 int	 pfctl_load_hostid(struct pfctl *, u_int32_t);
 int	 pfctl_get_pool(int, struct pf_pool *, u_int32_t, u_int32_t, int,
 	    char *);
 void	 pfctl_print_rule_counters(struct pf_rule *, int);
 int	 pfctl_show_rules(int, char *, int, enum pfctl_show, char *, int);
 int	 pfctl_show_nat(int, int, char *);
 int	 pfctl_show_src_nodes(int, int);
 int	 pfctl_show_states(int, const char *, int);
 int	 pfctl_show_status(int, int);
 int	 pfctl_show_timeouts(int, int);
 int	 pfctl_show_limits(int, int);
 void	 pfctl_debug(int, u_int32_t, int);
 int	 pfctl_test_altqsupport(int, int);
 int	 pfctl_show_anchors(int, int, char *);
 int	 pfctl_ruleset_trans(struct pfctl *, char *, struct pf_anchor *);
 int	 pfctl_load_ruleset(struct pfctl *, char *,
 		struct pf_ruleset *, int, int);
 int	 pfctl_load_rule(struct pfctl *, char *, struct pf_rule *, int);
 const char	*pfctl_lookup_option(char *, const char * const *);
 
-struct pf_anchor_global	 pf_anchors;
-struct pf_anchor	 pf_main_anchor;
+static struct pf_anchor_global	 pf_anchors;
+static struct pf_anchor	 pf_main_anchor;
 
-const char	*clearopt;
-char		*rulesopt;
-const char	*showopt;
-const char	*debugopt;
-char		*anchoropt;
-const char	*optiopt = NULL;
-const char	*pf_device = "/dev/pf";
-char		*ifaceopt;
-char		*tableopt;
-const char	*tblcmdopt;
-int		 src_node_killers;
-char		*src_node_kill[2];
-int		 state_killers;
-char		*state_kill[2];
-int		 loadopt;
-int		 altqsupport;
+static const char	*clearopt;
+static char		*rulesopt;
+static const char	*showopt;
+static const char	*debugopt;
+static char		*anchoropt;
+static const char	*optiopt = NULL;
+static const char	*pf_device = "/dev/pf";
+static char		*ifaceopt;
+static char		*tableopt;
+static const char	*tblcmdopt;
+static int		 src_node_killers;
+static char		*src_node_kill[2];
+static int		 state_killers;
+static char		*state_kill[2];
+int			 loadopt;
+int			 altqsupport;
 
-int		 dev = -1;
-int		 first_title = 1;
-int		 labels = 0;
+int			 dev = -1;
+static int		 first_title = 1;
+static int		 labels = 0;
 
 #define INDENT(d, o)	do {						\
 				if (o) {				\
 					int i;				\
 					for (i=0; i < d; i++)		\
 						printf("  ");		\
 				}					\
 			} while (0);					\
 
 
 static const struct {
 	const char	*name;
 	int		index;
 } pf_limits[] = {
 	{ "states",		PF_LIMIT_STATES },
 	{ "src-nodes",		PF_LIMIT_SRC_NODES },
 	{ "frags",		PF_LIMIT_FRAGS },
 	{ "table-entries",	PF_LIMIT_TABLE_ENTRIES },
 	{ NULL,			0 }
 };
 
 struct pf_hint {
 	const char	*name;
 	int		timeout;
 };
 static const struct pf_hint pf_hint_normal[] = {
 	{ "tcp.first",		2 * 60 },
 	{ "tcp.opening",	30 },
 	{ "tcp.established",	24 * 60 * 60 },
 	{ "tcp.closing",	15 * 60 },
 	{ "tcp.finwait",	45 },
 	{ "tcp.closed",		90 },
 	{ "tcp.tsdiff",		30 },
 	{ NULL,			0 }
 };
 static const struct pf_hint pf_hint_satellite[] = {
 	{ "tcp.first",		3 * 60 },
 	{ "tcp.opening",	30 + 5 },
 	{ "tcp.established",	24 * 60 * 60 },
 	{ "tcp.closing",	15 * 60 + 5 },
 	{ "tcp.finwait",	45 + 5 },
 	{ "tcp.closed",		90 + 5 },
 	{ "tcp.tsdiff",		60 },
 	{ NULL,			0 }
 };
 static const struct pf_hint pf_hint_conservative[] = {
 	{ "tcp.first",		60 * 60 },
 	{ "tcp.opening",	15 * 60 },
 	{ "tcp.established",	5 * 24 * 60 * 60 },
 	{ "tcp.closing",	60 * 60 },
 	{ "tcp.finwait",	10 * 60 },
 	{ "tcp.closed",		3 * 60 },
 	{ "tcp.tsdiff",		60 },
 	{ NULL,			0 }
 };
 static const struct pf_hint pf_hint_aggressive[] = {
 	{ "tcp.first",		30 },
 	{ "tcp.opening",	5 },
 	{ "tcp.established",	5 * 60 * 60 },
 	{ "tcp.closing",	60 },
 	{ "tcp.finwait",	30 },
 	{ "tcp.closed",		30 },
 	{ "tcp.tsdiff",		10 },
 	{ NULL,			0 }
 };
 
 static const struct {
 	const char *name;
 	const struct pf_hint *hint;
 } pf_hints[] = {
 	{ "normal",		pf_hint_normal },
 	{ "satellite",		pf_hint_satellite },
 	{ "high-latency",	pf_hint_satellite },
 	{ "conservative",	pf_hint_conservative },
 	{ "aggressive",		pf_hint_aggressive },
 	{ NULL,			NULL }
 };
 
 static const char * const clearopt_list[] = {
 	"nat", "queue", "rules", "Sources",
 	"states", "info", "Tables", "osfp", "all", NULL
 };
 
 static const char * const showopt_list[] = {
 	"nat", "queue", "rules", "Anchors", "Sources", "states", "info",
 	"Interfaces", "labels", "timeouts", "memory", "Tables", "osfp",
 	"all", NULL
 };
 
 static const char * const tblcmdopt_list[] = {
 	"kill", "flush", "add", "delete", "load", "replace", "show",
 	"test", "zero", "expire", NULL
 };
 
 static const char * const debugopt_list[] = {
 	"none", "urgent", "misc", "loud", NULL
 };
 
 static const char * const optiopt_list[] = {
 	"none", "basic", "profile", NULL
 };
 
 void
 usage(void)
 {
 	extern char *__progname;
 
 	fprintf(stderr,
 "usage: %s [-AdeghmNnOPqRrvz] [-a anchor] [-D macro=value] [-F modifier]\n"
 	"\t[-f file] [-i interface] [-K host | network]\n"
 	"\t[-k host | network | label | id] [-o level] [-p device]\n"
 	"\t[-s modifier] [-t table -T command [address ...]] [-x level]\n",
 	    __progname);
 
 	exit(1);
 }
 
 int
 pfctl_enable(int dev, int opts)
 {
 	if (ioctl(dev, DIOCSTART)) {
 		if (errno == EEXIST)
 			errx(1, "pf already enabled");
 		else if (errno == ESRCH)
 			errx(1, "pfil registeration failed");
 		else
 			err(1, "DIOCSTART");
 	}
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "pf enabled\n");
 
 	if (altqsupport && ioctl(dev, DIOCSTARTALTQ))
 		if (errno != EEXIST)
 			err(1, "DIOCSTARTALTQ");
 
 	return (0);
 }
 
 int
 pfctl_disable(int dev, int opts)
 {
 	if (ioctl(dev, DIOCSTOP)) {
 		if (errno == ENOENT)
 			errx(1, "pf not enabled");
 		else
 			err(1, "DIOCSTOP");
 	}
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "pf disabled\n");
 
 	if (altqsupport && ioctl(dev, DIOCSTOPALTQ))
 			if (errno != ENOENT)
 				err(1, "DIOCSTOPALTQ");
 
 	return (0);
 }
 
 int
 pfctl_clear_stats(int dev, int opts)
 {
 	if (ioctl(dev, DIOCCLRSTATUS))
 		err(1, "DIOCCLRSTATUS");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "pf: statistics cleared\n");
 	return (0);
 }
 
 int
 pfctl_clear_interface_flags(int dev, int opts)
 {
 	struct pfioc_iface	pi;
 
 	if ((opts & PF_OPT_NOACTION) == 0) {
 		bzero(&pi, sizeof(pi));
 		pi.pfiio_flags = PFI_IFLAG_SKIP;
 
 		if (ioctl(dev, DIOCCLRIFFLAG, &pi))
 			err(1, "DIOCCLRIFFLAG");
 		if ((opts & PF_OPT_QUIET) == 0)
 			fprintf(stderr, "pf: interface flags reset\n");
 	}
 	return (0);
 }
 
 int
 pfctl_clear_rules(int dev, int opts, char *anchorname)
 {
 	struct pfr_buffer t;
 
 	memset(&t, 0, sizeof(t));
 	t.pfrb_type = PFRB_TRANS;
 	if (pfctl_add_trans(&t, PF_RULESET_SCRUB, anchorname) ||
 	    pfctl_add_trans(&t, PF_RULESET_FILTER, anchorname) ||
 	    pfctl_trans(dev, &t, DIOCXBEGIN, 0) ||
 	    pfctl_trans(dev, &t, DIOCXCOMMIT, 0))
 		err(1, "pfctl_clear_rules");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "rules cleared\n");
 	return (0);
 }
 
 int
 pfctl_clear_nat(int dev, int opts, char *anchorname)
 {
 	struct pfr_buffer t;
 
 	memset(&t, 0, sizeof(t));
 	t.pfrb_type = PFRB_TRANS;
 	if (pfctl_add_trans(&t, PF_RULESET_NAT, anchorname) ||
 	    pfctl_add_trans(&t, PF_RULESET_BINAT, anchorname) ||
 	    pfctl_add_trans(&t, PF_RULESET_RDR, anchorname) ||
 	    pfctl_trans(dev, &t, DIOCXBEGIN, 0) ||
 	    pfctl_trans(dev, &t, DIOCXCOMMIT, 0))
 		err(1, "pfctl_clear_nat");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "nat cleared\n");
 	return (0);
 }
 
 int
 pfctl_clear_altq(int dev, int opts)
 {
 	struct pfr_buffer t;
 
 	if (!altqsupport)
 		return (-1);
 	memset(&t, 0, sizeof(t));
 	t.pfrb_type = PFRB_TRANS;
 	if (pfctl_add_trans(&t, PF_RULESET_ALTQ, "") ||
 	    pfctl_trans(dev, &t, DIOCXBEGIN, 0) ||
 	    pfctl_trans(dev, &t, DIOCXCOMMIT, 0))
 		err(1, "pfctl_clear_altq");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "altq cleared\n");
 	return (0);
 }
 
 int
 pfctl_clear_src_nodes(int dev, int opts)
 {
 	if (ioctl(dev, DIOCCLRSRCNODES))
 		err(1, "DIOCCLRSRCNODES");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "source tracking entries cleared\n");
 	return (0);
 }
 
 int
 pfctl_clear_states(int dev, const char *iface, int opts)
 {
 	struct pfioc_state_kill psk;
 
 	memset(&psk, 0, sizeof(psk));
 	if (iface != NULL && strlcpy(psk.psk_ifname, iface,
 	    sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname))
 		errx(1, "invalid interface: %s", iface);
 
 	if (ioctl(dev, DIOCCLRSTATES, &psk))
 		err(1, "DIOCCLRSTATES");
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "%d states cleared\n", psk.psk_killed);
 	return (0);
 }
 
 void
 pfctl_addrprefix(char *addr, struct pf_addr *mask)
 {
 	char *p;
 	const char *errstr;
 	int prefix, ret_ga, q, r;
 	struct addrinfo hints, *res;
 
 	if ((p = strchr(addr, '/')) == NULL)
 		return;
 
 	*p++ = '\0';
 	prefix = strtonum(p, 0, 128, &errstr);
 	if (errstr)
 		errx(1, "prefix is %s: %s", errstr, p);
 
 	bzero(&hints, sizeof(hints));
 	/* prefix only with numeric addresses */
 	hints.ai_flags |= AI_NUMERICHOST;
 
 	if ((ret_ga = getaddrinfo(addr, NULL, &hints, &res))) {
 		errx(1, "getaddrinfo: %s", gai_strerror(ret_ga));
 		/* NOTREACHED */
 	}
 
 	if (res->ai_family == AF_INET && prefix > 32)
 		errx(1, "prefix too long for AF_INET");
 	else if (res->ai_family == AF_INET6 && prefix > 128)
 		errx(1, "prefix too long for AF_INET6");
 
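	/* q: number of all-ones bytes; r: leading one bits in the next byte */
	q = prefix >> 3;
	r = prefix & 7;
	switch (res->ai_family) {
	case AF_INET:
		bzero(&mask->v4, sizeof(mask->v4));
		/*
		 * Shift a constant wider than 32 bits so that a /0 prefix
		 * (a shift by 32) is well defined and truncates to 0.
		 */
		mask->v4.s_addr = htonl((u_int32_t)
		    (0xffffffffffULL << (32 - prefix)));
		break;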
 	case AF_INET6:
 		bzero(&mask->v6, sizeof(mask->v6));
 		if (q > 0)
 			memset((void *)&mask->v6, 0xff, q);
 		if (r > 0)
 			*((u_char *)&mask->v6 + q) =
 			    (0xff00 >> r) & 0xff;
 		break;
 	}
 	freeaddrinfo(res);
 }
 
 int
 pfctl_kill_src_nodes(int dev, const char *iface, int opts)
 {
 	struct pfioc_src_node_kill psnk;
 	struct addrinfo *res[2], *resp[2];
 	struct sockaddr last_src, last_dst;
 	int killed, sources, dests;
 	int ret_ga;
 
 	killed = sources = dests = 0;
 
 	memset(&psnk, 0, sizeof(psnk));
 	memset(&psnk.psnk_src.addr.v.a.mask, 0xff,
 	    sizeof(psnk.psnk_src.addr.v.a.mask));
 	memset(&last_src, 0xff, sizeof(last_src));
 	memset(&last_dst, 0xff, sizeof(last_dst));
 
 	pfctl_addrprefix(src_node_kill[0], &psnk.psnk_src.addr.v.a.mask);
 
 	if ((ret_ga = getaddrinfo(src_node_kill[0], NULL, NULL, &res[0]))) {
 		errx(1, "getaddrinfo: %s", gai_strerror(ret_ga));
 		/* NOTREACHED */
 	}
 	for (resp[0] = res[0]; resp[0]; resp[0] = resp[0]->ai_next) {
 		if (resp[0]->ai_addr == NULL)
 			continue;
 		/* We get lots of duplicates.  Catch the easy ones */
 		if (memcmp(&last_src, resp[0]->ai_addr, sizeof(last_src)) == 0)
 			continue;
 		last_src = *(struct sockaddr *)resp[0]->ai_addr;
 
 		psnk.psnk_af = resp[0]->ai_family;
 		sources++;
 
 		if (psnk.psnk_af == AF_INET)
 			psnk.psnk_src.addr.v.a.addr.v4 =
 			    ((struct sockaddr_in *)resp[0]->ai_addr)->sin_addr;
 		else if (psnk.psnk_af == AF_INET6)
 			psnk.psnk_src.addr.v.a.addr.v6 =
 			    ((struct sockaddr_in6 *)resp[0]->ai_addr)->
 			    sin6_addr;
 		else
 			errx(1, "Unknown address family %d", psnk.psnk_af);
 
 		if (src_node_killers > 1) {
 			dests = 0;
 			memset(&psnk.psnk_dst.addr.v.a.mask, 0xff,
 			    sizeof(psnk.psnk_dst.addr.v.a.mask));
 			memset(&last_dst, 0xff, sizeof(last_dst));
 			pfctl_addrprefix(src_node_kill[1],
 			    &psnk.psnk_dst.addr.v.a.mask);
 			if ((ret_ga = getaddrinfo(src_node_kill[1], NULL, NULL,
 			    &res[1]))) {
 				errx(1, "getaddrinfo: %s",
 				    gai_strerror(ret_ga));
 				/* NOTREACHED */
 			}
 			for (resp[1] = res[1]; resp[1];
 			    resp[1] = resp[1]->ai_next) {
 				if (resp[1]->ai_addr == NULL)
 					continue;
 				if (psnk.psnk_af != resp[1]->ai_family)
 					continue;
 
 				if (memcmp(&last_dst, resp[1]->ai_addr,
 				    sizeof(last_dst)) == 0)
 					continue;
 				last_dst = *(struct sockaddr *)resp[1]->ai_addr;
 
 				dests++;
 
 				if (psnk.psnk_af == AF_INET)
 					psnk.psnk_dst.addr.v.a.addr.v4 =
 					    ((struct sockaddr_in *)resp[1]->
 					    ai_addr)->sin_addr;
 				else if (psnk.psnk_af == AF_INET6)
 					psnk.psnk_dst.addr.v.a.addr.v6 =
 					    ((struct sockaddr_in6 *)resp[1]->
 					    ai_addr)->sin6_addr;
 				else
 					errx(1, "Unknown address family %d",
 					    psnk.psnk_af);
 
 				if (ioctl(dev, DIOCKILLSRCNODES, &psnk))
 					err(1, "DIOCKILLSRCNODES");
 				killed += psnk.psnk_killed;
 			}
 			freeaddrinfo(res[1]);
 		} else {
 			if (ioctl(dev, DIOCKILLSRCNODES, &psnk))
 				err(1, "DIOCKILLSRCNODES");
 			killed += psnk.psnk_killed;
 		}
 	}
 
 	freeaddrinfo(res[0]);
 
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "killed %d src nodes from %d sources and %d "
 		    "destinations\n", killed, sources, dests);
 	return (0);
 }
 
 int
 pfctl_net_kill_states(int dev, const char *iface, int opts)
 {
 	struct pfioc_state_kill psk;
 	struct addrinfo *res[2], *resp[2];
 	struct sockaddr last_src, last_dst;
 	int killed, sources, dests;
 	int ret_ga;
 
 	killed = sources = dests = 0;
 
 	memset(&psk, 0, sizeof(psk));
 	memset(&psk.psk_src.addr.v.a.mask, 0xff,
 	    sizeof(psk.psk_src.addr.v.a.mask));
 	memset(&last_src, 0xff, sizeof(last_src));
 	memset(&last_dst, 0xff, sizeof(last_dst));
 	if (iface != NULL && strlcpy(psk.psk_ifname, iface,
 	    sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname))
 		errx(1, "invalid interface: %s", iface);
 
 	pfctl_addrprefix(state_kill[0], &psk.psk_src.addr.v.a.mask);
 
 	if ((ret_ga = getaddrinfo(state_kill[0], NULL, NULL, &res[0]))) {
 		errx(1, "getaddrinfo: %s", gai_strerror(ret_ga));
 		/* NOTREACHED */
 	}
 	for (resp[0] = res[0]; resp[0]; resp[0] = resp[0]->ai_next) {
 		if (resp[0]->ai_addr == NULL)
 			continue;
 		/* We get lots of duplicates.  Catch the easy ones */
 		if (memcmp(&last_src, resp[0]->ai_addr, sizeof(last_src)) == 0)
 			continue;
 		last_src = *(struct sockaddr *)resp[0]->ai_addr;
 
 		psk.psk_af = resp[0]->ai_family;
 		sources++;
 
 		if (psk.psk_af == AF_INET)
 			psk.psk_src.addr.v.a.addr.v4 =
 			    ((struct sockaddr_in *)resp[0]->ai_addr)->sin_addr;
 		else if (psk.psk_af == AF_INET6)
 			psk.psk_src.addr.v.a.addr.v6 =
 			    ((struct sockaddr_in6 *)resp[0]->ai_addr)->
 			    sin6_addr;
 		else
 			errx(1, "Unknown address family %d", psk.psk_af);
 
 		if (state_killers > 1) {
 			dests = 0;
 			memset(&psk.psk_dst.addr.v.a.mask, 0xff,
 			    sizeof(psk.psk_dst.addr.v.a.mask));
 			memset(&last_dst, 0xff, sizeof(last_dst));
 			pfctl_addrprefix(state_kill[1],
 			    &psk.psk_dst.addr.v.a.mask);
 			if ((ret_ga = getaddrinfo(state_kill[1], NULL, NULL,
 			    &res[1]))) {
 				errx(1, "getaddrinfo: %s",
 				    gai_strerror(ret_ga));
 				/* NOTREACHED */
 			}
 			for (resp[1] = res[1]; resp[1];
 			    resp[1] = resp[1]->ai_next) {
 				if (resp[1]->ai_addr == NULL)
 					continue;
 				if (psk.psk_af != resp[1]->ai_family)
 					continue;
 
 				if (memcmp(&last_dst, resp[1]->ai_addr,
 				    sizeof(last_dst)) == 0)
 					continue;
 				last_dst = *(struct sockaddr *)resp[1]->ai_addr;
 
 				dests++;
 
 				if (psk.psk_af == AF_INET)
 					psk.psk_dst.addr.v.a.addr.v4 =
 					    ((struct sockaddr_in *)resp[1]->
 					    ai_addr)->sin_addr;
 				else if (psk.psk_af == AF_INET6)
 					psk.psk_dst.addr.v.a.addr.v6 =
 					    ((struct sockaddr_in6 *)resp[1]->
 					    ai_addr)->sin6_addr;
 				else
 					errx(1, "Unknown address family %d",
 					    psk.psk_af);
 
 				if (ioctl(dev, DIOCKILLSTATES, &psk))
 					err(1, "DIOCKILLSTATES");
 				killed += psk.psk_killed;
 			}
 			freeaddrinfo(res[1]);
 		} else {
 			if (ioctl(dev, DIOCKILLSTATES, &psk))
 				err(1, "DIOCKILLSTATES");
 			killed += psk.psk_killed;
 		}
 	}
 
 	freeaddrinfo(res[0]);
 
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "killed %d states from %d sources and %d "
 		    "destinations\n", killed, sources, dests);
 	return (0);
 }
 
 int
 pfctl_label_kill_states(int dev, const char *iface, int opts)
 {
 	struct pfioc_state_kill psk;
 
 	if (state_killers != 2 || (strlen(state_kill[1]) == 0)) {
 		warnx("no label specified");
 		usage();
 	}
 	memset(&psk, 0, sizeof(psk));
 	if (iface != NULL && strlcpy(psk.psk_ifname, iface,
 	    sizeof(psk.psk_ifname)) >= sizeof(psk.psk_ifname))
 		errx(1, "invalid interface: %s", iface);
 
 	if (strlcpy(psk.psk_label, state_kill[1], sizeof(psk.psk_label)) >=
 	    sizeof(psk.psk_label))
 		errx(1, "label too long: %s", state_kill[1]);
 
 	if (ioctl(dev, DIOCKILLSTATES, &psk))
 		err(1, "DIOCKILLSTATES");
 
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "killed %d states\n", psk.psk_killed);
 
 	return (0);
 }
 
 int
 pfctl_id_kill_states(int dev, const char *iface, int opts)
 {
 	struct pfioc_state_kill psk;
 	
 	if (state_killers != 2 || (strlen(state_kill[1]) == 0)) {
 		warnx("no id specified");
 		usage();
 	}
 
 	memset(&psk, 0, sizeof(psk));
 	if ((sscanf(state_kill[1], "%jx/%x",
 	    &psk.psk_pfcmp.id, &psk.psk_pfcmp.creatorid)) == 2)
 		HTONL(psk.psk_pfcmp.creatorid);
 	else if ((sscanf(state_kill[1], "%jx", &psk.psk_pfcmp.id)) == 1) {
 		psk.psk_pfcmp.creatorid = 0;
 	} else {
 		warnx("wrong id format specified");
 		usage();
 	}
 	if (psk.psk_pfcmp.id == 0) {
 		warnx("cannot kill id 0");
 		usage();
 	}
 
 	psk.psk_pfcmp.id = htobe64(psk.psk_pfcmp.id);
 	if (ioctl(dev, DIOCKILLSTATES, &psk))
 		err(1, "DIOCKILLSTATES");
 
 	if ((opts & PF_OPT_QUIET) == 0)
 		fprintf(stderr, "killed %d states\n", psk.psk_killed);
 
 	return (0);
 }
 
 int
 pfctl_get_pool(int dev, struct pf_pool *pool, u_int32_t nr,
     u_int32_t ticket, int r_action, char *anchorname)
 {
 	struct pfioc_pooladdr pp;
 	struct pf_pooladdr *pa;
 	u_int32_t pnr, mpnr;
 
 	memset(&pp, 0, sizeof(pp));
 	memcpy(pp.anchor, anchorname, sizeof(pp.anchor));
 	pp.r_action = r_action;
 	pp.r_num = nr;
 	pp.ticket = ticket;
 	if (ioctl(dev, DIOCGETADDRS, &pp)) {
 		warn("DIOCGETADDRS");
 		return (-1);
 	}
 	mpnr = pp.nr;
 	TAILQ_INIT(&pool->list);
 	for (pnr = 0; pnr < mpnr; ++pnr) {
 		pp.nr = pnr;
 		if (ioctl(dev, DIOCGETADDR, &pp)) {
 			warn("DIOCGETADDR");
 			return (-1);
 		}
 		pa = calloc(1, sizeof(struct pf_pooladdr));
 		if (pa == NULL)
 			err(1, "calloc");
 		bcopy(&pp.addr, pa, sizeof(struct pf_pooladdr));
 		TAILQ_INSERT_TAIL(&pool->list, pa, entries);
 	}
 
 	return (0);
 }
 
 void
 pfctl_move_pool(struct pf_pool *src, struct pf_pool *dst)
 {
 	struct pf_pooladdr *pa;
 
 	while ((pa = TAILQ_FIRST(&src->list)) != NULL) {
 		TAILQ_REMOVE(&src->list, pa, entries);
 		TAILQ_INSERT_TAIL(&dst->list, pa, entries);
 	}
 }
 
 void
 pfctl_clear_pool(struct pf_pool *pool)
 {
 	struct pf_pooladdr *pa;
 
 	while ((pa = TAILQ_FIRST(&pool->list)) != NULL) {
 		TAILQ_REMOVE(&pool->list, pa, entries);
 		free(pa);
 	}
 }
 
 void
 pfctl_print_rule_counters(struct pf_rule *rule, int opts)
 {
 	if (opts & PF_OPT_DEBUG) {
 		const char *t[PF_SKIP_COUNT] = { "i", "d", "f",
 		    "p", "sa", "sp", "da", "dp" };
 		int i;
 
 		printf("  [ Skip steps: ");
 		for (i = 0; i < PF_SKIP_COUNT; ++i) {
 			if (rule->skip[i].nr == rule->nr + 1)
 				continue;
 			printf("%s=", t[i]);
 			if (rule->skip[i].nr == -1)
 				printf("end ");
 			else
 				printf("%u ", rule->skip[i].nr);
 		}
 		printf("]\n");
 
 		printf("  [ queue: qname=%s qid=%u pqname=%s pqid=%u ]\n",
 		    rule->qname, rule->qid, rule->pqname, rule->pqid);
 	}
 	if (opts & PF_OPT_VERBOSE) {
		printf("  [ Evaluations: %-8llu  Packets: %-8llu  "
		    "Bytes: %-10llu  States: %-6ju]\n",
		    (unsigned long long)rule->evaluations,
		    (unsigned long long)(rule->packets[0] +
		    rule->packets[1]),
		    (unsigned long long)(rule->bytes[0] +
		    rule->bytes[1]), (uintmax_t)rule->u_states_cur);
 		if (!(opts & PF_OPT_DEBUG))
 			printf("  [ Inserted: uid %u pid %u "
 			    "State Creations: %-6ju]\n",
 			    (unsigned)rule->cuid, (unsigned)rule->cpid,
 			    (uintmax_t)rule->u_states_tot);
 	}
 }
 
 void
 pfctl_print_title(char *title)
 {
 	if (!first_title)
 		printf("\n");
 	first_title = 0;
 	printf("%s\n", title);
 }
 
 int
 pfctl_show_rules(int dev, char *path, int opts, enum pfctl_show format,
     char *anchorname, int depth)
 {
 	struct pfioc_rule pr;
 	u_int32_t nr, mnr, header = 0;
 	int rule_numbers = opts & (PF_OPT_VERBOSE2 | PF_OPT_DEBUG);
 	int numeric = opts & PF_OPT_NUMERIC;
 	int len = strlen(path);
 	int brace;
 	char *p;
 
 	if (path[0])
 		snprintf(&path[len], MAXPATHLEN - len, "/%s", anchorname);
 	else
 		snprintf(&path[len], MAXPATHLEN - len, "%s", anchorname);
 
 	memset(&pr, 0, sizeof(pr));
 	memcpy(pr.anchor, path, sizeof(pr.anchor));
 	if (opts & PF_OPT_SHOWALL) {
 		pr.rule.action = PF_PASS;
 		if (ioctl(dev, DIOCGETRULES, &pr)) {
 			warn("DIOCGETRULES");
 			goto error;
 		}
 		header++;
 	}
 	pr.rule.action = PF_SCRUB;
 	if (ioctl(dev, DIOCGETRULES, &pr)) {
 		warn("DIOCGETRULES");
 		goto error;
 	}
 	if (opts & PF_OPT_SHOWALL) {
 		if (format == PFCTL_SHOW_RULES && (pr.nr > 0 || header))
 			pfctl_print_title("FILTER RULES:");
 		else if (format == PFCTL_SHOW_LABELS && labels)
 			pfctl_print_title("LABEL COUNTERS:");
 	}
 	mnr = pr.nr;
 	if (opts & PF_OPT_CLRRULECTRS)
 		pr.action = PF_GET_CLR_CNTR;
 
 	for (nr = 0; nr < mnr; ++nr) {
 		pr.nr = nr;
 		if (ioctl(dev, DIOCGETRULE, &pr)) {
 			warn("DIOCGETRULE");
 			goto error;
 		}
 
 		if (pfctl_get_pool(dev, &pr.rule.rpool,
 		    nr, pr.ticket, PF_SCRUB, path) != 0)
 			goto error;
 
 		switch (format) {
 		case PFCTL_SHOW_LABELS:
 			break;
 		case PFCTL_SHOW_RULES:
 			if (pr.rule.label[0] && (opts & PF_OPT_SHOWALL))
 				labels = 1;
 			print_rule(&pr.rule, pr.anchor_call, rule_numbers, numeric);
 			printf("\n");
 			pfctl_print_rule_counters(&pr.rule, opts);
 			break;
 		case PFCTL_SHOW_NOTHING:
 			break;
 		}
 		pfctl_clear_pool(&pr.rule.rpool);
 	}
 	pr.rule.action = PF_PASS;
 	if (ioctl(dev, DIOCGETRULES, &pr)) {
 		warn("DIOCGETRULES");
 		goto error;
 	}
 	mnr = pr.nr;
 	for (nr = 0; nr < mnr; ++nr) {
 		pr.nr = nr;
 		if (ioctl(dev, DIOCGETRULE, &pr)) {
 			warn("DIOCGETRULE");
 			goto error;
 		}
 
 		if (pfctl_get_pool(dev, &pr.rule.rpool,
 		    nr, pr.ticket, PF_PASS, path) != 0)
 			goto error;
 
 		switch (format) {
 		case PFCTL_SHOW_LABELS:
 			if (pr.rule.label[0]) {
 				printf("%s %llu %llu %llu %llu"
 				    " %llu %llu %llu %ju\n",
 				    pr.rule.label,
 				    (unsigned long long)pr.rule.evaluations,
 				    (unsigned long long)(pr.rule.packets[0] +
 				    pr.rule.packets[1]),
 				    (unsigned long long)(pr.rule.bytes[0] +
 				    pr.rule.bytes[1]),
 				    (unsigned long long)pr.rule.packets[0],
 				    (unsigned long long)pr.rule.bytes[0],
 				    (unsigned long long)pr.rule.packets[1],
 				    (unsigned long long)pr.rule.bytes[1],
 				    (uintmax_t)pr.rule.u_states_tot);
 			}
 			break;
 		case PFCTL_SHOW_RULES:
 			brace = 0;
 			if (pr.rule.label[0] && (opts & PF_OPT_SHOWALL))
 				labels = 1;
			INDENT(depth, !(opts & PF_OPT_VERBOSE));
			/*
			 * Open a brace and list the nested ruleset inline when
			 * the anchor is anonymous (its last '_' starts a path
			 * component, as in "_1" or "foo/_1") or when recursive
			 * listing was requested.
			 */
			if (pr.anchor_call[0] &&
			    ((((p = strrchr(pr.anchor_call, '_')) != NULL) &&
			    ((void *)p == (void *)pr.anchor_call ||
			    *(--p) == '/')) || (opts & PF_OPT_RECURSE))) {
				brace++;
				if ((p = strrchr(pr.anchor_call, '/')) !=
				    NULL)
					p++;
				else
					p = &pr.anchor_call[0];
			} else
				p = &pr.anchor_call[0];

			print_rule(&pr.rule, p, rule_numbers, numeric);
			if (brace)
				printf(" {\n");
			else
				printf("\n");
			pfctl_print_rule_counters(&pr.rule, opts);
			if (brace) {
 				pfctl_show_rules(dev, path, opts, format,
 				    p, depth + 1);
 				INDENT(depth, !(opts & PF_OPT_VERBOSE));
 				printf("}\n");
 			}
 			break;
 		case PFCTL_SHOW_NOTHING:
 			break;
 		}
 		pfctl_clear_pool(&pr.rule.rpool);
 	}
 	path[len] = '\0';
 	return (0);
 
  error:
 	path[len] = '\0';
 	return (-1);
 }
 
 int
 pfctl_show_nat(int dev, int opts, char *anchorname)
 {
 	struct pfioc_rule pr;
 	u_int32_t mnr, nr;
 	static int nattype[3] = { PF_NAT, PF_RDR, PF_BINAT };
 	int i, dotitle = opts & PF_OPT_SHOWALL;
 
 	memset(&pr, 0, sizeof(pr));
 	memcpy(pr.anchor, anchorname, sizeof(pr.anchor));
 	for (i = 0; i < 3; i++) {
 		pr.rule.action = nattype[i];
 		if (ioctl(dev, DIOCGETRULES, &pr)) {
 			warn("DIOCGETRULES");
 			return (-1);
 		}
 		mnr = pr.nr;
 		for (nr = 0; nr < mnr; ++nr) {
 			pr.nr = nr;
 			if (ioctl(dev, DIOCGETRULE, &pr)) {
 				warn("DIOCGETRULE");
 				return (-1);
 			}
 			if (pfctl_get_pool(dev, &pr.rule.rpool, nr,
 			    pr.ticket, nattype[i], anchorname) != 0)
 				return (-1);
 			if (dotitle) {
 				pfctl_print_title("TRANSLATION RULES:");
 				dotitle = 0;
 			}
 			print_rule(&pr.rule, pr.anchor_call,
 			    opts & PF_OPT_VERBOSE2, opts & PF_OPT_NUMERIC);
 			printf("\n");
 			pfctl_print_rule_counters(&pr.rule, opts);
 			pfctl_clear_pool(&pr.rule.rpool);
 		}
 	}
 	return (0);
 }
 
 int
 pfctl_show_src_nodes(int dev, int opts)
 {
 	struct pfioc_src_nodes psn;
 	struct pf_src_node *p;
 	char *inbuf = NULL, *newinbuf = NULL;
 	unsigned int len = 0;
 	int i;
 
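	memset(&psn, 0, sizeof(psn));
	/*
	 * Probe with a zero-length buffer to learn the required size, then
	 * keep doubling the buffer until the kernel's reply fits with room
	 * to spare.
	 */
	for (;;) {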
 		psn.psn_len = len;
 		if (len) {
 			newinbuf = realloc(inbuf, len);
 			if (newinbuf == NULL)
 				err(1, "realloc");
 			psn.psn_buf = inbuf = newinbuf;
 		}
 		if (ioctl(dev, DIOCGETSRCNODES, &psn) < 0) {
 			warn("DIOCGETSRCNODES");
 			free(inbuf);
 			return (-1);
 		}
 		if (psn.psn_len + sizeof(struct pfioc_src_nodes) < len)
 			break;
 		if (len == 0 && psn.psn_len == 0)
 			goto done;
 		if (len == 0 && psn.psn_len != 0)
 			len = psn.psn_len;
 		if (psn.psn_len == 0)
 			goto done;	/* no src_nodes */
 		len *= 2;
 	}
 	p = psn.psn_src_nodes;
 	if (psn.psn_len > 0 && (opts & PF_OPT_SHOWALL))
 		pfctl_print_title("SOURCE TRACKING NODES:");
 	for (i = 0; i < psn.psn_len; i += sizeof(*p)) {
 		print_src_node(p, opts);
 		p++;
 	}
 done:
 	free(inbuf);
 	return (0);
 }
 
 int
 pfctl_show_states(int dev, const char *iface, int opts)
 {
 	struct pfioc_states ps;
 	struct pfsync_state *p;
 	char *inbuf = NULL, *newinbuf = NULL;
 	unsigned int len = 0;
 	int i, dotitle = (opts & PF_OPT_SHOWALL);
 
 	memset(&ps, 0, sizeof(ps));
 	for (;;) {
 		ps.ps_len = len;
 		if (len) {
 			newinbuf = realloc(inbuf, len);
 			if (newinbuf == NULL)
 				err(1, "realloc");
 			ps.ps_buf = inbuf = newinbuf;
 		}
 		if (ioctl(dev, DIOCGETSTATES, &ps) < 0) {
 			warn("DIOCGETSTATES");
 			free(inbuf);
 			return (-1);
 		}
 		if (ps.ps_len + sizeof(struct pfioc_states) < len)
 			break;
 		if (len == 0 && ps.ps_len == 0)
 			goto done;
 		if (len == 0 && ps.ps_len != 0)
 			len = ps.ps_len;
 		if (ps.ps_len == 0)
 			goto done;	/* no states */
 		len *= 2;
 	}
 	p = ps.ps_states;
 	for (i = 0; i < ps.ps_len; i += sizeof(*p), p++) {
 		if (iface != NULL && strcmp(p->ifname, iface))
 			continue;
 		if (dotitle) {
 			pfctl_print_title("STATES:");
 			dotitle = 0;
 		}
 		print_state(p, opts);
 	}
 done:
 	free(inbuf);
 	return (0);
 }
 
 int
 pfctl_show_status(int dev, int opts)
 {
 	struct pf_status status;
 
 	if (ioctl(dev, DIOCGETSTATUS, &status)) {
 		warn("DIOCGETSTATUS");
 		return (-1);
 	}
 	if (opts & PF_OPT_SHOWALL)
 		pfctl_print_title("INFO:");
 	print_status(&status, opts);
 	return (0);
 }
 
 int
 pfctl_show_timeouts(int dev, int opts)
 {
 	struct pfioc_tm pt;
 	int i;
 
 	if (opts & PF_OPT_SHOWALL)
 		pfctl_print_title("TIMEOUTS:");
 	memset(&pt, 0, sizeof(pt));
 	for (i = 0; pf_timeouts[i].name; i++) {
 		pt.timeout = pf_timeouts[i].timeout;
 		if (ioctl(dev, DIOCGETTIMEOUT, &pt))
 			err(1, "DIOCGETTIMEOUT");
 		printf("%-20s %10d", pf_timeouts[i].name, pt.seconds);
 		if (pf_timeouts[i].timeout >= PFTM_ADAPTIVE_START &&
 		    pf_timeouts[i].timeout <= PFTM_ADAPTIVE_END)
 			printf(" states");
 		else
 			printf("s");
 		printf("\n");
 	}
	return (0);
}
 
 int
 pfctl_show_limits(int dev, int opts)
 {
 	struct pfioc_limit pl;
 	int i;
 
 	if (opts & PF_OPT_SHOWALL)
 		pfctl_print_title("LIMITS:");
 	memset(&pl, 0, sizeof(pl));
 	for (i = 0; pf_limits[i].name; i++) {
 		pl.index = pf_limits[i].index;
 		if (ioctl(dev, DIOCGETLIMIT, &pl))
 			err(1, "DIOCGETLIMIT");
 		printf("%-13s ", pf_limits[i].name);
 		if (pl.limit == UINT_MAX)
 			printf("unlimited\n");
 		else
 			printf("hard limit %8u\n", pl.limit);
 	}
 	return (0);
 }
 
 /* callbacks for rule/nat/rdr/addr */
 int
 pfctl_add_pool(struct pfctl *pf, struct pf_pool *p, sa_family_t af)
 {
 	struct pf_pooladdr *pa;
 
 	if ((pf->opts & PF_OPT_NOACTION) == 0) {
 		if (ioctl(pf->dev, DIOCBEGINADDRS, &pf->paddr))
 			err(1, "DIOCBEGINADDRS");
 	}
 
 	pf->paddr.af = af;
 	TAILQ_FOREACH(pa, &p->list, entries) {
 		memcpy(&pf->paddr.addr, pa, sizeof(struct pf_pooladdr));
 		if ((pf->opts & PF_OPT_NOACTION) == 0) {
 			if (ioctl(pf->dev, DIOCADDADDR, &pf->paddr))
 				err(1, "DIOCADDADDR");
 		}
 	}
 	return (0);
 }
 
 int
 pfctl_add_rule(struct pfctl *pf, struct pf_rule *r, const char *anchor_call)
 {
 	u_int8_t		rs_num;
 	struct pf_rule		*rule;
 	struct pf_ruleset	*rs;
	char			*p;
 
 	rs_num = pf_get_ruleset_number(r->action);
 	if (rs_num == PF_RULESET_MAX)
 		errx(1, "Invalid rule type %d", r->action);
 
 	rs = &pf->anchor->ruleset;
 
	if (anchor_call[0] && r->anchor == NULL) {
		/*
		 * Don't make non-brace anchors part of the main anchor pool.
		 */
		if ((r->anchor = calloc(1, sizeof(*r->anchor))) == NULL)
			err(1, "pfctl_add_rule: calloc");

		pf_init_ruleset(&r->anchor->ruleset);
		r->anchor->ruleset.anchor = r->anchor;
		if (strlcpy(r->anchor->path, anchor_call,
		    sizeof(r->anchor->path)) >= sizeof(r->anchor->path))
			errx(1, "pfctl_add_rule: strlcpy");
		if ((p = strrchr(anchor_call, '/')) != NULL) {
			if (!strlen(p))
				errx(1, "pfctl_add_rule: bad anchor name %s",
				    anchor_call);
		} else
			p = (char *)anchor_call;
		if (strlcpy(r->anchor->name, p,
		    sizeof(r->anchor->name)) >= sizeof(r->anchor->name))
			errx(1, "pfctl_add_rule: strlcpy");
	}
 
 	if ((rule = calloc(1, sizeof(*rule))) == NULL)
 		err(1, "calloc");
 	bcopy(r, rule, sizeof(*rule));
 	TAILQ_INIT(&rule->rpool.list);
 	pfctl_move_pool(&r->rpool, &rule->rpool);
 
 	TAILQ_INSERT_TAIL(rs->rules[rs_num].active.ptr, rule, entries);
 	return (0);
 }
 
 int
 pfctl_ruleset_trans(struct pfctl *pf, char *path, struct pf_anchor *a)
 {
 	int osize = pf->trans->pfrb_size;
 
 	if ((pf->loadopt & PFCTL_FLAG_NAT) != 0) {
 		if (pfctl_add_trans(pf->trans, PF_RULESET_NAT, path) ||
 		    pfctl_add_trans(pf->trans, PF_RULESET_BINAT, path) ||
 		    pfctl_add_trans(pf->trans, PF_RULESET_RDR, path))
 			return (1);
 	}
	if (a == pf->astack[0] && altqsupport &&
	    (pf->loadopt & PFCTL_FLAG_ALTQ) != 0) {
 		if (pfctl_add_trans(pf->trans, PF_RULESET_ALTQ, path))
 			return (2);
 	}
 	if ((pf->loadopt & PFCTL_FLAG_FILTER) != 0) {
 		if (pfctl_add_trans(pf->trans, PF_RULESET_SCRUB, path) ||
 		    pfctl_add_trans(pf->trans, PF_RULESET_FILTER, path))
 			return (3);
 	}
 	if (pf->loadopt & PFCTL_FLAG_TABLE)
 		if (pfctl_add_trans(pf->trans, PF_RULESET_TABLE, path))
 			return (4);
 	if (pfctl_trans(pf->dev, pf->trans, DIOCXBEGIN, osize))
 		return (5);
 
 	return (0);
 }
 
 int
 pfctl_load_ruleset(struct pfctl *pf, char *path, struct pf_ruleset *rs,
     int rs_num, int depth)
 {
 	struct pf_rule *r;
 	int		error, len = strlen(path);
 	int		brace = 0;
 
 	pf->anchor = rs->anchor;
 
 	if (path[0])
 		snprintf(&path[len], MAXPATHLEN - len, "/%s", pf->anchor->name);
 	else
 		snprintf(&path[len], MAXPATHLEN - len, "%s", pf->anchor->name);
 
 	if (depth) {
 		if (TAILQ_FIRST(rs->rules[rs_num].active.ptr) != NULL) {
 			brace++;
 			if (pf->opts & PF_OPT_VERBOSE)
 				printf(" {\n");
 			if ((pf->opts & PF_OPT_NOACTION) == 0 &&
 			    (error = pfctl_ruleset_trans(pf,
 			    path, rs->anchor))) {
				printf("pfctl_load_ruleset: "
				    "pfctl_ruleset_trans %d\n", error);
 				goto error;
 			}
 		} else if (pf->opts & PF_OPT_VERBOSE)
 			printf("\n");
 
 	}
 
 	if (pf->optimize && rs_num == PF_RULESET_FILTER)
 		pfctl_optimize_ruleset(pf, rs);
 
 	while ((r = TAILQ_FIRST(rs->rules[rs_num].active.ptr)) != NULL) {
 		TAILQ_REMOVE(rs->rules[rs_num].active.ptr, r, entries);
 		if ((error = pfctl_load_rule(pf, path, r, depth)))
 			goto error;
 		if (r->anchor) {
 			if ((error = pfctl_load_ruleset(pf, path,
 			    &r->anchor->ruleset, rs_num, depth + 1)))
 				goto error;
 		} else if (pf->opts & PF_OPT_VERBOSE)
 			printf("\n");
 		free(r);
 	}
 	if (brace && pf->opts & PF_OPT_VERBOSE) {
 		INDENT(depth - 1, (pf->opts & PF_OPT_VERBOSE));
 		printf("}\n");
 	}
 	path[len] = '\0';
 	return (0);
 
  error:
 	path[len] = '\0';
	return (error);
}
 
 int
 pfctl_load_rule(struct pfctl *pf, char *path, struct pf_rule *r, int depth)
 {
 	u_int8_t		rs_num = pf_get_ruleset_number(r->action);
 	char			*name;
 	struct pfioc_rule	pr;
 	int			len = strlen(path);
 
 	bzero(&pr, sizeof(pr));
 	/* set up anchor before adding to path for anchor_call */
 	if ((pf->opts & PF_OPT_NOACTION) == 0)
 		pr.ticket = pfctl_get_ticket(pf->trans, rs_num, path);
 	if (strlcpy(pr.anchor, path, sizeof(pr.anchor)) >= sizeof(pr.anchor))
 		errx(1, "pfctl_load_rule: strlcpy");
 
 	if (r->anchor) {
 		if (r->anchor->match) {
 			if (path[0])
 				snprintf(&path[len], MAXPATHLEN - len,
 				    "/%s", r->anchor->name);
 			else
 				snprintf(&path[len], MAXPATHLEN - len,
 				    "%s", r->anchor->name);
 			name = path;
 		} else
 			name = r->anchor->path;
 	} else
 		name = "";
 
 	if ((pf->opts & PF_OPT_NOACTION) == 0) {
 		if (pfctl_add_pool(pf, &r->rpool, r->af))
 			return (1);
 		pr.pool_ticket = pf->paddr.ticket;
 		memcpy(&pr.rule, r, sizeof(pr.rule));
 		if (r->anchor && strlcpy(pr.anchor_call, name,
 		    sizeof(pr.anchor_call)) >= sizeof(pr.anchor_call))
 			errx(1, "pfctl_load_rule: strlcpy");
 		if (ioctl(pf->dev, DIOCADDRULE, &pr))
 			err(1, "DIOCADDRULE");
 	}
 
 	if (pf->opts & PF_OPT_VERBOSE) {
 		INDENT(depth, !(pf->opts & PF_OPT_VERBOSE2));
 		print_rule(r, r->anchor ? r->anchor->name : "",
 		    pf->opts & PF_OPT_VERBOSE2,
 		    pf->opts & PF_OPT_NUMERIC);
 	}
 	path[len] = '\0';
 	pfctl_clear_pool(&r->rpool);
 	return (0);
 }
 
 int
 pfctl_add_altq(struct pfctl *pf, struct pf_altq *a)
 {
 	if (altqsupport &&
 	    (loadopt & PFCTL_FLAG_ALTQ) != 0) {
 		memcpy(&pf->paltq->altq, a, sizeof(struct pf_altq));
 		if ((pf->opts & PF_OPT_NOACTION) == 0) {
 			if (ioctl(pf->dev, DIOCADDALTQ, pf->paltq)) {
 				if (errno == ENXIO)
 					errx(1, "qtype not configured");
 				else if (errno == ENODEV)
 					errx(1, "%s: driver does not support "
 					    "altq", a->ifname);
 				else
 					err(1, "DIOCADDALTQ");
 			}
 		}
 		pfaltq_store(&pf->paltq->altq);
 	}
 	return (0);
 }
 
 int
 pfctl_rules(int dev, char *filename, int opts, int optimize,
     char *anchorname, struct pfr_buffer *trans)
 {
#define ERR(x) do { warn(x); goto _error; } while (0)
#define ERRX(x) do { warnx(x); goto _error; } while (0)
 
 	struct pfr_buffer	*t, buf;
 	struct pfioc_altq	 pa;
 	struct pfctl		 pf;
 	struct pf_ruleset	*rs;
 	struct pfr_table	 trs;
 	char			*path;
 	int			 osize;
 
 	RB_INIT(&pf_anchors);
 	memset(&pf_main_anchor, 0, sizeof(pf_main_anchor));
 	pf_init_ruleset(&pf_main_anchor.ruleset);
 	pf_main_anchor.ruleset.anchor = &pf_main_anchor;
 	if (trans == NULL) {
 		bzero(&buf, sizeof(buf));
 		buf.pfrb_type = PFRB_TRANS;
 		t = &buf;
 		osize = 0;
 	} else {
 		t = trans;
 		osize = t->pfrb_size;
 	}
 
 	memset(&pa, 0, sizeof(pa));
 	memset(&pf, 0, sizeof(pf));
 	memset(&trs, 0, sizeof(trs));
 	if ((path = calloc(1, MAXPATHLEN)) == NULL)
		ERR("pfctl_rules: calloc");
 	if (strlcpy(trs.pfrt_anchor, anchorname,
 	    sizeof(trs.pfrt_anchor)) >= sizeof(trs.pfrt_anchor))
 		ERRX("pfctl_rules: strlcpy");
 	pf.dev = dev;
 	pf.opts = opts;
 	pf.optimize = optimize;
 	pf.loadopt = loadopt;
 
 	/* non-brace anchor, create without resolving the path */
 	if ((pf.anchor = calloc(1, sizeof(*pf.anchor))) == NULL)
		ERR("pfctl_rules: calloc");
 	rs = &pf.anchor->ruleset;
 	pf_init_ruleset(rs);
 	rs->anchor = pf.anchor;
	if (strlcpy(pf.anchor->path, anchorname,
	    sizeof(pf.anchor->path)) >= sizeof(pf.anchor->path))
		errx(1, "pfctl_rules: strlcpy");
	if (strlcpy(pf.anchor->name, anchorname,
	    sizeof(pf.anchor->name)) >= sizeof(pf.anchor->name))
		errx(1, "pfctl_rules: strlcpy");

 	pf.astack[0] = pf.anchor;
 	pf.asd = 0;
 	if (anchorname[0])
 		pf.loadopt &= ~PFCTL_FLAG_ALTQ;
 	pf.paltq = &pa;
 	pf.trans = t;
 	pfctl_init_options(&pf);
 
 	if ((opts & PF_OPT_NOACTION) == 0) {
 		/*
 		 * XXX For the time being we need to open transactions for
 		 * the main ruleset before parsing, because tables are still
 		 * loaded at parse time.
 		 */
 		if (pfctl_ruleset_trans(&pf, anchorname, pf.anchor))
 			ERRX("pfctl_rules");
 		if (altqsupport && (pf.loadopt & PFCTL_FLAG_ALTQ))
 			pa.ticket =
 			    pfctl_get_ticket(t, PF_RULESET_ALTQ, anchorname);
 		if (pf.loadopt & PFCTL_FLAG_TABLE)
 			pf.astack[0]->ruleset.tticket =
 			    pfctl_get_ticket(t, PF_RULESET_TABLE, anchorname);
 	}
 
 	if (parse_config(filename, &pf) < 0) {
 		if ((opts & PF_OPT_NOACTION) == 0)
 			ERRX("Syntax error in config file: "
 			    "pf rules not loaded");
 		else
 			goto _error;
 	}
 
 	if ((pf.loadopt & PFCTL_FLAG_FILTER &&
 	    (pfctl_load_ruleset(&pf, path, rs, PF_RULESET_SCRUB, 0))) ||
 	    (pf.loadopt & PFCTL_FLAG_NAT &&
 	    (pfctl_load_ruleset(&pf, path, rs, PF_RULESET_NAT, 0) ||
 	    pfctl_load_ruleset(&pf, path, rs, PF_RULESET_RDR, 0) ||
 	    pfctl_load_ruleset(&pf, path, rs, PF_RULESET_BINAT, 0))) ||
 	    (pf.loadopt & PFCTL_FLAG_FILTER &&
 	    pfctl_load_ruleset(&pf, path, rs, PF_RULESET_FILTER, 0))) {
 		if ((opts & PF_OPT_NOACTION) == 0)
 			ERRX("Unable to load rules into kernel");
 		else
 			goto _error;
 	}
 
	if (altqsupport && (pf.loadopt & PFCTL_FLAG_ALTQ) != 0)
 		if (check_commit_altq(dev, opts) != 0)
 			ERRX("errors in altq config");
 
 	/* process "load anchor" directives */
 	if (!anchorname[0])
 		if (pfctl_load_anchors(dev, &pf, t) == -1)
 			ERRX("load anchors");
 
 	if (trans == NULL && (opts & PF_OPT_NOACTION) == 0) {
 		if (!anchorname[0])
 			if (pfctl_load_options(&pf))
 				goto _error;
 		if (pfctl_trans(dev, t, DIOCXCOMMIT, osize))
 			ERR("DIOCXCOMMIT");
 	}
	free(path);
	return (0);
 
 _error:
 	if (trans == NULL) {	/* main ruleset */
 		if ((opts & PF_OPT_NOACTION) == 0)
 			if (pfctl_trans(dev, t, DIOCXROLLBACK, osize))
 				err(1, "DIOCXROLLBACK");
 		exit(1);
	} else {		/* sub ruleset */
		free(path);
		return (-1);
 	}
 
 #undef ERR
 #undef ERRX
 }
 
 FILE *
 pfctl_fopen(const char *name, const char *mode)
 {
 	struct stat	 st;
 	FILE		*fp;
 
 	fp = fopen(name, mode);
 	if (fp == NULL)
 		return (NULL);
 	if (fstat(fileno(fp), &st)) {
 		fclose(fp);
 		return (NULL);
 	}
 	if (S_ISDIR(st.st_mode)) {
 		fclose(fp);
 		errno = EISDIR;
 		return (NULL);
 	}
 	return (fp);
 }
 
 void
 pfctl_init_options(struct pfctl *pf)
 {
 
 	pf->timeout[PFTM_TCP_FIRST_PACKET] = PFTM_TCP_FIRST_PACKET_VAL;
 	pf->timeout[PFTM_TCP_OPENING] = PFTM_TCP_OPENING_VAL;
 	pf->timeout[PFTM_TCP_ESTABLISHED] = PFTM_TCP_ESTABLISHED_VAL;
 	pf->timeout[PFTM_TCP_CLOSING] = PFTM_TCP_CLOSING_VAL;
 	pf->timeout[PFTM_TCP_FIN_WAIT] = PFTM_TCP_FIN_WAIT_VAL;
 	pf->timeout[PFTM_TCP_CLOSED] = PFTM_TCP_CLOSED_VAL;
 	pf->timeout[PFTM_UDP_FIRST_PACKET] = PFTM_UDP_FIRST_PACKET_VAL;
 	pf->timeout[PFTM_UDP_SINGLE] = PFTM_UDP_SINGLE_VAL;
 	pf->timeout[PFTM_UDP_MULTIPLE] = PFTM_UDP_MULTIPLE_VAL;
 	pf->timeout[PFTM_ICMP_FIRST_PACKET] = PFTM_ICMP_FIRST_PACKET_VAL;
 	pf->timeout[PFTM_ICMP_ERROR_REPLY] = PFTM_ICMP_ERROR_REPLY_VAL;
 	pf->timeout[PFTM_OTHER_FIRST_PACKET] = PFTM_OTHER_FIRST_PACKET_VAL;
 	pf->timeout[PFTM_OTHER_SINGLE] = PFTM_OTHER_SINGLE_VAL;
 	pf->timeout[PFTM_OTHER_MULTIPLE] = PFTM_OTHER_MULTIPLE_VAL;
 	pf->timeout[PFTM_FRAG] = PFTM_FRAG_VAL;
 	pf->timeout[PFTM_INTERVAL] = PFTM_INTERVAL_VAL;
 	pf->timeout[PFTM_SRC_NODE] = PFTM_SRC_NODE_VAL;
 	pf->timeout[PFTM_TS_DIFF] = PFTM_TS_DIFF_VAL;
 	pf->timeout[PFTM_ADAPTIVE_START] = PFSTATE_ADAPT_START;
 	pf->timeout[PFTM_ADAPTIVE_END] = PFSTATE_ADAPT_END;
 
 	pf->limit[PF_LIMIT_STATES] = PFSTATE_HIWAT;
 	pf->limit[PF_LIMIT_FRAGS] = PFFRAG_FRENT_HIWAT;
 	pf->limit[PF_LIMIT_SRC_NODES] = PFSNODE_HIWAT;
 	pf->limit[PF_LIMIT_TABLE_ENTRIES] = PFR_KENTRY_HIWAT;
 
 	pf->debug = PF_DEBUG_URGENT;
 }
 
 int
 pfctl_load_options(struct pfctl *pf)
 {
 	int i, error = 0;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	/* load limits */
 	for (i = 0; i < PF_LIMIT_MAX; i++) {
 		if ((pf->opts & PF_OPT_MERGE) && !pf->limit_set[i])
 			continue;
 		if (pfctl_load_limit(pf, i, pf->limit[i]))
 			error = 1;
 	}
 
 	/*
 	 * If we've set the limit, but haven't explicitly set adaptive
 	 * timeouts, do it now with a start of 60% and end of 120%.
 	 */
 	if (pf->limit_set[PF_LIMIT_STATES] &&
 	    !pf->timeout_set[PFTM_ADAPTIVE_START] &&
 	    !pf->timeout_set[PFTM_ADAPTIVE_END]) {
 		pf->timeout[PFTM_ADAPTIVE_START] =
 			(pf->limit[PF_LIMIT_STATES] / 10) * 6;
 		pf->timeout_set[PFTM_ADAPTIVE_START] = 1;
 		pf->timeout[PFTM_ADAPTIVE_END] =
 			(pf->limit[PF_LIMIT_STATES] / 10) * 12;
 		pf->timeout_set[PFTM_ADAPTIVE_END] = 1;
 	}
 
 	/* load timeouts */
 	for (i = 0; i < PFTM_MAX; i++) {
 		if ((pf->opts & PF_OPT_MERGE) && !pf->timeout_set[i])
 			continue;
 		if (pfctl_load_timeout(pf, i, pf->timeout[i]))
 			error = 1;
 	}
 
 	/* load debug */
 	if (!(pf->opts & PF_OPT_MERGE) || pf->debug_set)
 		if (pfctl_load_debug(pf, pf->debug))
 			error = 1;
 
 	/* load logif */
 	if (!(pf->opts & PF_OPT_MERGE) || pf->ifname_set)
 		if (pfctl_load_logif(pf, pf->ifname))
 			error = 1;
 
 	/* load hostid */
 	if (!(pf->opts & PF_OPT_MERGE) || pf->hostid_set)
 		if (pfctl_load_hostid(pf, pf->hostid))
 			error = 1;
 
 	return (error);
 }
 
 int
 pfctl_set_limit(struct pfctl *pf, const char *opt, unsigned int limit)
 {
 	int i;
 
 
 	for (i = 0; pf_limits[i].name; i++) {
 		if (strcasecmp(opt, pf_limits[i].name) == 0) {
 			pf->limit[pf_limits[i].index] = limit;
 			pf->limit_set[pf_limits[i].index] = 1;
 			break;
 		}
 	}
 	if (pf_limits[i].name == NULL) {
 		warnx("Bad pool name.");
 		return (1);
 	}
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set limit %s %u\n", opt, limit);
 
 	return (0);
 }
 
 int
 pfctl_load_limit(struct pfctl *pf, unsigned int index, unsigned int limit)
 {
 	struct pfioc_limit pl;
 
 	memset(&pl, 0, sizeof(pl));
 	pl.index = index;
 	pl.limit = limit;
 	if (ioctl(pf->dev, DIOCSETLIMIT, &pl)) {
 		if (errno == EBUSY)
 			warnx("Current pool size exceeds requested hard limit");
 		else
 			warnx("DIOCSETLIMIT");
 		return (1);
 	}
 	return (0);
 }
 
 int
 pfctl_set_timeout(struct pfctl *pf, const char *opt, int seconds, int quiet)
 {
 	int i;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	for (i = 0; pf_timeouts[i].name; i++) {
 		if (strcasecmp(opt, pf_timeouts[i].name) == 0) {
 			pf->timeout[pf_timeouts[i].timeout] = seconds;
 			pf->timeout_set[pf_timeouts[i].timeout] = 1;
 			break;
 		}
 	}
 
 	if (pf_timeouts[i].name == NULL) {
 		warnx("Bad timeout name.");
 		return (1);
 	}
 
 
 	if (pf->opts & PF_OPT_VERBOSE && ! quiet)
 		printf("set timeout %s %d\n", opt, seconds);
 
 	return (0);
 }
 
 int
 pfctl_load_timeout(struct pfctl *pf, unsigned int timeout, unsigned int seconds)
 {
 	struct pfioc_tm pt;
 
 	memset(&pt, 0, sizeof(pt));
 	pt.timeout = timeout;
 	pt.seconds = seconds;
 	if (ioctl(pf->dev, DIOCSETTIMEOUT, &pt)) {
 		warnx("DIOCSETTIMEOUT");
 		return (1);
 	}
 	return (0);
 }
 
 int
 pfctl_set_optimization(struct pfctl *pf, const char *opt)
 {
 	const struct pf_hint *hint;
 	int i, r;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	for (i = 0; pf_hints[i].name; i++)
 		if (strcasecmp(opt, pf_hints[i].name) == 0)
 			break;
 
 	hint = pf_hints[i].hint;
 	if (hint == NULL) {
 		warnx("invalid state timeouts optimization");
 		return (1);
 	}
 
 	for (i = 0; hint[i].name; i++)
 		if ((r = pfctl_set_timeout(pf, hint[i].name,
 		    hint[i].timeout, 1)))
 			return (r);
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set optimization %s\n", opt);
 
 	return (0);
 }
 
 int
 pfctl_set_logif(struct pfctl *pf, char *ifname)
 {
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	if (!strcmp(ifname, "none")) {
 		free(pf->ifname);
 		pf->ifname = NULL;
 	} else {
 		pf->ifname = strdup(ifname);
 		if (!pf->ifname)
 			errx(1, "pfctl_set_logif: strdup");
 	}
 	pf->ifname_set = 1;
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set loginterface %s\n", ifname);
 
 	return (0);
 }
 
 int
 pfctl_load_logif(struct pfctl *pf, char *ifname)
 {
 	struct pfioc_if pi;
 
 	memset(&pi, 0, sizeof(pi));
 	if (ifname && strlcpy(pi.ifname, ifname,
 	    sizeof(pi.ifname)) >= sizeof(pi.ifname)) {
 		warnx("pfctl_load_logif: strlcpy");
 		return (1);
 	}
 	if (ioctl(pf->dev, DIOCSETSTATUSIF, &pi)) {
 		warnx("DIOCSETSTATUSIF");
 		return (1);
 	}
 	return (0);
 }
 
 int
 pfctl_set_hostid(struct pfctl *pf, u_int32_t hostid)
 {
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	HTONL(hostid);
 
 	pf->hostid = hostid;
 	pf->hostid_set = 1;
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set hostid 0x%08x\n", ntohl(hostid));
 
 	return (0);
 }
 
 int
 pfctl_load_hostid(struct pfctl *pf, u_int32_t hostid)
 {
 	if (ioctl(dev, DIOCSETHOSTID, &hostid)) {
 		warnx("DIOCSETHOSTID");
 		return (1);
 	}
 	return (0);
 }
 
 int
 pfctl_set_debug(struct pfctl *pf, char *d)
 {
 	u_int32_t	level;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	if (!strcmp(d, "none"))
 		pf->debug = PF_DEBUG_NONE;
 	else if (!strcmp(d, "urgent"))
 		pf->debug = PF_DEBUG_URGENT;
 	else if (!strcmp(d, "misc"))
 		pf->debug = PF_DEBUG_MISC;
 	else if (!strcmp(d, "loud"))
 		pf->debug = PF_DEBUG_NOISY;
 	else {
 		warnx("unknown debug level \"%s\"", d);
 		return (-1);
 	}
 
 	pf->debug_set = 1;
 	level = pf->debug;
 
 	if ((pf->opts & PF_OPT_NOACTION) == 0)
 		if (ioctl(dev, DIOCSETDEBUG, &level))
 			err(1, "DIOCSETDEBUG");
 
 	if (pf->opts & PF_OPT_VERBOSE)
 		printf("set debug %s\n", d);
 
 	return (0);
 }
 
 int
 pfctl_load_debug(struct pfctl *pf, unsigned int level)
 {
 	if (ioctl(pf->dev, DIOCSETDEBUG, &level)) {
 		warnx("DIOCSETDEBUG");
 		return (1);
 	}
 	return (0);
 }
 
 int
 pfctl_set_interface_flags(struct pfctl *pf, char *ifname, int flags, int how)
 {
 	struct pfioc_iface	pi;
 
 	if ((loadopt & PFCTL_FLAG_OPTION) == 0)
 		return (0);
 
 	bzero(&pi, sizeof(pi));
 
 	pi.pfiio_flags = flags;
 
 	if (strlcpy(pi.pfiio_name, ifname, sizeof(pi.pfiio_name)) >=
 	    sizeof(pi.pfiio_name))
 		errx(1, "pfctl_set_interface_flags: strlcpy");
 
 	if ((pf->opts & PF_OPT_NOACTION) == 0) {
 		if (how == 0) {
 			if (ioctl(pf->dev, DIOCCLRIFFLAG, &pi))
 				err(1, "DIOCCLRIFFLAG");
 		} else {
 			if (ioctl(pf->dev, DIOCSETIFFLAG, &pi))
 				err(1, "DIOCSETIFFLAG");
 		}
 	}
 	return (0);
 }
 
 void
 pfctl_debug(int dev, u_int32_t level, int opts)
 {
 	if (ioctl(dev, DIOCSETDEBUG, &level))
 		err(1, "DIOCSETDEBUG");
 	if ((opts & PF_OPT_QUIET) == 0) {
 		fprintf(stderr, "debug level set to '");
 		switch (level) {
 		case PF_DEBUG_NONE:
 			fprintf(stderr, "none");
 			break;
 		case PF_DEBUG_URGENT:
 			fprintf(stderr, "urgent");
 			break;
 		case PF_DEBUG_MISC:
 			fprintf(stderr, "misc");
 			break;
 		case PF_DEBUG_NOISY:
 			fprintf(stderr, "loud");
 			break;
 		default:
 			fprintf(stderr, "<invalid>");
 			break;
 		}
 		fprintf(stderr, "'\n");
 	}
 }
 
 int
 pfctl_test_altqsupport(int dev, int opts)
 {
 	struct pfioc_altq pa;
 
 	if (ioctl(dev, DIOCGETALTQS, &pa)) {
 		if (errno == ENODEV) {
 			if (opts & PF_OPT_VERBOSE)
 				fprintf(stderr, "No ALTQ support in kernel\n"
 				    "ALTQ related functions disabled\n");
 			return (0);
 		} else
 			err(1, "DIOCGETALTQS");
 	}
 	return (1);
 }
 
 int
 pfctl_show_anchors(int dev, int opts, char *anchorname)
 {
 	struct pfioc_ruleset	 pr;
 	u_int32_t		 mnr, nr;
 
 	memset(&pr, 0, sizeof(pr));
 	memcpy(pr.path, anchorname, sizeof(pr.path));
 	if (ioctl(dev, DIOCGETRULESETS, &pr)) {
 		if (errno == EINVAL)
 			fprintf(stderr, "Anchor '%s' not found.\n",
 			    anchorname);
 		else
 			err(1, "DIOCGETRULESETS");
 		return (-1);
 	}
 	mnr = pr.nr;
 	for (nr = 0; nr < mnr; ++nr) {
 		char sub[MAXPATHLEN];
 
 		pr.nr = nr;
 		if (ioctl(dev, DIOCGETRULESET, &pr))
 			err(1, "DIOCGETRULESET");
 		if (!strcmp(pr.name, PF_RESERVED_ANCHOR))
 			continue;
 		sub[0] = 0;
 		if (pr.path[0]) {
 			strlcat(sub, pr.path, sizeof(sub));
 			strlcat(sub, "/", sizeof(sub));
 		}
 		strlcat(sub, pr.name, sizeof(sub));
 		if (sub[0] != '_' || (opts & PF_OPT_VERBOSE))
 			printf("  %s\n", sub);
 		if ((opts & PF_OPT_VERBOSE) && pfctl_show_anchors(dev, opts, sub))
 			return (-1);
 	}
 	return (0);
 }
 
 const char *
 pfctl_lookup_option(char *cmd, const char * const *list)
 {
 	if (cmd != NULL && *cmd)
 		for (; *list; list++)
 			if (!strncmp(cmd, *list, strlen(cmd)))
 				return (*list);
 	return (NULL);
 }
 
 int
 main(int argc, char *argv[])
 {
 	int	 error = 0;
 	int	 ch;
 	int	 mode = O_RDONLY;
 	int	 opts = 0;
 	int	 optimize = PF_OPTIMIZE_BASIC;
 	char	 anchorname[MAXPATHLEN];
 	char	*path;
 
 	if (argc < 2)
 		usage();
 
 	while ((ch = getopt(argc, argv,
 	    "a:AdD:eqf:F:ghi:k:K:mnNOo:Pp:rRs:t:T:vx:z")) != -1) {
 		switch (ch) {
 		case 'a':
 			anchoropt = optarg;
 			break;
 		case 'd':
 			opts |= PF_OPT_DISABLE;
 			mode = O_RDWR;
 			break;
 		case 'D':
 			if (pfctl_cmdline_symset(optarg) < 0)
 				warnx("could not parse macro definition %s",
 				    optarg);
 			break;
 		case 'e':
 			opts |= PF_OPT_ENABLE;
 			mode = O_RDWR;
 			break;
 		case 'q':
 			opts |= PF_OPT_QUIET;
 			break;
 		case 'F':
 			clearopt = pfctl_lookup_option(optarg, clearopt_list);
 			if (clearopt == NULL) {
 				warnx("Unknown flush modifier '%s'", optarg);
 				usage();
 			}
 			mode = O_RDWR;
 			break;
 		case 'i':
 			ifaceopt = optarg;
 			break;
 		case 'k':
 			if (state_killers >= 2) {
 				warnx("can only specify -k twice");
 				usage();
 				/* NOTREACHED */
 			}
 			state_kill[state_killers++] = optarg;
 			mode = O_RDWR;
 			break;
 		case 'K':
 			if (src_node_killers >= 2) {
 				warnx("can only specify -K twice");
 				usage();
 				/* NOTREACHED */
 			}
 			src_node_kill[src_node_killers++] = optarg;
 			mode = O_RDWR;
 			break;
 		case 'm':
 			opts |= PF_OPT_MERGE;
 			break;
 		case 'n':
 			opts |= PF_OPT_NOACTION;
 			break;
 		case 'N':
 			loadopt |= PFCTL_FLAG_NAT;
 			break;
 		case 'r':
 			opts |= PF_OPT_USEDNS;
 			break;
 		case 'f':
 			rulesopt = optarg;
 			mode = O_RDWR;
 			break;
 		case 'g':
 			opts |= PF_OPT_DEBUG;
 			break;
 		case 'A':
 			loadopt |= PFCTL_FLAG_ALTQ;
 			break;
 		case 'R':
 			loadopt |= PFCTL_FLAG_FILTER;
 			break;
 		case 'o':
 			optiopt = pfctl_lookup_option(optarg, optiopt_list);
 			if (optiopt == NULL) {
 				warnx("Unknown optimization '%s'", optarg);
 				usage();
 			}
 			opts |= PF_OPT_OPTIMIZE;
 			break;
 		case 'O':
 			loadopt |= PFCTL_FLAG_OPTION;
 			break;
 		case 'p':
 			pf_device = optarg;
 			break;
 		case 'P':
 			opts |= PF_OPT_NUMERIC;
 			break;
 		case 's':
 			showopt = pfctl_lookup_option(optarg, showopt_list);
 			if (showopt == NULL) {
 				warnx("Unknown show modifier '%s'", optarg);
 				usage();
 			}
 			break;
 		case 't':
 			tableopt = optarg;
 			break;
 		case 'T':
 			tblcmdopt = pfctl_lookup_option(optarg, tblcmdopt_list);
 			if (tblcmdopt == NULL) {
 				warnx("Unknown table command '%s'", optarg);
 				usage();
 			}
 			break;
 		case 'v':
 			if (opts & PF_OPT_VERBOSE)
 				opts |= PF_OPT_VERBOSE2;
 			opts |= PF_OPT_VERBOSE;
 			break;
 		case 'x':
 			debugopt = pfctl_lookup_option(optarg, debugopt_list);
 			if (debugopt == NULL) {
 				warnx("Unknown debug level '%s'", optarg);
 				usage();
 			}
 			mode = O_RDWR;
 			break;
 		case 'z':
 			opts |= PF_OPT_CLRRULECTRS;
 			mode = O_RDWR;
 			break;
 		case 'h':
 			/* FALLTHROUGH */
 		default:
 			usage();
 			/* NOTREACHED */
 		}
 	}
 
 	if (tblcmdopt != NULL) {
 		argc -= optind;
 		argv += optind;
 		ch = *tblcmdopt;
 		if (ch == 'l') {
 			loadopt |= PFCTL_FLAG_TABLE;
 			tblcmdopt = NULL;
 		} else
 			mode = strchr("acdefkrz", ch) ? O_RDWR : O_RDONLY;
 	} else if (argc != optind) {
 		warnx("unknown command line argument: %s ...", argv[optind]);
 		usage();
 		/* NOTREACHED */
 	}
 	if (loadopt == 0)
 		loadopt = ~0;
 
 	if ((path = calloc(1, MAXPATHLEN)) == NULL)
 		errx(1, "pfctl: calloc");
 	memset(anchorname, 0, sizeof(anchorname));
 	if (anchoropt != NULL) {
 		int len = strlen(anchoropt);
 
 		if (len > 0 && anchoropt[len - 1] == '*') {
 			if (len >= 2 && anchoropt[len - 2] == '/')
 				anchoropt[len - 2] = '\0';
 			else
 				anchoropt[len - 1] = '\0';
 			opts |= PF_OPT_RECURSE;
 		}
 		if (strlcpy(anchorname, anchoropt,
 		    sizeof(anchorname)) >= sizeof(anchorname))
 			errx(1, "anchor name '%s' too long",
 			    anchoropt);
 		loadopt &= PFCTL_FLAG_FILTER|PFCTL_FLAG_NAT|PFCTL_FLAG_TABLE;
 	}
 
 	if ((opts & PF_OPT_NOACTION) == 0) {
 		dev = open(pf_device, mode);
 		if (dev == -1)
 			err(1, "%s", pf_device);
 		altqsupport = pfctl_test_altqsupport(dev, opts);
 	} else {
 		dev = open(pf_device, O_RDONLY);
 		if (dev >= 0)
 			opts |= PF_OPT_DUMMYACTION;
 		/* turn off options */
 		opts &= ~ (PF_OPT_DISABLE | PF_OPT_ENABLE);
 		clearopt = showopt = debugopt = NULL;
 #if !defined(ENABLE_ALTQ)
 		altqsupport = 0;
 #else
 		altqsupport = 1;
 #endif
 	}
 
 	if (opts & PF_OPT_DISABLE)
 		if (pfctl_disable(dev, opts))
 			error = 1;
 
 	if (showopt != NULL) {
 		switch (*showopt) {
 		case 'A':
 			pfctl_show_anchors(dev, opts, anchorname);
 			break;
 		case 'r':
 			pfctl_load_fingerprints(dev, opts);
 			pfctl_show_rules(dev, path, opts, PFCTL_SHOW_RULES,
 			    anchorname, 0);
 			break;
 		case 'l':
 			pfctl_load_fingerprints(dev, opts);
 			pfctl_show_rules(dev, path, opts, PFCTL_SHOW_LABELS,
 			    anchorname, 0);
 			break;
 		case 'n':
 			pfctl_load_fingerprints(dev, opts);
 			pfctl_show_nat(dev, opts, anchorname);
 			break;
 		case 'q':
 			pfctl_show_altq(dev, ifaceopt, opts,
 			    opts & PF_OPT_VERBOSE2);
 			break;
 		case 's':
 			pfctl_show_states(dev, ifaceopt, opts);
 			break;
 		case 'S':
 			pfctl_show_src_nodes(dev, opts);
 			break;
 		case 'i':
 			pfctl_show_status(dev, opts);
 			break;
 		case 't':
 			pfctl_show_timeouts(dev, opts);
 			break;
 		case 'm':
 			pfctl_show_limits(dev, opts);
 			break;
 		case 'a':
 			opts |= PF_OPT_SHOWALL;
 			pfctl_load_fingerprints(dev, opts);
 
 			pfctl_show_nat(dev, opts, anchorname);
 			pfctl_show_rules(dev, path, opts, 0, anchorname, 0);
 			pfctl_show_altq(dev, ifaceopt, opts, 0);
 			pfctl_show_states(dev, ifaceopt, opts);
 			pfctl_show_src_nodes(dev, opts);
 			pfctl_show_status(dev, opts);
 			pfctl_show_rules(dev, path, opts, 1, anchorname, 0);
 			pfctl_show_timeouts(dev, opts);
 			pfctl_show_limits(dev, opts);
 			pfctl_show_tables(anchorname, opts);
 			pfctl_show_fingerprints(opts);
 			break;
 		case 'T':
 			pfctl_show_tables(anchorname, opts);
 			break;
 		case 'o':
 			pfctl_load_fingerprints(dev, opts);
 			pfctl_show_fingerprints(opts);
 			break;
 		case 'I':
 			pfctl_show_ifaces(ifaceopt, opts);
 			break;
 		}
 	}
 
 	if ((opts & PF_OPT_CLRRULECTRS) && showopt == NULL)
 		pfctl_show_rules(dev, path, opts, PFCTL_SHOW_NOTHING,
 		    anchorname, 0);
 
 	if (clearopt != NULL) {
 		if (anchorname[0] == '_' || strstr(anchorname, "/_") != NULL)
 			errx(1, "anchor names beginning with '_' cannot "
 			    "be modified from the command line");
 
 		switch (*clearopt) {
 		case 'r':
 			pfctl_clear_rules(dev, opts, anchorname);
 			break;
 		case 'n':
 			pfctl_clear_nat(dev, opts, anchorname);
 			break;
 		case 'q':
 			pfctl_clear_altq(dev, opts);
 			break;
 		case 's':
 			pfctl_clear_states(dev, ifaceopt, opts);
 			break;
 		case 'S':
 			pfctl_clear_src_nodes(dev, opts);
 			break;
 		case 'i':
 			pfctl_clear_stats(dev, opts);
 			break;
 		case 'a':
 			pfctl_clear_rules(dev, opts, anchorname);
 			pfctl_clear_nat(dev, opts, anchorname);
 			pfctl_clear_tables(anchorname, opts);
 			if (!*anchorname) {
 				pfctl_clear_altq(dev, opts);
 				pfctl_clear_states(dev, ifaceopt, opts);
 				pfctl_clear_src_nodes(dev, opts);
 				pfctl_clear_stats(dev, opts);
 				pfctl_clear_fingerprints(dev, opts);
 				pfctl_clear_interface_flags(dev, opts);
 			}
 			break;
 		case 'o':
 			pfctl_clear_fingerprints(dev, opts);
 			break;
 		case 'T':
 			pfctl_clear_tables(anchorname, opts);
 			break;
 		}
 	}
 	if (state_killers) {
 		if (!strcmp(state_kill[0], "label"))
 			pfctl_label_kill_states(dev, ifaceopt, opts);
 		else if (!strcmp(state_kill[0], "id"))
 			pfctl_id_kill_states(dev, ifaceopt, opts);
 		else
 			pfctl_net_kill_states(dev, ifaceopt, opts);
 	}
 
 	if (src_node_killers)
 		pfctl_kill_src_nodes(dev, ifaceopt, opts);
 
 	if (tblcmdopt != NULL) {
 		error = pfctl_command_tables(argc, argv, tableopt,
 		    tblcmdopt, rulesopt, anchorname, opts);
 		rulesopt = NULL;
 	}
 	if (optiopt != NULL) {
 		switch (*optiopt) {
 		case 'n':
 			optimize = 0;
 			break;
 		case 'b':
 			optimize |= PF_OPTIMIZE_BASIC;
 			break;
 		case 'o':
 		case 'p':
 			optimize |= PF_OPTIMIZE_PROFILE;
 			break;
 		}
 	}
 
 	if ((rulesopt != NULL) && (loadopt & PFCTL_FLAG_OPTION) &&
 	    !anchorname[0])
 		if (pfctl_clear_interface_flags(dev, opts | PF_OPT_QUIET))
 			error = 1;
 
 	if (rulesopt != NULL && !(opts & (PF_OPT_MERGE|PF_OPT_NOACTION)) &&
 	    !anchorname[0] && (loadopt & PFCTL_FLAG_OPTION))
 		if (pfctl_file_fingerprints(dev, opts, PF_OSFP_FILE))
 			error = 1;
 
 	if (rulesopt != NULL) {
 		if (anchorname[0] == '_' || strstr(anchorname, "/_") != NULL)
 			errx(1, "anchor names beginning with '_' cannot "
 			    "be modified from the command line");
 		if (pfctl_rules(dev, rulesopt, opts, optimize,
 		    anchorname, NULL))
 			error = 1;
 		else if (!(opts & PF_OPT_NOACTION) &&
 		    (loadopt & PFCTL_FLAG_TABLE))
 			warn_namespace_collision(NULL);
 	}
 
 	if (opts & PF_OPT_ENABLE)
 		if (pfctl_enable(dev, opts))
 			error = 1;
 
 	if (debugopt != NULL) {
 		switch (*debugopt) {
 		case 'n':
 			pfctl_debug(dev, PF_DEBUG_NONE, opts);
 			break;
 		case 'u':
 			pfctl_debug(dev, PF_DEBUG_URGENT, opts);
 			break;
 		case 'm':
 			pfctl_debug(dev, PF_DEBUG_MISC, opts);
 			break;
 		case 'l':
 			pfctl_debug(dev, PF_DEBUG_NOISY, opts);
 			break;
 		}
 	}
 
 	exit(error);
 }
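The abbreviation handling of pfctl_lookup_option() in pfctl.c can be exercised in isolation. What follows is a minimal sketch, not pfctl code.

The function lookup_option() copies the matching loop. The table demo_list is hypothetical, chosen to show two properties: matching is case-sensitive, and the first entry sharing a prefix wins. main() in pfctl.c then switches on the first character of the returned entry, which is why entries such as "Sources" and "states" are told apart only by case.

```c
#include <stddef.h>
#include <string.h>

/*
 * Same matching rule as pfctl_lookup_option(): return the first list
 * entry that starts with cmd.  Any prefix is accepted, and if several
 * entries share it, the earliest one in the list wins.
 */
static const char *
lookup_option(const char *cmd, const char * const *list)
{
	if (cmd != NULL && *cmd != '\0')
		for (; *list != NULL; list++)
			if (strncmp(cmd, *list, strlen(cmd)) == 0)
				return (*list);
	return (NULL);	/* NULL, empty, or unmatched argument */
}

/* Hypothetical option table; pfctl's real lists may differ. */
static const char * const demo_list[] = {
	"queue", "rules", "Anchors", "Sources", "states", NULL
};
```

With this table:

* "ru" yields "rules".
* "s" yields "states", not "Sources".
* "x", "rulesx" or an empty string yields NULL, which main() reports as an unknown modifier.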
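pfctl_load_options() in pfctl.c has a fallback for one case: a state limit is set but neither adaptive timeout is. It then derives adaptive.start and adaptive.end as 60% and 120% of that limit.

The sketch below reproduces only that arithmetic. The struct adaptive_defaults and the function adaptive_from_limit() are hypothetical names, not part of pfctl. The field comments paraphrase the adaptive.start and adaptive.end descriptions in pf.conf(5).

```c
/* Hypothetical container for the two derived adaptive timeouts. */
struct adaptive_defaults {
	unsigned int start;	/* state count at which scaling begins */
	unsigned int end;	/* state count at which timeouts reach zero */
};

static struct adaptive_defaults
adaptive_from_limit(unsigned int state_limit)
{
	struct adaptive_defaults a;

	/*
	 * Same order of operations as pfctl: divide by 10 first, so the
	 * remainder of state_limit / 10 is discarded before scaling.
	 */
	a.start = (state_limit / 10) * 6;
	a.end = (state_limit / 10) * 12;
	return (a);
}
```

Because the division truncates first, limits of 10000 and 10005 both give start = 6000 and end = 12000. A limit of 15 gives 6 and 12.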
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_altq.c	(revision 303775)
@@ -1,1518 +1,1518 @@
 /*	$OpenBSD: pfctl_altq.c,v 1.93 2007/10/15 02:16:35 deraadt Exp $	*/
 
 /*
  * Copyright (c) 2002
  *	Sony Computer Science Laboratories Inc.
  * Copyright (c) 2002, 2003 Henning Brauer <henning@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
  * copyright notice and this permission notice appear in all copies.
  *
  * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
  * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 
 #include <net/if.h>
 #include <netinet/in.h>
 #include <net/pfvar.h>
 
 #include <err.h>
 #include <errno.h>
 #include <limits.h>
 #include <math.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 
 #include <net/altq/altq.h>
 #include <net/altq/altq_cbq.h>
 #include <net/altq/altq_codel.h>
 #include <net/altq/altq_priq.h>
 #include <net/altq/altq_hfsc.h>
 #include <net/altq/altq_fairq.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 #define is_sc_null(sc)	(((sc) == NULL) || ((sc)->m1 == 0 && (sc)->m2 == 0))
 
-TAILQ_HEAD(altqs, pf_altq) altqs = TAILQ_HEAD_INITIALIZER(altqs);
-LIST_HEAD(gen_sc, segment) rtsc, lssc;
+static TAILQ_HEAD(altqs, pf_altq) altqs = TAILQ_HEAD_INITIALIZER(altqs);
+static LIST_HEAD(gen_sc, segment) rtsc, lssc;
 
 struct pf_altq	*qname_to_pfaltq(const char *, const char *);
 u_int32_t	 qname_to_qid(const char *);
 
 static int	eval_pfqueue_cbq(struct pfctl *, struct pf_altq *);
 static int	cbq_compute_idletime(struct pfctl *, struct pf_altq *);
 static int	check_commit_cbq(int, int, struct pf_altq *);
 static int	print_cbq_opts(const struct pf_altq *);
 
 static int	print_codel_opts(const struct pf_altq *,
 		    const struct node_queue_opt *);
 
 static int	eval_pfqueue_priq(struct pfctl *, struct pf_altq *);
 static int	check_commit_priq(int, int, struct pf_altq *);
 static int	print_priq_opts(const struct pf_altq *);
 
 static int	eval_pfqueue_hfsc(struct pfctl *, struct pf_altq *);
 static int	check_commit_hfsc(int, int, struct pf_altq *);
 static int	print_hfsc_opts(const struct pf_altq *,
 		    const struct node_queue_opt *);
 
 static int	eval_pfqueue_fairq(struct pfctl *, struct pf_altq *);
 static int	print_fairq_opts(const struct pf_altq *,
 		    const struct node_queue_opt *);
 static int	check_commit_fairq(int, int, struct pf_altq *);
 
 static void		 gsc_add_sc(struct gen_sc *, struct service_curve *);
 static int		 is_gsc_under_sc(struct gen_sc *,
 			     struct service_curve *);
 static void		 gsc_destroy(struct gen_sc *);
 static struct segment	*gsc_getentry(struct gen_sc *, double);
 static int		 gsc_add_seg(struct gen_sc *, double, double, double,
 			     double);
 static double		 sc_x2y(struct service_curve *, double);
 
 #ifdef __FreeBSD__
 u_int32_t	getifspeed(int, char *);
 #else
 u_int32_t	 getifspeed(char *);
 #endif
 u_long		 getifmtu(char *);
 int		 eval_queue_opts(struct pf_altq *, struct node_queue_opt *,
 		     u_int32_t);
 u_int32_t	 eval_bwspec(struct node_queue_bw *, u_int32_t);
 void		 print_hfsc_sc(const char *, u_int, u_int, u_int,
 		     const struct node_hfsc_sc *);
 void		 print_fairq_sc(const char *, u_int, u_int, u_int,
 		     const struct node_fairq_sc *);
 
 void
 pfaltq_store(struct pf_altq *a)
 {
 	struct pf_altq	*altq;
 
 	if ((altq = malloc(sizeof(*altq))) == NULL)
 		err(1, "malloc");
 	memcpy(altq, a, sizeof(struct pf_altq));
 	TAILQ_INSERT_TAIL(&altqs, altq, entries);
 }
 
 struct pf_altq *
 pfaltq_lookup(const char *ifname)
 {
 	struct pf_altq	*altq;
 
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(ifname, altq->ifname, IFNAMSIZ) == 0 &&
 		    altq->qname[0] == 0)
 			return (altq);
 	}
 	return (NULL);
 }
 
 struct pf_altq *
 qname_to_pfaltq(const char *qname, const char *ifname)
 {
 	struct pf_altq	*altq;
 
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(ifname, altq->ifname, IFNAMSIZ) == 0 &&
 		    strncmp(qname, altq->qname, PF_QNAME_SIZE) == 0)
 			return (altq);
 	}
 	return (NULL);
 }
 
 u_int32_t
 qname_to_qid(const char *qname)
 {
 	struct pf_altq	*altq;
 
 	/*
 	 * We guarantee that same named queues on different interfaces
 	 * have the same qid, so we do NOT need to limit matching on
 	 * one interface!
 	 */
 
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(qname, altq->qname, PF_QNAME_SIZE) == 0)
 			return (altq->qid);
 	}
 	return (0);
 }
 
 void
 print_altq(const struct pf_altq *a, unsigned int level,
     struct node_queue_bw *bw, struct node_queue_opt *qopts)
 {
 	if (a->qname[0] != 0) {
 		print_queue(a, level, bw, 1, qopts);
 		return;
 	}
 
 #ifdef __FreeBSD__
 	if (a->local_flags & PFALTQ_FLAG_IF_REMOVED)
 		printf("INACTIVE ");
 #endif
 
 	printf("altq on %s ", a->ifname);
 
 	switch (a->scheduler) {
 	case ALTQT_CBQ:
 		if (!print_cbq_opts(a))
 			printf("cbq ");
 		break;
 	case ALTQT_PRIQ:
 		if (!print_priq_opts(a))
 			printf("priq ");
 		break;
 	case ALTQT_HFSC:
 		if (!print_hfsc_opts(a, qopts))
 			printf("hfsc ");
 		break;
 	case ALTQT_FAIRQ:
 		if (!print_fairq_opts(a, qopts))
 			printf("fairq ");
 		break;
 	case ALTQT_CODEL:
 		if (!print_codel_opts(a, qopts))
 			printf("codel ");
 		break;
 	}
 
 	if (bw != NULL && bw->bw_percent > 0) {
 		if (bw->bw_percent < 100)
 			printf("bandwidth %u%% ", bw->bw_percent);
 	} else
 		printf("bandwidth %s ", rate2str((double)a->ifbandwidth));
 
 	if (a->qlimit != DEFAULT_QLIMIT)
 		printf("qlimit %u ", a->qlimit);
 	printf("tbrsize %u ", a->tbrsize);
 }
 
 void
 print_queue(const struct pf_altq *a, unsigned int level,
     struct node_queue_bw *bw, int print_interface,
     struct node_queue_opt *qopts)
 {
 	unsigned int	i;
 
 #ifdef __FreeBSD__
 	if (a->local_flags & PFALTQ_FLAG_IF_REMOVED)
 		printf("INACTIVE ");
 #endif
 	printf("queue ");
 	for (i = 0; i < level; ++i)
 		printf(" ");
 	printf("%s ", a->qname);
 	if (print_interface)
 		printf("on %s ", a->ifname);
 	if (a->scheduler == ALTQT_CBQ || a->scheduler == ALTQT_HFSC ||
 		a->scheduler == ALTQT_FAIRQ) {
 		if (bw != NULL && bw->bw_percent > 0) {
 			if (bw->bw_percent < 100)
 				printf("bandwidth %u%% ", bw->bw_percent);
 		} else
 			printf("bandwidth %s ", rate2str((double)a->bandwidth));
 	}
 	if (a->priority != DEFAULT_PRIORITY)
 		printf("priority %u ", a->priority);
 	if (a->qlimit != DEFAULT_QLIMIT)
 		printf("qlimit %u ", a->qlimit);
 	switch (a->scheduler) {
 	case ALTQT_CBQ:
 		print_cbq_opts(a);
 		break;
 	case ALTQT_PRIQ:
 		print_priq_opts(a);
 		break;
 	case ALTQT_HFSC:
 		print_hfsc_opts(a, qopts);
 		break;
 	case ALTQT_FAIRQ:
 		print_fairq_opts(a, qopts);
 		break;
 	}
 }
 
 /*
  * eval_pfaltq computes the discipline parameters.
  */
 int
 eval_pfaltq(struct pfctl *pf, struct pf_altq *pa, struct node_queue_bw *bw,
     struct node_queue_opt *opts)
 {
 	u_int	rate, size, errors = 0;
 
 	if (bw->bw_absolute > 0)
 		pa->ifbandwidth = bw->bw_absolute;
 	else
 #ifdef __FreeBSD__
 		if ((rate = getifspeed(pf->dev, pa->ifname)) == 0) {
 #else
 		if ((rate = getifspeed(pa->ifname)) == 0) {
 #endif
 			fprintf(stderr, "interface %s does not know its bandwidth, "
 			    "please specify an absolute bandwidth\n",
 			    pa->ifname);
 			errors++;
 		} else if ((pa->ifbandwidth = eval_bwspec(bw, rate)) == 0)
 			pa->ifbandwidth = rate;
 
 	errors += eval_queue_opts(pa, opts, pa->ifbandwidth);
 
 	/* if tbrsize is not specified, use heuristics */
 	if (pa->tbrsize == 0) {
 		rate = pa->ifbandwidth;
 		if (rate <= 1 * 1000 * 1000)
 			size = 1;
 		else if (rate <= 10 * 1000 * 1000)
 			size = 4;
 		else if (rate <= 200 * 1000 * 1000)
 			size = 8;
 		else
 			size = 24;
 		size = size * getifmtu(pa->ifname);
 		if (size > 0xffff)
 			size = 0xffff;
 		pa->tbrsize = size;
 	}
 	return (errors);
 }
 
 /*
  * check_commit_altq does consistency check for each interface
  */
 int
 check_commit_altq(int dev, int opts)
 {
 	struct pf_altq	*altq;
 	int		 error = 0;
 
 	/* call the discipline check for each interface. */
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (altq->qname[0] == 0) {
 			switch (altq->scheduler) {
 			case ALTQT_CBQ:
 				error = check_commit_cbq(dev, opts, altq);
 				break;
 			case ALTQT_PRIQ:
 				error = check_commit_priq(dev, opts, altq);
 				break;
 			case ALTQT_HFSC:
 				error = check_commit_hfsc(dev, opts, altq);
 				break;
 			case ALTQT_FAIRQ:
 				error = check_commit_fairq(dev, opts, altq);
 				break;
 			default:
 				break;
 			}
 		}
 	}
 	return (error);
 }
 
 /*
  * eval_pfqueue computes the queue parameters.
  */
 int
 eval_pfqueue(struct pfctl *pf, struct pf_altq *pa, struct node_queue_bw *bw,
     struct node_queue_opt *opts)
 {
 	/* should be merged with expand_queue */
 	struct pf_altq	*if_pa, *parent, *altq;
 	u_int32_t	 bwsum;
 	int		 error = 0;
 
 	/* find the corresponding interface and copy fields used by queues */
 	if ((if_pa = pfaltq_lookup(pa->ifname)) == NULL) {
 		fprintf(stderr, "altq not defined on %s\n", pa->ifname);
 		return (1);
 	}
 	pa->scheduler = if_pa->scheduler;
 	pa->ifbandwidth = if_pa->ifbandwidth;
 
 	if (qname_to_pfaltq(pa->qname, pa->ifname) != NULL) {
 		fprintf(stderr, "queue %s already exists on interface %s\n",
 		    pa->qname, pa->ifname);
 		return (1);
 	}
 	pa->qid = qname_to_qid(pa->qname);
 
 	parent = NULL;
 	if (pa->parent[0] != 0) {
 		parent = qname_to_pfaltq(pa->parent, pa->ifname);
 		if (parent == NULL) {
 			fprintf(stderr, "parent %s not found for %s\n",
 			    pa->parent, pa->qname);
 			return (1);
 		}
 		pa->parent_qid = parent->qid;
 	}
 	if (pa->qlimit == 0)
 		pa->qlimit = DEFAULT_QLIMIT;
 
 	if (pa->scheduler == ALTQT_CBQ || pa->scheduler == ALTQT_HFSC ||
 		pa->scheduler == ALTQT_FAIRQ) {
 		pa->bandwidth = eval_bwspec(bw,
 		    parent == NULL ? 0 : parent->bandwidth);
 
 		if (pa->bandwidth > pa->ifbandwidth) {
 			fprintf(stderr, "bandwidth for %s higher than "
 			    "interface\n", pa->qname);
 			return (1);
 		}
 		/* check the sum of the child bandwidth is under parent's */
 		if (parent != NULL) {
 			if (pa->bandwidth > parent->bandwidth) {
 				warnx("bandwidth for %s higher than parent",
 				    pa->qname);
 				return (1);
 			}
 			bwsum = 0;
 			TAILQ_FOREACH(altq, &altqs, entries) {
 				if (strncmp(altq->ifname, pa->ifname,
 				    IFNAMSIZ) == 0 &&
 				    altq->qname[0] != 0 &&
 				    strncmp(altq->parent, pa->parent,
 				    PF_QNAME_SIZE) == 0)
 					bwsum += altq->bandwidth;
 			}
 			bwsum += pa->bandwidth;
 			if (bwsum > parent->bandwidth) {
 				warnx("the sum of the child bandwidth higher"
 				    " than parent \"%s\"", parent->qname);
 			}
 		}
 	}
 
 	if (eval_queue_opts(pa, opts, parent == NULL? 0 : parent->bandwidth))
 		return (1);
 
 	switch (pa->scheduler) {
 	case ALTQT_CBQ:
 		error = eval_pfqueue_cbq(pf, pa);
 		break;
 	case ALTQT_PRIQ:
 		error = eval_pfqueue_priq(pf, pa);
 		break;
 	case ALTQT_HFSC:
 		error = eval_pfqueue_hfsc(pf, pa);
 		break;
 	case ALTQT_FAIRQ:
 		error = eval_pfqueue_fairq(pf, pa);
 		break;
 	default:
 		break;
 	}
 	return (error);
 }
 
 /*
  * CBQ support functions
  */
 #define	RM_FILTER_GAIN	5	/* log2 of gain, e.g., 5 => 31/32 */
 #define	RM_NS_PER_SEC	(1000000000)
 
 static int
 eval_pfqueue_cbq(struct pfctl *pf, struct pf_altq *pa)
 {
 	struct cbq_opts	*opts;
 	u_int		 ifmtu;
 
 	if (pa->priority >= CBQ_MAXPRI) {
 		warnx("priority out of range: max %d", CBQ_MAXPRI - 1);
 		return (-1);
 	}
 
 	ifmtu = getifmtu(pa->ifname);
 	opts = &pa->pq_u.cbq_opts;
 
 	if (opts->pktsize == 0) {	/* use default */
 		opts->pktsize = ifmtu;
 		if (opts->pktsize > MCLBYTES)	/* do what TCP does */
 			opts->pktsize &= ~MCLBYTES;
 	} else if (opts->pktsize > ifmtu)
 		opts->pktsize = ifmtu;
 	if (opts->maxpktsize == 0)	/* use default */
 		opts->maxpktsize = ifmtu;
 	else if (opts->maxpktsize > ifmtu)
 		opts->maxpktsize = ifmtu;
 
 	if (opts->pktsize > opts->maxpktsize)
 		opts->pktsize = opts->maxpktsize;
 
 	if (pa->parent[0] == 0)
 		opts->flags |= (CBQCLF_ROOTCLASS | CBQCLF_WRR);
 
 	cbq_compute_idletime(pf, pa);
 	return (0);
 }
 
 /*
  * compute ns_per_byte, maxidle, minidle, and offtime
  */
 static int
 cbq_compute_idletime(struct pfctl *pf, struct pf_altq *pa)
 {
 	struct cbq_opts	*opts;
 	double		 maxidle_s, maxidle, minidle;
 	double		 offtime, nsPerByte, ifnsPerByte, ptime, cptime;
 	double		 z, g, f, gton, gtom;
 	u_int		 minburst, maxburst;
 
 	opts = &pa->pq_u.cbq_opts;
 	ifnsPerByte = (1.0 / (double)pa->ifbandwidth) * RM_NS_PER_SEC * 8;
 	minburst = opts->minburst;
 	maxburst = opts->maxburst;
 
 	if (pa->bandwidth == 0)
 		f = 0.0001;	/* small enough? */
 	else
 		f = ((double) pa->bandwidth / (double) pa->ifbandwidth);
 
 	nsPerByte = ifnsPerByte / f;
 	ptime = (double)opts->pktsize * ifnsPerByte;
 	cptime = ptime * (1.0 - f) / f;
 
 	if (nsPerByte * (double)opts->maxpktsize > (double)INT_MAX) {
 		/*
 		 * this causes integer overflow in kernel!
 		 * (bandwidth < 6Kbps when max_pkt_size=1500)
 		 */
 		if (pa->bandwidth != 0 && (pf->opts & PF_OPT_QUIET) == 0) {
 			warnx("queue bandwidth must be larger than %s",
 			    rate2str(ifnsPerByte * (double)opts->maxpktsize /
 			    (double)INT_MAX * (double)pa->ifbandwidth));
 			fprintf(stderr, "cbq: queue %s is too slow!\n",
 			    pa->qname);
 		}
 		nsPerByte = (double)(INT_MAX / opts->maxpktsize);
 	}
 
 	if (maxburst == 0) {  /* use default */
 		if (cptime > 10.0 * 1000000)
 			maxburst = 4;
 		else
 			maxburst = 16;
 	}
 	if (minburst == 0)  /* use default */
 		minburst = 2;
 	if (minburst > maxburst)
 		minburst = maxburst;
 
 	z = (double)(1 << RM_FILTER_GAIN);
 	g = (1.0 - 1.0 / z);
 	gton = pow(g, (double)maxburst);
 	gtom = pow(g, (double)(minburst-1));
 	maxidle = ((1.0 / f - 1.0) * ((1.0 - gton) / gton));
 	maxidle_s = (1.0 - g);
 	if (maxidle > maxidle_s)
 		maxidle = ptime * maxidle;
 	else
 		maxidle = ptime * maxidle_s;
 	offtime = cptime * (1.0 + 1.0/(1.0 - g) * (1.0 - gtom) / gtom);
 	minidle = -((double)opts->maxpktsize * (double)nsPerByte);
 
 	/* scale parameters */
 	maxidle = ((maxidle * 8.0) / nsPerByte) *
 	    pow(2.0, (double)RM_FILTER_GAIN);
 	offtime = (offtime * 8.0) / nsPerByte *
 	    pow(2.0, (double)RM_FILTER_GAIN);
 	minidle = ((minidle * 8.0) / nsPerByte) *
 	    pow(2.0, (double)RM_FILTER_GAIN);
 
 	maxidle = maxidle / 1000.0;
 	offtime = offtime / 1000.0;
 	minidle = minidle / 1000.0;
 
 	opts->minburst = minburst;
 	opts->maxburst = maxburst;
 	opts->ns_per_byte = (u_int)nsPerByte;
 	opts->maxidle = (u_int)fabs(maxidle);
 	opts->minidle = (int)minidle;
 	opts->offtime = (u_int)fabs(offtime);
 
 	return (0);
 }
 
 static int
 check_commit_cbq(int dev, int opts, struct pf_altq *pa)
 {
 	struct pf_altq	*altq;
 	int		 root_class, default_class;
 	int		 error = 0;
 
 	/*
 	 * check if cbq has one root queue and one default queue
 	 * for this interface
 	 */
 	root_class = default_class = 0;
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (altq->pq_u.cbq_opts.flags & CBQCLF_ROOTCLASS)
 			root_class++;
 		if (altq->pq_u.cbq_opts.flags & CBQCLF_DEFCLASS)
 			default_class++;
 	}
 	if (root_class != 1) {
 		warnx("should have one root queue on %s", pa->ifname);
 		error++;
 	}
 	if (default_class != 1) {
 		warnx("should have one default queue on %s", pa->ifname);
 		error++;
 	}
 	return (error);
 }
 
 static int
 print_cbq_opts(const struct pf_altq *a)
 {
 	const struct cbq_opts	*opts;
 
 	opts = &a->pq_u.cbq_opts;
 	if (opts->flags) {
 		printf("cbq(");
 		if (opts->flags & CBQCLF_RED)
 			printf(" red");
 		if (opts->flags & CBQCLF_ECN)
 			printf(" ecn");
 		if (opts->flags & CBQCLF_RIO)
 			printf(" rio");
 		if (opts->flags & CBQCLF_CODEL)
 			printf(" codel");
 		if (opts->flags & CBQCLF_CLEARDSCP)
 			printf(" cleardscp");
 		if (opts->flags & CBQCLF_FLOWVALVE)
 			printf(" flowvalve");
 		if (opts->flags & CBQCLF_BORROW)
 			printf(" borrow");
 		if (opts->flags & CBQCLF_WRR)
 			printf(" wrr");
 		if (opts->flags & CBQCLF_EFFICIENT)
 			printf(" efficient");
 		if (opts->flags & CBQCLF_ROOTCLASS)
 			printf(" root");
 		if (opts->flags & CBQCLF_DEFCLASS)
 			printf(" default");
 		printf(" ) ");
 
 		return (1);
 	} else
 		return (0);
 }
 
 /*
  * PRIQ support functions
  */
 static int
 eval_pfqueue_priq(struct pfctl *pf, struct pf_altq *pa)
 {
 	struct pf_altq	*altq;
 
 	if (pa->priority >= PRIQ_MAXPRI) {
 		warnx("priority out of range: max %d", PRIQ_MAXPRI - 1);
 		return (-1);
 	}
 	/* the priority should be unique for the interface */
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) == 0 &&
 		    altq->qname[0] != 0 && altq->priority == pa->priority) {
 			warnx("%s and %s have the same priority",
 			    altq->qname, pa->qname);
 			return (-1);
 		}
 	}
 
 	return (0);
 }
 
 static int
 check_commit_priq(int dev, int opts, struct pf_altq *pa)
 {
 	struct pf_altq	*altq;
 	int		 default_class;
 	int		 error = 0;
 
 	/*
 	 * check if priq has one default class for this interface
 	 */
 	default_class = 0;
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (altq->pq_u.priq_opts.flags & PRCF_DEFAULTCLASS)
 			default_class++;
 	}
 	if (default_class != 1) {
 		warnx("should have one default queue on %s", pa->ifname);
 		error++;
 	}
 	return (error);
 }
 
 static int
 print_priq_opts(const struct pf_altq *a)
 {
 	const struct priq_opts	*opts;
 
 	opts = &a->pq_u.priq_opts;
 
 	if (opts->flags) {
 		printf("priq(");
 		if (opts->flags & PRCF_RED)
 			printf(" red");
 		if (opts->flags & PRCF_ECN)
 			printf(" ecn");
 		if (opts->flags & PRCF_RIO)
 			printf(" rio");
 		if (opts->flags & PRCF_CODEL)
 			printf(" codel");
 		if (opts->flags & PRCF_CLEARDSCP)
 			printf(" cleardscp");
 		if (opts->flags & PRCF_DEFAULTCLASS)
 			printf(" default");
 		printf(" ) ");
 
 		return (1);
 	} else
 		return (0);
 }
 
 /*
  * HFSC support functions
  */
 static int
 eval_pfqueue_hfsc(struct pfctl *pf, struct pf_altq *pa)
 {
 	struct pf_altq		*altq, *parent;
 	struct hfsc_opts	*opts;
 	struct service_curve	 sc;
 
 	opts = &pa->pq_u.hfsc_opts;
 
 	if (pa->parent[0] == 0) {
 		/* root queue */
 		opts->lssc_m1 = pa->ifbandwidth;
 		opts->lssc_m2 = pa->ifbandwidth;
 		opts->lssc_d = 0;
 		return (0);
 	}
 
 	LIST_INIT(&rtsc);
 	LIST_INIT(&lssc);
 
 	/* if link_share is not specified, use bandwidth */
 	if (opts->lssc_m2 == 0)
 		opts->lssc_m2 = pa->bandwidth;
 
 	if ((opts->rtsc_m1 > 0 && opts->rtsc_m2 == 0) ||
 	    (opts->lssc_m1 > 0 && opts->lssc_m2 == 0) ||
 	    (opts->ulsc_m1 > 0 && opts->ulsc_m2 == 0)) {
 		warnx("m2 is zero for %s", pa->qname);
 		return (-1);
 	}
 
 	if ((opts->rtsc_m1 < opts->rtsc_m2 && opts->rtsc_m1 != 0) ||
 	    (opts->lssc_m1 < opts->lssc_m2 && opts->lssc_m1 != 0) ||
 	    (opts->ulsc_m1 < opts->ulsc_m2 && opts->ulsc_m1 != 0)) {
 		warnx("m1 must be zero for convex curve: %s", pa->qname);
 		return (-1);
 	}
 
 	/*
 	 * admission control:
 	 * for the real-time service curve, the sum of the service curves
 	 * must not exceed 80% of the interface bandwidth; the remaining 20%
 	 * is held back so that the interface bandwidth is not over-committed.
 	 * for the linkshare service curve, the sum of the child service
 	 * curves must not exceed the parent's service curve.
 	 * for the upper-limit service curve, the assigned bandwidth must not
 	 * exceed the interface bandwidth, and the upper limit must be at
 	 * least the real-time service curve when both are defined.
 	 */
 	parent = qname_to_pfaltq(pa->parent, pa->ifname);
 	if (parent == NULL)
 		errx(1, "parent %s not found for %s", pa->parent, pa->qname);
 
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 
 		/* if the class has a real-time service curve, add it. */
 		if (opts->rtsc_m2 != 0 && altq->pq_u.hfsc_opts.rtsc_m2 != 0) {
 			sc.m1 = altq->pq_u.hfsc_opts.rtsc_m1;
 			sc.d = altq->pq_u.hfsc_opts.rtsc_d;
 			sc.m2 = altq->pq_u.hfsc_opts.rtsc_m2;
 			gsc_add_sc(&rtsc, &sc);
 		}
 
 		if (strncmp(altq->parent, pa->parent, PF_QNAME_SIZE) != 0)
 			continue;
 
 		/* if the class has a linkshare service curve, add it. */
 		if (opts->lssc_m2 != 0 && altq->pq_u.hfsc_opts.lssc_m2 != 0) {
 			sc.m1 = altq->pq_u.hfsc_opts.lssc_m1;
 			sc.d = altq->pq_u.hfsc_opts.lssc_d;
 			sc.m2 = altq->pq_u.hfsc_opts.lssc_m2;
 			gsc_add_sc(&lssc, &sc);
 		}
 	}
 
 	/* check the real-time service curve.  reserve 20% of interface bw */
 	if (opts->rtsc_m2 != 0) {
 		/* add this queue to the sum */
 		sc.m1 = opts->rtsc_m1;
 		sc.d = opts->rtsc_d;
 		sc.m2 = opts->rtsc_m2;
 		gsc_add_sc(&rtsc, &sc);
 		/* compare the sum with 80% of the interface */
 		sc.m1 = 0;
 		sc.d = 0;
 		sc.m2 = pa->ifbandwidth / 100 * 80;
 		if (!is_gsc_under_sc(&rtsc, &sc)) {
 			warnx("real-time sc exceeds 80%% of the interface "
 			    "bandwidth (%s)", rate2str((double)sc.m2));
 			goto err_ret;
 		}
 	}
 
 	/* check the linkshare service curve. */
 	if (opts->lssc_m2 != 0) {
 		/* add this queue to the child sum */
 		sc.m1 = opts->lssc_m1;
 		sc.d = opts->lssc_d;
 		sc.m2 = opts->lssc_m2;
 		gsc_add_sc(&lssc, &sc);
 		/* compare the sum of the children with parent's sc */
 		sc.m1 = parent->pq_u.hfsc_opts.lssc_m1;
 		sc.d = parent->pq_u.hfsc_opts.lssc_d;
 		sc.m2 = parent->pq_u.hfsc_opts.lssc_m2;
 		if (!is_gsc_under_sc(&lssc, &sc)) {
 			warnx("linkshare sc exceeds parent's sc");
 			goto err_ret;
 		}
 	}
 
 	/* check the upper-limit service curve. */
 	if (opts->ulsc_m2 != 0) {
 		if (opts->ulsc_m1 > pa->ifbandwidth ||
 		    opts->ulsc_m2 > pa->ifbandwidth) {
 			warnx("upper-limit larger than interface bandwidth");
 			goto err_ret;
 		}
 		if (opts->rtsc_m2 != 0 && opts->rtsc_m2 > opts->ulsc_m2) {
 			warnx("upper-limit sc smaller than real-time sc");
 			goto err_ret;
 		}
 	}
 
 	gsc_destroy(&rtsc);
 	gsc_destroy(&lssc);
 
 	return (0);
 
 err_ret:
 	gsc_destroy(&rtsc);
 	gsc_destroy(&lssc);
 	return (-1);
 }
 
 /*
  * FAIRQ support functions
  */
 static int
 eval_pfqueue_fairq(struct pfctl *pf __unused, struct pf_altq *pa)
 {
 	struct pf_altq		*altq, *parent;
 	struct fairq_opts	*opts;
 	struct service_curve	 sc;
 
 	opts = &pa->pq_u.fairq_opts;
 
 	if (pa->parent[0] == 0) {
 		/* root queue */
 		opts->lssc_m1 = pa->ifbandwidth;
 		opts->lssc_m2 = pa->ifbandwidth;
 		opts->lssc_d = 0;
 		return (0);
 	}
 
 	LIST_INIT(&lssc);
 
 	/* if link_share is not specified, use bandwidth */
 	if (opts->lssc_m2 == 0)
 		opts->lssc_m2 = pa->bandwidth;
 
 	/*
 	 * admission control:
 	 * for the link-sharing service curve, the sum of the child service
 	 * curves must not exceed the parent's service curve.  Only the
 	 * link-sharing curve is checked for FAIRQ queues.
 	 */
 	parent = qname_to_pfaltq(pa->parent, pa->ifname);
 	if (parent == NULL)
 		errx(1, "parent %s not found for %s", pa->parent, pa->qname);
 
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 
 		if (strncmp(altq->parent, pa->parent, PF_QNAME_SIZE) != 0)
 			continue;
 
 		/* if the class has a link-sharing service curve, add it. */
 		if (opts->lssc_m2 != 0 && altq->pq_u.fairq_opts.lssc_m2 != 0) {
 			sc.m1 = altq->pq_u.fairq_opts.lssc_m1;
 			sc.d = altq->pq_u.fairq_opts.lssc_d;
 			sc.m2 = altq->pq_u.fairq_opts.lssc_m2;
 			gsc_add_sc(&lssc, &sc);
 		}
 	}
 
 	/* check the link-sharing service curve. */
 	if (opts->lssc_m2 != 0) {
 		sc.m1 = parent->pq_u.fairq_opts.lssc_m1;
 		sc.d = parent->pq_u.fairq_opts.lssc_d;
 		sc.m2 = parent->pq_u.fairq_opts.lssc_m2;
 		if (!is_gsc_under_sc(&lssc, &sc)) {
 			warnx("link-sharing sc exceeds parent's sc");
 			goto err_ret;
 		}
 	}
 
 	gsc_destroy(&lssc);
 
 	return (0);
 
 err_ret:
 	gsc_destroy(&lssc);
 	return (-1);
 }
 
 static int
 check_commit_hfsc(int dev, int opts, struct pf_altq *pa)
 {
 	struct pf_altq	*altq, *def = NULL;
 	int		 default_class;
 	int		 error = 0;
 
 	/* check if hfsc has one default queue for this interface */
 	default_class = 0;
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (altq->parent[0] == 0)  /* dummy root */
 			continue;
 		if (altq->pq_u.hfsc_opts.flags & HFCF_DEFAULTCLASS) {
 			default_class++;
 			def = altq;
 		}
 	}
 	if (default_class != 1) {
 		warnx("should have one default queue on %s", pa->ifname);
 		return (1);
 	}
 	/* make sure the default queue is a leaf */
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (strncmp(altq->parent, def->qname, PF_QNAME_SIZE) == 0) {
 			warnx("default queue is not a leaf");
 			error++;
 		}
 	}
 	return (error);
 }
 
 static int
 check_commit_fairq(int dev __unused, int opts __unused, struct pf_altq *pa)
 {
 	struct pf_altq	*altq, *def = NULL;
 	int		 default_class;
 	int		 error = 0;
 
 	/* check if fairq has one default queue for this interface */
 	default_class = 0;
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (altq->pq_u.fairq_opts.flags & FARF_DEFAULTCLASS) {
 			default_class++;
 			def = altq;
 		}
 	}
 	if (default_class != 1) {
 		warnx("should have one default queue on %s", pa->ifname);
 		return (1);
 	}
 	/* make sure the default queue is a leaf */
 	TAILQ_FOREACH(altq, &altqs, entries) {
 		if (strncmp(altq->ifname, pa->ifname, IFNAMSIZ) != 0)
 			continue;
 		if (altq->qname[0] == 0)  /* this is for interface */
 			continue;
 		if (strncmp(altq->parent, def->qname, PF_QNAME_SIZE) == 0) {
 			warnx("default queue is not a leaf");
 			error++;
 		}
 	}
 	return (error);
 }
 
 static int
 print_hfsc_opts(const struct pf_altq *a, const struct node_queue_opt *qopts)
 {
 	const struct hfsc_opts		*opts;
 	const struct node_hfsc_sc	*rtsc, *lssc, *ulsc;
 
 	opts = &a->pq_u.hfsc_opts;
 	if (qopts == NULL)
 		rtsc = lssc = ulsc = NULL;
 	else {
 		rtsc = &qopts->data.hfsc_opts.realtime;
 		lssc = &qopts->data.hfsc_opts.linkshare;
 		ulsc = &qopts->data.hfsc_opts.upperlimit;
 	}
 
 	if (opts->flags || opts->rtsc_m2 != 0 || opts->ulsc_m2 != 0 ||
 	    (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth ||
 	    opts->lssc_d != 0))) {
 		printf("hfsc(");
 		if (opts->flags & HFCF_RED)
 			printf(" red");
 		if (opts->flags & HFCF_ECN)
 			printf(" ecn");
 		if (opts->flags & HFCF_RIO)
 			printf(" rio");
 		if (opts->flags & HFCF_CODEL)
 			printf(" codel");
 		if (opts->flags & HFCF_CLEARDSCP)
 			printf(" cleardscp");
 		if (opts->flags & HFCF_DEFAULTCLASS)
 			printf(" default");
 		if (opts->rtsc_m2 != 0)
 			print_hfsc_sc("realtime", opts->rtsc_m1, opts->rtsc_d,
 			    opts->rtsc_m2, rtsc);
 		if (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth ||
 		    opts->lssc_d != 0))
 			print_hfsc_sc("linkshare", opts->lssc_m1, opts->lssc_d,
 			    opts->lssc_m2, lssc);
 		if (opts->ulsc_m2 != 0)
 			print_hfsc_sc("upperlimit", opts->ulsc_m1, opts->ulsc_d,
 			    opts->ulsc_m2, ulsc);
 		printf(" ) ");
 
 		return (1);
 	} else
 		return (0);
 }
 
 static int
 print_codel_opts(const struct pf_altq *a, const struct node_queue_opt *qopts)
 {
 	const struct codel_opts *opts;
 
 	opts = &a->pq_u.codel_opts;
 	if (opts->target || opts->interval || opts->ecn) {
 		printf("codel(");
 		if (opts->target)
 			printf(" target %d", opts->target);
 		if (opts->interval)
 			printf(" interval %d", opts->interval);
 		if (opts->ecn)
 			printf(" ecn");
 		printf(" ) ");
 
 		return (1);
 	}
 
 	return (0);
 }
 
 static int
 print_fairq_opts(const struct pf_altq *a, const struct node_queue_opt *qopts)
 {
 	const struct fairq_opts		*opts;
 	const struct node_fairq_sc	*loc_lssc;
 
 	opts = &a->pq_u.fairq_opts;
 	if (qopts == NULL)
 		loc_lssc = NULL;
 	else
 		loc_lssc = &qopts->data.fairq_opts.linkshare;
 
 	if (opts->flags ||
 	    (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth ||
 	    opts->lssc_d != 0))) {
 		printf("fairq(");
 		if (opts->flags & FARF_RED)
 			printf(" red");
 		if (opts->flags & FARF_ECN)
 			printf(" ecn");
 		if (opts->flags & FARF_RIO)
 			printf(" rio");
 		if (opts->flags & FARF_CODEL)
 			printf(" codel");
 		if (opts->flags & FARF_CLEARDSCP)
 			printf(" cleardscp");
 		if (opts->flags & FARF_DEFAULTCLASS)
 			printf(" default");
 		if (opts->lssc_m2 != 0 && (opts->lssc_m2 != a->bandwidth ||
 		    opts->lssc_d != 0))
 			print_fairq_sc("linkshare", opts->lssc_m1, opts->lssc_d,
 			    opts->lssc_m2, loc_lssc);
 		printf(" ) ");
 
 		return (1);
 	} else
 		return (0);
 }
 
 /*
  * admission control using generalized service curve
  */
 
 /* add a new service curve to a generalized service curve */
 static void
 gsc_add_sc(struct gen_sc *gsc, struct service_curve *sc)
 {
 	if (is_sc_null(sc))
 		return;
 	if (sc->d != 0)
 		gsc_add_seg(gsc, 0.0, 0.0, (double)sc->d, (double)sc->m1);
 	gsc_add_seg(gsc, (double)sc->d, 0.0, INFINITY, (double)sc->m2);
 }
 
 /*
  * check whether all points of a generalized service curve have
  * their y-coordinates no larger than a given two-piece linear
  * service curve.
  */
 static int
 is_gsc_under_sc(struct gen_sc *gsc, struct service_curve *sc)
 {
 	struct segment	*s, *last, *end;
 	double		 y;
 
 	if (is_sc_null(sc)) {
 		if (LIST_EMPTY(gsc))
 			return (1);
 		LIST_FOREACH(s, gsc, _next) {
 			if (s->m != 0)
 				return (0);
 		}
 		return (1);
 	}
 	/*
 	 * gsc has a dummy entry at the end with x = INFINITY.
 	 * loop through up to this dummy entry.
 	 */
 	end = gsc_getentry(gsc, INFINITY);
 	if (end == NULL)
 		return (1);
 	last = NULL;
 	for (s = LIST_FIRST(gsc); s != end; s = LIST_NEXT(s, _next)) {
 		if (s->y > sc_x2y(sc, s->x))
 			return (0);
 		last = s;
 	}
 	/* last now holds the real last segment */
 	if (last == NULL)
 		return (1);
 	if (last->m > sc->m2)
 		return (0);
 	if (last->x < sc->d && last->m > sc->m1) {
 		y = last->y + (sc->d - last->x) * last->m;
 		if (y > sc_x2y(sc, sc->d))
 			return (0);
 	}
 	return (1);
 }
 
 static void
 gsc_destroy(struct gen_sc *gsc)
 {
 	struct segment	*s;
 
 	while ((s = LIST_FIRST(gsc)) != NULL) {
 		LIST_REMOVE(s, _next);
 		free(s);
 	}
 }
 
 /*
  * return a segment entry starting at x.
  * if gsc has no entry starting at x, a new entry is created at x.
  */
 static struct segment *
 gsc_getentry(struct gen_sc *gsc, double x)
 {
 	struct segment	*new, *prev, *s;
 
 	prev = NULL;
 	LIST_FOREACH(s, gsc, _next) {
 		if (s->x == x)
 			return (s);	/* matching entry found */
 		else if (s->x < x)
 			prev = s;
 		else
 			break;
 	}
 
 	/* we have to create a new entry */
 	if ((new = calloc(1, sizeof(struct segment))) == NULL)
 		return (NULL);
 
 	new->x = x;
 	if (x == INFINITY || s == NULL)
 		new->d = 0;
 	else if (s->x == INFINITY)
 		new->d = INFINITY;
 	else
 		new->d = s->x - x;
 	if (prev == NULL) {
 		/* insert the new entry at the head of the list */
 		new->y = 0;
 		new->m = 0;
 		LIST_INSERT_HEAD(gsc, new, _next);
 	} else {
 		/*
 		 * the start point intersects with the segment pointed to by
 		 * prev.  divide prev into 2 segments
 		 */
 		if (x == INFINITY) {
 			prev->d = INFINITY;
 			if (prev->m == 0)
 				new->y = prev->y;
 			else
 				new->y = INFINITY;
 		} else {
 			prev->d = x - prev->x;
 			new->y = prev->d * prev->m + prev->y;
 		}
 		new->m = prev->m;
 		LIST_INSERT_AFTER(prev, new, _next);
 	}
 	return (new);
 }
 
 /* add a segment to a generalized service curve */
 static int
 gsc_add_seg(struct gen_sc *gsc, double x, double y, double d, double m)
 {
 	struct segment	*start, *end, *s;
 	double		 x2;
 
 	if (d == INFINITY)
 		x2 = INFINITY;
 	else
 		x2 = x + d;
 	start = gsc_getentry(gsc, x);
 	end = gsc_getentry(gsc, x2);
 	if (start == NULL || end == NULL)
 		return (-1);
 
 	for (s = start; s != end; s = LIST_NEXT(s, _next)) {
 		s->m += m;
 		s->y += y + (s->x - x) * m;
 	}
 
 	end = gsc_getentry(gsc, INFINITY);
 	for (; s != end; s = LIST_NEXT(s, _next)) {
 		s->y += m * d;
 	}
 
 	return (0);
 }
 
 /* get y-projection of a service curve */
 static double
 sc_x2y(struct service_curve *sc, double x)
 {
 	double	y;
 
 	if (x <= (double)sc->d)
 		/* y belongs to the 1st segment */
 		y = x * (double)sc->m1;
 	else
 		/* y belongs to the 2nd segment */
 		y = (double)sc->d * (double)sc->m1
 			+ (x - (double)sc->d) * (double)sc->m2;
 	return (y);
 }
 
 /*
  * misc utilities
  */
 #define	R2S_BUFS	8
 #define	RATESTR_MAX	16
 
 char *
 rate2str(double rate)
 {
 	char		*buf;
 	static char	 r2sbuf[R2S_BUFS][RATESTR_MAX];  /* ring buffer */
 	static int	 idx = 0;
 	int		 i;
 	static const char unit[] = " KMG";
 
 	buf = r2sbuf[idx++];
 	if (idx == R2S_BUFS)
 		idx = 0;
 
 	for (i = 0; rate >= 1000 && i < 3; i++)
 		rate /= 1000;
 
 	if ((int)(rate * 100) % 100)
 		snprintf(buf, RATESTR_MAX, "%.2f%cb", rate, unit[i]);
 	else
 		snprintf(buf, RATESTR_MAX, "%d%cb", (int)rate, unit[i]);
 
 	return (buf);
 }
 
 #ifdef __FreeBSD__
 /*
  * XXX
  * FreeBSD does not have SIOCGIFDATA.
  * To emulate it, a DIOCGIFSPEED ioctl was added to pf.
  */
 u_int32_t
 getifspeed(int pfdev, char *ifname)
 {
 	struct pf_ifspeed io;
 
 	bzero(&io, sizeof io);
 	if (strlcpy(io.ifname, ifname, IFNAMSIZ) >=
 	    sizeof(io.ifname))
 		errx(1, "getifspeed: strlcpy");
 	if (ioctl(pfdev, DIOCGIFSPEED, &io) == -1)
 		err(1, "DIOCGIFSPEED");
 	return ((u_int32_t)io.baudrate);
 }
 #else
 u_int32_t
 getifspeed(char *ifname)
 {
 	int		s;
 	struct ifreq	ifr;
 	struct if_data	ifrdat;
 
 	if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) < 0)
 		err(1, "socket");
 	bzero(&ifr, sizeof(ifr));
 	if (strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)) >=
 	    sizeof(ifr.ifr_name))
 		errx(1, "getifspeed: strlcpy");
 	ifr.ifr_data = (caddr_t)&ifrdat;
 	if (ioctl(s, SIOCGIFDATA, (caddr_t)&ifr) == -1)
 		err(1, "SIOCGIFDATA");
 	if (close(s))
 		err(1, "close");
 	return ((u_int32_t)ifrdat.ifi_baudrate);
 }
 #endif
 
 u_long
 getifmtu(char *ifname)
 {
 	int		s;
 	struct ifreq	ifr;
 
 	if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) < 0)
 		err(1, "socket");
 	bzero(&ifr, sizeof(ifr));
 	if (strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name)) >=
 	    sizeof(ifr.ifr_name))
 		errx(1, "getifmtu: strlcpy");
 	if (ioctl(s, SIOCGIFMTU, (caddr_t)&ifr) == -1)
 #ifdef __FreeBSD__
 		ifr.ifr_mtu = 1500;
 #else
 		err(1, "SIOCGIFMTU");
 #endif
 	if (close(s))
 		err(1, "close");
 	if (ifr.ifr_mtu > 0)
 		return (ifr.ifr_mtu);
 	else {
 		warnx("could not get mtu for %s, assuming 1500", ifname);
 		return (1500);
 	}
 }
 
 int
 eval_queue_opts(struct pf_altq *pa, struct node_queue_opt *opts,
     u_int32_t ref_bw)
 {
 	int	errors = 0;
 
 	switch (pa->scheduler) {
 	case ALTQT_CBQ:
 		pa->pq_u.cbq_opts = opts->data.cbq_opts;
 		break;
 	case ALTQT_PRIQ:
 		pa->pq_u.priq_opts = opts->data.priq_opts;
 		break;
 	case ALTQT_HFSC:
 		pa->pq_u.hfsc_opts.flags = opts->data.hfsc_opts.flags;
 		if (opts->data.hfsc_opts.linkshare.used) {
 			pa->pq_u.hfsc_opts.lssc_m1 =
 			    eval_bwspec(&opts->data.hfsc_opts.linkshare.m1,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.lssc_m2 =
 			    eval_bwspec(&opts->data.hfsc_opts.linkshare.m2,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.lssc_d =
 			    opts->data.hfsc_opts.linkshare.d;
 		}
 		if (opts->data.hfsc_opts.realtime.used) {
 			pa->pq_u.hfsc_opts.rtsc_m1 =
 			    eval_bwspec(&opts->data.hfsc_opts.realtime.m1,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.rtsc_m2 =
 			    eval_bwspec(&opts->data.hfsc_opts.realtime.m2,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.rtsc_d =
 			    opts->data.hfsc_opts.realtime.d;
 		}
 		if (opts->data.hfsc_opts.upperlimit.used) {
 			pa->pq_u.hfsc_opts.ulsc_m1 =
 			    eval_bwspec(&opts->data.hfsc_opts.upperlimit.m1,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.ulsc_m2 =
 			    eval_bwspec(&opts->data.hfsc_opts.upperlimit.m2,
 			    ref_bw);
 			pa->pq_u.hfsc_opts.ulsc_d =
 			    opts->data.hfsc_opts.upperlimit.d;
 		}
 		break;
 	case ALTQT_FAIRQ:
 		pa->pq_u.fairq_opts.flags = opts->data.fairq_opts.flags;
 		pa->pq_u.fairq_opts.nbuckets = opts->data.fairq_opts.nbuckets;
 		pa->pq_u.fairq_opts.hogs_m1 =
 			eval_bwspec(&opts->data.fairq_opts.hogs_bw, ref_bw);
 
 		if (opts->data.fairq_opts.linkshare.used) {
 			pa->pq_u.fairq_opts.lssc_m1 =
 			    eval_bwspec(&opts->data.fairq_opts.linkshare.m1,
 			    ref_bw);
 			pa->pq_u.fairq_opts.lssc_m2 =
 			    eval_bwspec(&opts->data.fairq_opts.linkshare.m2,
 			    ref_bw);
 			pa->pq_u.fairq_opts.lssc_d =
 			    opts->data.fairq_opts.linkshare.d;
 		}
 		break;
 	case ALTQT_CODEL:
 		pa->pq_u.codel_opts.target = opts->data.codel_opts.target;
 		pa->pq_u.codel_opts.interval = opts->data.codel_opts.interval;
 		pa->pq_u.codel_opts.ecn = opts->data.codel_opts.ecn;
 		break;
 	default:
 		warnx("eval_queue_opts: unknown scheduler type %u",
 		    opts->qtype);
 		errors++;
 		break;
 	}
 
 	return (errors);
 }
 
 u_int32_t
 eval_bwspec(struct node_queue_bw *bw, u_int32_t ref_bw)
 {
 	if (bw->bw_absolute > 0)
 		return (bw->bw_absolute);
 
 	if (bw->bw_percent > 0)
 		return (ref_bw / 100 * bw->bw_percent);
 
 	return (0);
 }
 
 void
 print_hfsc_sc(const char *scname, u_int m1, u_int d, u_int m2,
     const struct node_hfsc_sc *sc)
 {
 	printf(" %s", scname);
 
 	if (d != 0) {
 		printf("(");
 		if (sc != NULL && sc->m1.bw_percent > 0)
 			printf("%u%%", sc->m1.bw_percent);
 		else
 			printf("%s", rate2str((double)m1));
 		printf(" %u", d);
 	}
 
 	if (sc != NULL && sc->m2.bw_percent > 0)
 		printf(" %u%%", sc->m2.bw_percent);
 	else
 		printf(" %s", rate2str((double)m2));
 
 	if (d != 0)
 		printf(")");
 }
 
 void
 print_fairq_sc(const char *scname, u_int m1, u_int d, u_int m2,
     const struct node_fairq_sc *sc)
 {
 	printf(" %s", scname);
 
 	if (d != 0) {
 		printf("(");
 		if (sc != NULL && sc->m1.bw_percent > 0)
 			printf("%u%%", sc->m1.bw_percent);
 		else
 			printf("%s", rate2str((double)m1));
 		printf(" %u", d);
 	}
 
 	if (sc != NULL && sc->m2.bw_percent > 0)
 		printf(" %u%%", sc->m2.bw_percent);
 	else
 		printf(" %s", rate2str((double)m2));
 
 	if (d != 0)
 		printf(")");
 }
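The admission-control helpers in pfctl_altq.c check that a sum of two-piece linear service curves never rises above a bound. Those helpers are gsc_add_sc(), gsc_add_seg(), gsc_getentry() and is_gsc_under_sc(). eval_pfqueue_hfsc() uses them to keep real-time curves under 80% of the interface bandwidth and linkshare curves under the parent's curve.

The sketch below uses a simpler technique and is not the pfctl implementation. Both the sum and the bound are piecewise linear and only bend at the d values. It is therefore enough to compare them at every breakpoint and then compare the final m2 slopes. The type struct sc2 and the functions sum_x2y() and sum_under() are hypothetical names chosen for illustration. Units are whatever the caller uses consistently.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct service_curve: slope m1 for the
 * first d units of x, slope m2 afterwards.
 */
struct sc2 {
	double	m1;
	double	d;
	double	m2;
};

/* y-projection of a two-piece curve; same formula as sc_x2y() */
static double
sc2_x2y(const struct sc2 *sc, double x)
{
	if (x <= sc->d)
		return (x * sc->m1);
	return (sc->d * sc->m1 + (x - sc->d) * sc->m2);
}

/* value of the sum of all n curves at x */
static double
sum_x2y(const struct sc2 *cs, size_t n, double x)
{
	double	y = 0.0;
	size_t	i;

	for (i = 0; i < n; i++)
		y += sc2_x2y(&cs[i], x);
	return (y);
}

/*
 * Return 1 if the sum of the n curves in cs never exceeds bound, else 0.
 * bound(x) - sum(x) is 0 at x = 0 and linear between the d values, so
 * checking every breakpoint plus the slope of the final, unbounded
 * segment is sufficient.
 */
static int
sum_under(const struct sc2 *cs, size_t n, const struct sc2 *bound)
{
	double	tail_slope = 0.0;
	size_t	i;

	assert(cs != NULL || n == 0);

	/* past every breakpoint all curves run at m2 */
	for (i = 0; i < n; i++)
		tail_slope += cs[i].m2;
	if (tail_slope > bound->m2)
		return (0);

	/* breakpoints: the bound's d and every member curve's d */
	if (sum_x2y(cs, n, bound->d) > sc2_x2y(bound, bound->d))
		return (0);
	for (i = 0; i < n; i++)
		if (sum_x2y(cs, n, cs[i].d) > sc2_x2y(bound, cs[i].d))
			return (0);
	return (1);
}
```

For the real-time check, the bound corresponds to m1 = 0, d = 0 and m2 = 80% of the interface bandwidth. The burst part of a curve matters even when the m2 slopes fit. For example, { {60e6, 10, 10e6}, {0, 0, 40e6} } is rejected against an 80e6 bound, because at x = 10 the sum is 1000e6 while the bound is 800e6.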
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_optimize.c	(revision 303775)
@@ -1,1655 +1,1656 @@
 /*	$OpenBSD: pfctl_optimize.c,v 1.17 2008/05/06 03:45:21 mpf Exp $ */
 
 /*
  * Copyright (c) 2004 Mike Frantzen <frantzen@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
  * copyright notice and this permission notice appear in all copies.
  *
  * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
  * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 
 #include <net/if.h>
 #include <net/pfvar.h>
 
 #include <netinet/in.h>
 #include <arpa/inet.h>
 
 #include <assert.h>
 #include <ctype.h>
 #include <err.h>
 #include <errno.h>
 #include <stddef.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 /* The size at which a table becomes faster than individual rules */
 #define TABLE_THRESHOLD		6
 
 
 /* #define OPT_DEBUG	1 */
 #ifdef OPT_DEBUG
 # define DEBUG(str, v...) \
 	printf("%s: " str "\n", __FUNCTION__ , ## v)
 #else
 # define DEBUG(str, v...) ((void)0)
 #endif
 
 
 /*
  * A container that lets us sort a superblock to optimize the skip step jumps
  */
 struct pf_skip_step {
 	int				ps_count;	/* number of items */
 	TAILQ_HEAD( , pf_opt_rule)	ps_rules;
 	TAILQ_ENTRY(pf_skip_step)	ps_entry;
 };
 
 
 /*
  * A superblock is a block of adjacent rules of similar action.  If there
  * are five PASS rules in a row, they all become members of a superblock.
  * Once we have a superblock, we are free to re-order any rules within it
  * in order to improve performance; if a packet is passed, it doesn't matter
  * who passed it.
  */
 struct superblock {
 	TAILQ_HEAD( , pf_opt_rule)		 sb_rules;
 	TAILQ_ENTRY(superblock)			 sb_entry;
 	struct superblock			*sb_profiled_block;
 	TAILQ_HEAD(skiplist, pf_skip_step)	 sb_skipsteps[PF_SKIP_COUNT];
 };
 TAILQ_HEAD(superblocks, superblock);
 
 
 /*
  * Description of the PF rule structure.
  */
 enum {
     BARRIER,	/* the presence of the field puts the rule in its own block */
     BREAK,	/* the field may not differ between rules in a superblock */
     NOMERGE,	/* the field may not differ between rules when combined */
     COMBINED,	/* the field may itself be combined with other rules */
     DC,		/* we just don't care about the field */
     NEVER};	/* we should never see this field set?!? */
-struct pf_rule_field {
+static struct pf_rule_field {
 	const char	*prf_name;
 	int		 prf_type;
 	size_t		 prf_offset;
 	size_t		 prf_size;
 } pf_rule_desc[] = {
 #define PF_RULE_FIELD(field, ty)	\
     {#field,				\
     ty,					\
     offsetof(struct pf_rule, field),	\
     sizeof(((struct pf_rule *)0)->field)}
 
 
     /*
      * The presence of these fields in a rule puts the rule in its own
      * superblock.  Thus it will not be optimized.  It also prevents the
      * rule from being re-ordered at all.
      */
     PF_RULE_FIELD(label,		BARRIER),
     PF_RULE_FIELD(prob,			BARRIER),
     PF_RULE_FIELD(max_states,		BARRIER),
     PF_RULE_FIELD(max_src_nodes,	BARRIER),
     PF_RULE_FIELD(max_src_states,	BARRIER),
     PF_RULE_FIELD(max_src_conn,		BARRIER),
     PF_RULE_FIELD(max_src_conn_rate,	BARRIER),
     PF_RULE_FIELD(anchor,		BARRIER),	/* for now */
 
     /*
      * These fields must be the same between all rules in the same superblock.
      * These rules are allowed to be re-ordered but only among like rules.
      * For instance we can re-order all 'tag "foo"' rules because they have the
      * same tag.  But we cannot re-order between a 'tag "foo"' and a
      * 'tag "bar"' since that would change the meaning of the ruleset.
      */
     PF_RULE_FIELD(tagname,		BREAK),
     PF_RULE_FIELD(keep_state,		BREAK),
     PF_RULE_FIELD(qname,		BREAK),
     PF_RULE_FIELD(pqname,		BREAK),
     PF_RULE_FIELD(rt,			BREAK),
     PF_RULE_FIELD(allow_opts,		BREAK),
     PF_RULE_FIELD(rule_flag,		BREAK),
     PF_RULE_FIELD(action,		BREAK),
     PF_RULE_FIELD(log,			BREAK),
     PF_RULE_FIELD(quick,		BREAK),
     PF_RULE_FIELD(return_ttl,		BREAK),
     PF_RULE_FIELD(overload_tblname,	BREAK),
     PF_RULE_FIELD(flush,		BREAK),
     PF_RULE_FIELD(rpool,		BREAK),
     PF_RULE_FIELD(logif,		BREAK),
 
     /*
      * Any fields not listed in this structure act as BREAK fields
      */
 
 
     /*
      * These fields must not differ when we merge two rules together but
      * their difference isn't enough to put the rules in different superblocks.
      * There are no problems re-ordering any rules with these fields.
      */
     PF_RULE_FIELD(af,			NOMERGE),
     PF_RULE_FIELD(ifnot,		NOMERGE),
     PF_RULE_FIELD(ifname,		NOMERGE),	/* hack for IF groups */
     PF_RULE_FIELD(match_tag_not,	NOMERGE),
     PF_RULE_FIELD(match_tagname,	NOMERGE),
     PF_RULE_FIELD(os_fingerprint,	NOMERGE),
     PF_RULE_FIELD(timeout,		NOMERGE),
     PF_RULE_FIELD(return_icmp,		NOMERGE),
     PF_RULE_FIELD(return_icmp6,		NOMERGE),
     PF_RULE_FIELD(uid,			NOMERGE),
     PF_RULE_FIELD(gid,			NOMERGE),
     PF_RULE_FIELD(direction,		NOMERGE),
     PF_RULE_FIELD(proto,		NOMERGE),
     PF_RULE_FIELD(type,			NOMERGE),
     PF_RULE_FIELD(code,			NOMERGE),
     PF_RULE_FIELD(flags,		NOMERGE),
     PF_RULE_FIELD(flagset,		NOMERGE),
     PF_RULE_FIELD(tos,			NOMERGE),
     PF_RULE_FIELD(src.port,		NOMERGE),
     PF_RULE_FIELD(dst.port,		NOMERGE),
     PF_RULE_FIELD(src.port_op,		NOMERGE),
     PF_RULE_FIELD(dst.port_op,		NOMERGE),
     PF_RULE_FIELD(src.neg,		NOMERGE),
     PF_RULE_FIELD(dst.neg,		NOMERGE),
 
     /* These fields can be merged */
     PF_RULE_FIELD(src.addr,		COMBINED),
     PF_RULE_FIELD(dst.addr,		COMBINED),
 
     /* We just don't care about these fields.  They're set by the kernel */
     PF_RULE_FIELD(skip,			DC),
     PF_RULE_FIELD(evaluations,		DC),
     PF_RULE_FIELD(packets,		DC),
     PF_RULE_FIELD(bytes,		DC),
     PF_RULE_FIELD(kif,			DC),
     PF_RULE_FIELD(states_cur,		DC),
     PF_RULE_FIELD(states_tot,		DC),
     PF_RULE_FIELD(src_nodes,		DC),
     PF_RULE_FIELD(nr,			DC),
     PF_RULE_FIELD(entries,		DC),
     PF_RULE_FIELD(qid,			DC),
     PF_RULE_FIELD(pqid,			DC),
     PF_RULE_FIELD(anchor_relative,	DC),
     PF_RULE_FIELD(anchor_wildcard,	DC),
     PF_RULE_FIELD(tag,			DC),
     PF_RULE_FIELD(match_tag,		DC),
     PF_RULE_FIELD(overload_tbl,		DC),
 
     /* These fields should never be set in a PASS/BLOCK rule */
     PF_RULE_FIELD(natpass,		NEVER),
     PF_RULE_FIELD(max_mss,		NEVER),
     PF_RULE_FIELD(min_ttl,		NEVER),
     PF_RULE_FIELD(set_tos,		NEVER),
 };
 
 
 
 int	add_opt_table(struct pfctl *, struct pf_opt_tbl **, sa_family_t,
 	    struct pf_rule_addr *);
 int	addrs_combineable(struct pf_rule_addr *, struct pf_rule_addr *);
 int	addrs_equal(struct pf_rule_addr *, struct pf_rule_addr *);
 int	block_feedback(struct pfctl *, struct superblock *);
 int	combine_rules(struct pfctl *, struct superblock *);
 void	comparable_rule(struct pf_rule *, const struct pf_rule *, int);
 int	construct_superblocks(struct pfctl *, struct pf_opt_queue *,
 	    struct superblocks *);
 void	exclude_supersets(struct pf_rule *, struct pf_rule *);
 int	interface_group(const char *);
 int	load_feedback_profile(struct pfctl *, struct superblocks *);
 int	optimize_superblock(struct pfctl *, struct superblock *);
 int	pf_opt_create_table(struct pfctl *, struct pf_opt_tbl *);
 void	remove_from_skipsteps(struct skiplist *, struct superblock *,
 	    struct pf_opt_rule *, struct pf_skip_step *);
 int	remove_identical_rules(struct pfctl *, struct superblock *);
 int	reorder_rules(struct pfctl *, struct superblock *, int);
 int	rules_combineable(struct pf_rule *, struct pf_rule *);
 void	skip_append(struct superblock *, int, struct pf_skip_step *,
 	    struct pf_opt_rule *);
 int	skip_compare(int, struct pf_skip_step *, struct pf_opt_rule *);
 void	skip_init(void);
 int	skip_cmp_af(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_dir(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_dst_addr(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_dst_port(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_ifp(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_proto(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_src_addr(struct pf_rule *, struct pf_rule *);
 int	skip_cmp_src_port(struct pf_rule *, struct pf_rule *);
 int	superblock_inclusive(struct superblock *, struct pf_opt_rule *);
 void	superblock_free(struct pfctl *, struct superblock *);
 
 
-int (*skip_comparitors[PF_SKIP_COUNT])(struct pf_rule *, struct pf_rule *);
-const char *skip_comparitors_names[PF_SKIP_COUNT];
+static int (*skip_comparitors[PF_SKIP_COUNT])(struct pf_rule *,
+    struct pf_rule *);
+static const char *skip_comparitors_names[PF_SKIP_COUNT];
 #define PF_SKIP_COMPARITORS {				\
     { "ifp", PF_SKIP_IFP, skip_cmp_ifp },		\
     { "dir", PF_SKIP_DIR, skip_cmp_dir },		\
     { "af", PF_SKIP_AF, skip_cmp_af },			\
     { "proto", PF_SKIP_PROTO, skip_cmp_proto },		\
     { "saddr", PF_SKIP_SRC_ADDR, skip_cmp_src_addr },	\
     { "sport", PF_SKIP_SRC_PORT, skip_cmp_src_port },	\
     { "daddr", PF_SKIP_DST_ADDR, skip_cmp_dst_addr },	\
     { "dport", PF_SKIP_DST_PORT, skip_cmp_dst_port }	\
 }
 
-struct pfr_buffer table_buffer;
-int table_identifier;
+static struct pfr_buffer table_buffer;
+static int table_identifier;
 
 
 int
 pfctl_optimize_ruleset(struct pfctl *pf, struct pf_ruleset *rs)
 {
 	struct superblocks superblocks;
 	struct pf_opt_queue opt_queue;
 	struct superblock *block;
 	struct pf_opt_rule *por;
 	struct pf_rule *r;
 	struct pf_rulequeue *old_rules;
 
 	DEBUG("optimizing ruleset");
 	memset(&table_buffer, 0, sizeof(table_buffer));
 	skip_init();
 	TAILQ_INIT(&opt_queue);
 
 	old_rules = rs->rules[PF_RULESET_FILTER].active.ptr;
 	rs->rules[PF_RULESET_FILTER].active.ptr =
 	    rs->rules[PF_RULESET_FILTER].inactive.ptr;
 	rs->rules[PF_RULESET_FILTER].inactive.ptr = old_rules;
 
 	/*
 	 * XXX expanding the pf_opt_rule format throughout pfctl might allow
 	 * us to avoid all this copying.
 	 */
 	while ((r = TAILQ_FIRST(rs->rules[PF_RULESET_FILTER].inactive.ptr))
 	    != NULL) {
 		TAILQ_REMOVE(rs->rules[PF_RULESET_FILTER].inactive.ptr, r,
 		    entries);
 		if ((por = calloc(1, sizeof(*por))) == NULL)
 			err(1, "calloc");
 		memcpy(&por->por_rule, r, sizeof(*r));
 		if (TAILQ_FIRST(&r->rpool.list) != NULL) {
 			TAILQ_INIT(&por->por_rule.rpool.list);
 			pfctl_move_pool(&r->rpool, &por->por_rule.rpool);
 		} else
 			bzero(&por->por_rule.rpool,
 			    sizeof(por->por_rule.rpool));
 
 
 		TAILQ_INSERT_TAIL(&opt_queue, por, por_entry);
 	}
 
 	TAILQ_INIT(&superblocks);
 	if (construct_superblocks(pf, &opt_queue, &superblocks))
 		goto error;
 
 	if (pf->optimize & PF_OPTIMIZE_PROFILE) {
 		if (load_feedback_profile(pf, &superblocks))
 			goto error;
 	}
 
 	TAILQ_FOREACH(block, &superblocks, sb_entry) {
 		if (optimize_superblock(pf, block))
 			goto error;
 	}
 
 	rs->anchor->refcnt = 0;
 	while ((block = TAILQ_FIRST(&superblocks))) {
 		TAILQ_REMOVE(&superblocks, block, sb_entry);
 
 		while ((por = TAILQ_FIRST(&block->sb_rules))) {
 			TAILQ_REMOVE(&block->sb_rules, por, por_entry);
 			por->por_rule.nr = rs->anchor->refcnt++;
 			if ((r = calloc(1, sizeof(*r))) == NULL)
 				err(1, "calloc");
 			memcpy(r, &por->por_rule, sizeof(*r));
 			TAILQ_INIT(&r->rpool.list);
 			pfctl_move_pool(&por->por_rule.rpool, &r->rpool);
 			TAILQ_INSERT_TAIL(
 			    rs->rules[PF_RULESET_FILTER].active.ptr,
 			    r, entries);
 			free(por);
 		}
 		free(block);
 	}
 
 	return (0);
 
 error:
 	while ((por = TAILQ_FIRST(&opt_queue))) {
 		TAILQ_REMOVE(&opt_queue, por, por_entry);
 		if (por->por_src_tbl) {
 			pfr_buf_clear(por->por_src_tbl->pt_buf);
 			free(por->por_src_tbl->pt_buf);
 			free(por->por_src_tbl);
 		}
 		if (por->por_dst_tbl) {
 			pfr_buf_clear(por->por_dst_tbl->pt_buf);
 			free(por->por_dst_tbl->pt_buf);
 			free(por->por_dst_tbl);
 		}
 		free(por);
 	}
 	while ((block = TAILQ_FIRST(&superblocks))) {
 		TAILQ_REMOVE(&superblocks, block, sb_entry);
 		superblock_free(pf, block);
 	}
 	return (1);
 }
 
 
 /*
  * Go ahead and optimize a superblock
  */
 int
 optimize_superblock(struct pfctl *pf, struct superblock *block)
 {
 #ifdef OPT_DEBUG
 	struct pf_opt_rule *por;
 #endif /* OPT_DEBUG */
 
 	/* We have a few optimization passes:
 	 *   1) remove duplicate rules or rules that are a subset of other
 	 *      rules
 	 *   2) combine otherwise identical rules with different IP addresses
 	 *      into a single rule and put the addresses in a table.
 	 *   3) re-order the rules to improve kernel skip steps
 	 *   4) re-order the 'quick' rules based on feedback from the
 	 *      active ruleset statistics
 	 *
 	 * XXX combine_rules() doesn't combine v4 and v6 rules.  would just
 	 *     have to keep af in the table container, make af 'COMBINE' and
 	 *     twiddle the af on the merged rule
 	 * XXX maybe add a weighting to the metric on skipsteps when doing
 	 *     reordering.  sometimes two sequential tables will be better
 	 *     than four consecutive interfaces.
 	 * XXX need to adjust the skipstep count of everything after PROTO,
 	 *     since they aren't actually checked on a proto mismatch in
 	 *     pf_test_{tcp, udp, icmp}()
 	 * XXX should I treat proto=0, af=0 or dir=0 specially in skipstep
 	 *     calculation since they are don't-care (DC) values?
 	 * XXX keep last skiplist of last superblock to influence this
 	 *     superblock.  '5 inet6 log' should make '3 inet6' come before '4
 	 *     inet' in the next superblock.
 	 * XXX would be useful to add tables for ports
 	 * XXX we can also re-order some mutually exclusive superblocks to
 	 *     try merging superblocks before any of these optimization passes.
 	 *     for instance a single 'log in' rule in the middle of non-logging
 	 *     out rules.
 	 */
 
 	/* shortcut.  there will be a lot of 1-rule superblocks */
 	if (!TAILQ_NEXT(TAILQ_FIRST(&block->sb_rules), por_entry))
 		return (0);
 
 #ifdef OPT_DEBUG
 	printf("--- Superblock ---\n");
 	TAILQ_FOREACH(por, &block->sb_rules, por_entry) {
 		printf("  ");
 		print_rule(&por->por_rule, por->por_rule.anchor ?
 		    por->por_rule.anchor->name : "", 1, 0);
 	}
 #endif /* OPT_DEBUG */
 
 
 	if (remove_identical_rules(pf, block))
 		return (1);
 	if (combine_rules(pf, block))
 		return (1);
 	if ((pf->optimize & PF_OPTIMIZE_PROFILE) &&
 	    TAILQ_FIRST(&block->sb_rules)->por_rule.quick &&
 	    block->sb_profiled_block) {
 		if (block_feedback(pf, block))
 			return (1);
 	} else if (reorder_rules(pf, block, 0)) {
 		return (1);
 	}
 
 	/*
 	 * Don't add any optimization passes below reorder_rules().  It will
 	 * have divided superblocks into smaller blocks for further refinement
 	 * and doesn't put them back together again.  What once was a true
 	 * superblock might have been split into multiple superblocks.
 	 */
 
 #ifdef OPT_DEBUG
 	printf("--- END Superblock ---\n");
 #endif /* OPT_DEBUG */
 	return (0);
 }
 
 
 /*
  * Optimization pass #1: remove identical rules
  */
 int
 remove_identical_rules(struct pfctl *pf, struct superblock *block)
 {
 	struct pf_opt_rule *por1, *por2, *por_next, *por2_next;
 	struct pf_rule a, a2, b, b2;
 
 	for (por1 = TAILQ_FIRST(&block->sb_rules); por1; por1 = por_next) {
 		por_next = TAILQ_NEXT(por1, por_entry);
 		for (por2 = por_next; por2; por2 = por2_next) {
 			por2_next = TAILQ_NEXT(por2, por_entry);
 			comparable_rule(&a, &por1->por_rule, DC);
 			comparable_rule(&b, &por2->por_rule, DC);
 			memcpy(&a2, &a, sizeof(a2));
 			memcpy(&b2, &b, sizeof(b2));
 
 			exclude_supersets(&a, &b);
 			exclude_supersets(&b2, &a2);
 			if (memcmp(&a, &b, sizeof(a)) == 0) {
 				DEBUG("removing identical rule  nr%d = *nr%d*",
 				    por1->por_rule.nr, por2->por_rule.nr);
 				TAILQ_REMOVE(&block->sb_rules, por2, por_entry);
 				if (por_next == por2)
 					por_next = TAILQ_NEXT(por1, por_entry);
 				free(por2);
 			} else if (memcmp(&a2, &b2, sizeof(a2)) == 0) {
 				DEBUG("removing identical rule  *nr%d* = nr%d",
 				    por1->por_rule.nr, por2->por_rule.nr);
 				TAILQ_REMOVE(&block->sb_rules, por1, por_entry);
 				free(por1);
 				break;
 			}
 		}
 	}
 
 	return (0);
 }
 
 
 /*
  * Optimization pass #2: combine similar rules with different addresses
  * into a single rule and a table
  */
 int
 combine_rules(struct pfctl *pf, struct superblock *block)
 {
 	struct pf_opt_rule *p1, *p2, *por_next;
 	int src_eq, dst_eq;
 
 	if ((pf->loadopt & PFCTL_FLAG_TABLE) == 0) {
 		warnx("Must enable table loading for optimizations");
 		return (1);
 	}
 
 	/* First we make a pass to combine the rules.  O(n^2) */
 	TAILQ_FOREACH(p1, &block->sb_rules, por_entry) {
 		for (p2 = TAILQ_NEXT(p1, por_entry); p2; p2 = por_next) {
 			por_next = TAILQ_NEXT(p2, por_entry);
 
 			src_eq = addrs_equal(&p1->por_rule.src,
 			    &p2->por_rule.src);
 			dst_eq = addrs_equal(&p1->por_rule.dst,
 			    &p2->por_rule.dst);
 
 			if (src_eq && !dst_eq && p1->por_src_tbl == NULL &&
 			    p2->por_dst_tbl == NULL &&
 			    p2->por_src_tbl == NULL &&
 			    rules_combineable(&p1->por_rule, &p2->por_rule) &&
 			    addrs_combineable(&p1->por_rule.dst,
 			    &p2->por_rule.dst)) {
 				DEBUG("can combine rules  nr%d = nr%d",
 				    p1->por_rule.nr, p2->por_rule.nr);
 				if (p1->por_dst_tbl == NULL &&
 				    add_opt_table(pf, &p1->por_dst_tbl,
 				    p1->por_rule.af, &p1->por_rule.dst))
 					return (1);
 				if (add_opt_table(pf, &p1->por_dst_tbl,
 				    p1->por_rule.af, &p2->por_rule.dst))
 					return (1);
 				p2->por_dst_tbl = p1->por_dst_tbl;
 				if (p1->por_dst_tbl->pt_rulecount >=
 				    TABLE_THRESHOLD) {
 					TAILQ_REMOVE(&block->sb_rules, p2,
 					    por_entry);
 					free(p2);
 				}
 			} else if (!src_eq && dst_eq && p1->por_dst_tbl == NULL
 			    && p2->por_src_tbl == NULL &&
 			    p2->por_dst_tbl == NULL &&
 			    rules_combineable(&p1->por_rule, &p2->por_rule) &&
 			    addrs_combineable(&p1->por_rule.src,
 			    &p2->por_rule.src)) {
 				DEBUG("can combine rules  nr%d = nr%d",
 				    p1->por_rule.nr, p2->por_rule.nr);
 				if (p1->por_src_tbl == NULL &&
 				    add_opt_table(pf, &p1->por_src_tbl,
 				    p1->por_rule.af, &p1->por_rule.src))
 					return (1);
 				if (add_opt_table(pf, &p1->por_src_tbl,
 				    p1->por_rule.af, &p2->por_rule.src))
 					return (1);
 				p2->por_src_tbl = p1->por_src_tbl;
 				if (p1->por_src_tbl->pt_rulecount >=
 				    TABLE_THRESHOLD) {
 					TAILQ_REMOVE(&block->sb_rules, p2,
 					    por_entry);
 					free(p2);
 				}
 			}
 		}
 	}
 
 
 	/*
 	 * Then we make a final pass to create a valid table name and
 	 * insert the name into the rules.
 	 */
 	for (p1 = TAILQ_FIRST(&block->sb_rules); p1; p1 = por_next) {
 		por_next = TAILQ_NEXT(p1, por_entry);
 		assert(p1->por_src_tbl == NULL || p1->por_dst_tbl == NULL);
 
 		if (p1->por_src_tbl && p1->por_src_tbl->pt_rulecount >=
 		    TABLE_THRESHOLD) {
 			if (p1->por_src_tbl->pt_generated) {
 				/* This rule is included in a table */
 				TAILQ_REMOVE(&block->sb_rules, p1, por_entry);
 				free(p1);
 				continue;
 			}
 			p1->por_src_tbl->pt_generated = 1;
 
 			if ((pf->opts & PF_OPT_NOACTION) == 0 &&
 			    pf_opt_create_table(pf, p1->por_src_tbl))
 				return (1);
 
 			pf->tdirty = 1;
 
 			if (pf->opts & PF_OPT_VERBOSE)
 				print_tabledef(p1->por_src_tbl->pt_name,
 				    PFR_TFLAG_CONST, 1,
 				    &p1->por_src_tbl->pt_nodes);
 
 			memset(&p1->por_rule.src.addr, 0,
 			    sizeof(p1->por_rule.src.addr));
 			p1->por_rule.src.addr.type = PF_ADDR_TABLE;
 			strlcpy(p1->por_rule.src.addr.v.tblname,
 			    p1->por_src_tbl->pt_name,
 			    sizeof(p1->por_rule.src.addr.v.tblname));
 
 			pfr_buf_clear(p1->por_src_tbl->pt_buf);
 			free(p1->por_src_tbl->pt_buf);
 			p1->por_src_tbl->pt_buf = NULL;
 		}
 		if (p1->por_dst_tbl && p1->por_dst_tbl->pt_rulecount >=
 		    TABLE_THRESHOLD) {
 			if (p1->por_dst_tbl->pt_generated) {
 				/* This rule is included in a table */
 				TAILQ_REMOVE(&block->sb_rules, p1, por_entry);
 				free(p1);
 				continue;
 			}
 			p1->por_dst_tbl->pt_generated = 1;
 
 			if ((pf->opts & PF_OPT_NOACTION) == 0 &&
 			    pf_opt_create_table(pf, p1->por_dst_tbl))
 				return (1);
 			pf->tdirty = 1;
 
 			if (pf->opts & PF_OPT_VERBOSE)
 				print_tabledef(p1->por_dst_tbl->pt_name,
 				    PFR_TFLAG_CONST, 1,
 				    &p1->por_dst_tbl->pt_nodes);
 
 			memset(&p1->por_rule.dst.addr, 0,
 			    sizeof(p1->por_rule.dst.addr));
 			p1->por_rule.dst.addr.type = PF_ADDR_TABLE;
 			strlcpy(p1->por_rule.dst.addr.v.tblname,
 			    p1->por_dst_tbl->pt_name,
 			    sizeof(p1->por_rule.dst.addr.v.tblname));
 
 			pfr_buf_clear(p1->por_dst_tbl->pt_buf);
 			free(p1->por_dst_tbl->pt_buf);
 			p1->por_dst_tbl->pt_buf = NULL;
 		}
 	}
 
 	return (0);
 }
 
 
 /*
  * Optimization pass #3: re-order rules to improve skip steps
  */
 int
 reorder_rules(struct pfctl *pf, struct superblock *block, int depth)
 {
 	struct superblock *newblock;
 	struct pf_skip_step *skiplist;
 	struct pf_opt_rule *por;
 	int i, largest, largest_list, rule_count = 0;
 	TAILQ_HEAD( , pf_opt_rule) head;
 
 	/*
 	 * Calculate the best-case skip steps.  We put each rule in a list
 	 * of other rules with common fields
 	 */
 	for (i = 0; i < PF_SKIP_COUNT; i++) {
 		TAILQ_FOREACH(por, &block->sb_rules, por_entry) {
 			TAILQ_FOREACH(skiplist, &block->sb_skipsteps[i],
 			    ps_entry) {
 				if (skip_compare(i, skiplist, por) == 0)
 					break;
 			}
 			if (skiplist == NULL) {
 				if ((skiplist = calloc(1, sizeof(*skiplist))) ==
 				    NULL)
 					err(1, "calloc");
 				TAILQ_INIT(&skiplist->ps_rules);
 				TAILQ_INSERT_TAIL(&block->sb_skipsteps[i],
 				    skiplist, ps_entry);
 			}
 			skip_append(block, i, skiplist, por);
 		}
 	}
 
 	TAILQ_FOREACH(por, &block->sb_rules, por_entry)
 		rule_count++;
 
 	/*
 	 * Now we're going to ignore any fields that are identical between
 	 * all of the rules in the superblock and those fields which differ
 	 * between every rule in the superblock.
 	 */
 	largest = 0;
 	for (i = 0; i < PF_SKIP_COUNT; i++) {
 		skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]);
 		if (skiplist->ps_count == rule_count) {
 			DEBUG("(%d) original skipstep '%s' is all rules",
 			    depth, skip_comparitors_names[i]);
 			skiplist->ps_count = 0;
 		} else if (skiplist->ps_count == 1) {
 			skiplist->ps_count = 0;
 		} else {
 			DEBUG("(%d) original skipstep '%s' largest jump is %d",
 			    depth, skip_comparitors_names[i],
 			    skiplist->ps_count);
 			if (skiplist->ps_count > largest)
 				largest = skiplist->ps_count;
 		}
 	}
 	if (largest == 0) {
 		/* Ugh.  There is NO commonality in the superblock on which
 		 * to base the skipsteps optimization.
 		 */
 		goto done;
 	}
 
 	/*
 	 * Now we're going to empty the superblock rule list and re-create
 	 * it based on a more optimal skipstep order.
 	 */
 	TAILQ_INIT(&head);
 	while ((por = TAILQ_FIRST(&block->sb_rules))) {
 		TAILQ_REMOVE(&block->sb_rules, por, por_entry);
 		TAILQ_INSERT_TAIL(&head, por, por_entry);
 	}
 
 
 	while (!TAILQ_EMPTY(&head)) {
 		largest = 1;
 
 		/*
 		 * Find the most useful skip steps remaining
 		 */
 		for (i = 0; i < PF_SKIP_COUNT; i++) {
 			skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]);
 			if (skiplist->ps_count > largest) {
 				largest = skiplist->ps_count;
 				largest_list = i;
 			}
 		}
 
 		if (largest <= 1) {
 			/*
 			 * Nothing useful left.  Leave remaining rules in order.
 			 */
 			DEBUG("(%d) no more commonality for skip steps", depth);
 			while ((por = TAILQ_FIRST(&head))) {
 				TAILQ_REMOVE(&head, por, por_entry);
 				TAILQ_INSERT_TAIL(&block->sb_rules, por,
 				    por_entry);
 			}
 		} else {
 			/*
 			 * There is commonality.  Extract those common rules
 			 * and place them in the ruleset adjacent to each
 			 * other.
 			 */
 			skiplist = TAILQ_FIRST(&block->sb_skipsteps[
 			    largest_list]);
 			DEBUG("(%d) skipstep '%s' largest jump is %d @ #%d",
 			    depth, skip_comparitors_names[largest_list],
 			    largest, TAILQ_FIRST(&TAILQ_FIRST(&block->
 			    sb_skipsteps [largest_list])->ps_rules)->
 			    por_rule.nr);
 			TAILQ_REMOVE(&block->sb_skipsteps[largest_list],
 			    skiplist, ps_entry);
 
 
 			/*
 			 * There may be further commonality inside these
 			 * rules.  So we'll split them off into their own
 			 * superblock and pass it back into the optimizer.
 			 */
 			if (skiplist->ps_count > 2) {
 				if ((newblock = calloc(1, sizeof(*newblock)))
 				    == NULL) {
 					warn("calloc");
 					return (1);
 				}
 				TAILQ_INIT(&newblock->sb_rules);
 				for (i = 0; i < PF_SKIP_COUNT; i++)
 					TAILQ_INIT(&newblock->sb_skipsteps[i]);
 				TAILQ_INSERT_BEFORE(block, newblock, sb_entry);
 				DEBUG("(%d) splitting off %d rules from superblock @ #%d",
 				    depth, skiplist->ps_count,
 				    TAILQ_FIRST(&skiplist->ps_rules)->
 				    por_rule.nr);
 			} else {
 				newblock = block;
 			}
 
 			while ((por = TAILQ_FIRST(&skiplist->ps_rules))) {
 				TAILQ_REMOVE(&head, por, por_entry);
 				TAILQ_REMOVE(&skiplist->ps_rules, por,
 				    por_skip_entry[largest_list]);
 				TAILQ_INSERT_TAIL(&newblock->sb_rules, por,
 				    por_entry);
 
 				/* Remove this rule from all other skiplists */
 				remove_from_skipsteps(&block->sb_skipsteps[
 				    largest_list], block, por, skiplist);
 			}
 			free(skiplist);
 			if (newblock != block)
 				if (reorder_rules(pf, newblock, depth + 1))
 					return (1);
 		}
 	}
 
 done:
 	for (i = 0; i < PF_SKIP_COUNT; i++) {
 		while ((skiplist = TAILQ_FIRST(&block->sb_skipsteps[i]))) {
 			TAILQ_REMOVE(&block->sb_skipsteps[i], skiplist,
 			    ps_entry);
 			free(skiplist);
 		}
 	}
 
 	return (0);
 }
 
 
 /*
  * Optimization pass #4: re-order 'quick' rules based on feedback from the
  * currently running ruleset
  */
 int
 block_feedback(struct pfctl *pf, struct superblock *block)
 {
 	TAILQ_HEAD( , pf_opt_rule) queue;
 	struct pf_opt_rule *por1, *por2;
 	u_int64_t total_count = 0;
 	struct pf_rule a, b;
 
 
 	/*
 	 * Walk through all of the profiled superblock's rules and copy
 	 * the counters onto our rules.
 	 */
 	TAILQ_FOREACH(por1, &block->sb_profiled_block->sb_rules, por_entry) {
 		comparable_rule(&a, &por1->por_rule, DC);
 		total_count += por1->por_rule.packets[0] +
 		    por1->por_rule.packets[1];
 		TAILQ_FOREACH(por2, &block->sb_rules, por_entry) {
 			if (por2->por_profile_count)
 				continue;
 			comparable_rule(&b, &por2->por_rule, DC);
 			if (memcmp(&a, &b, sizeof(a)) == 0) {
 				por2->por_profile_count =
 				    por1->por_rule.packets[0] +
 				    por1->por_rule.packets[1];
 				break;
 			}
 		}
 	}
 	superblock_free(pf, block->sb_profiled_block);
 	block->sb_profiled_block = NULL;
 
 	/*
 	 * Now we pull all of the rules off the superblock and re-insert them
 	 * in sorted order.
 	 */
 
 	TAILQ_INIT(&queue);
 	while ((por1 = TAILQ_FIRST(&block->sb_rules)) != NULL) {
 		TAILQ_REMOVE(&block->sb_rules, por1, por_entry);
 		TAILQ_INSERT_TAIL(&queue, por1, por_entry);
 	}
 
 	while ((por1 = TAILQ_FIRST(&queue)) != NULL) {
 		TAILQ_REMOVE(&queue, por1, por_entry);
 /* XXX I should sort all of the unused rules based on skip steps */
 		TAILQ_FOREACH(por2, &block->sb_rules, por_entry) {
 			if (por1->por_profile_count > por2->por_profile_count) {
 				TAILQ_INSERT_BEFORE(por2, por1, por_entry);
 				break;
 			}
 		}
 #ifdef __FreeBSD__
 		if (por2 == NULL)
 #else
 		if (por2 == TAILQ_END(&block->sb_rules))
 #endif
 			TAILQ_INSERT_TAIL(&block->sb_rules, por1, por_entry);
 	}
 
 	return (0);
 }
 
 
 /*
  * Load the current ruleset from the kernel and try to associate its rules
  * with the ruleset we're optimizing.
  */
 int
 load_feedback_profile(struct pfctl *pf, struct superblocks *superblocks)
 {
 	struct superblock *block, *blockcur;
 	struct superblocks prof_superblocks;
 	struct pf_opt_rule *por;
 	struct pf_opt_queue queue;
 	struct pfioc_rule pr;
 	struct pf_rule a, b;
 	int nr, mnr;
 
 	TAILQ_INIT(&queue);
 	TAILQ_INIT(&prof_superblocks);
 
 	memset(&pr, 0, sizeof(pr));
 	pr.rule.action = PF_PASS;
 	if (ioctl(pf->dev, DIOCGETRULES, &pr)) {
 		warn("DIOCGETRULES");
 		return (1);
 	}
 	mnr = pr.nr;
 
 	DEBUG("Loading %d active rules for a feedback profile", mnr);
 	for (nr = 0; nr < mnr; ++nr) {
 		struct pf_ruleset *rs;
 		if ((por = calloc(1, sizeof(*por))) == NULL) {
 			warn("calloc");
 			return (1);
 		}
 		pr.nr = nr;
 		if (ioctl(pf->dev, DIOCGETRULE, &pr)) {
 			warn("DIOCGETRULE");
 			return (1);
 		}
 		memcpy(&por->por_rule, &pr.rule, sizeof(por->por_rule));
 		rs = pf_find_or_create_ruleset(pr.anchor_call);
 		por->por_rule.anchor = rs->anchor;
 		if (TAILQ_EMPTY(&por->por_rule.rpool.list))
 			memset(&por->por_rule.rpool, 0,
 			    sizeof(por->por_rule.rpool));
 		TAILQ_INSERT_TAIL(&queue, por, por_entry);
 
 		/* XXX pfctl_get_pool(pf->dev, &pr.rule.rpool, nr, pr.ticket,
 		 *         PF_PASS, pf->anchor) ???
 		 * ... pfctl_clear_pool(&pr.rule.rpool)
 		 */
 	}
 
 	if (construct_superblocks(pf, &queue, &prof_superblocks))
 		return (1);
 
 
 	/*
 	 * Now we try to associate the active ruleset's superblocks with
 	 * the superblocks we're compiling.
 	 */
 	block = TAILQ_FIRST(superblocks);
 	blockcur = TAILQ_FIRST(&prof_superblocks);
 	while (block && blockcur) {
 		comparable_rule(&a, &TAILQ_FIRST(&block->sb_rules)->por_rule,
 		    BREAK);
 		comparable_rule(&b, &TAILQ_FIRST(&blockcur->sb_rules)->por_rule,
 		    BREAK);
 		if (memcmp(&a, &b, sizeof(a)) == 0) {
 			/* The two superblocks lined up */
 			block->sb_profiled_block = blockcur;
 		} else {
 			DEBUG("superblocks don't line up between #%d and #%d",
 			    TAILQ_FIRST(&block->sb_rules)->por_rule.nr,
 			    TAILQ_FIRST(&blockcur->sb_rules)->por_rule.nr);
 			break;
 		}
 		block = TAILQ_NEXT(block, sb_entry);
 		blockcur = TAILQ_NEXT(blockcur, sb_entry);
 	}
 
 
 
 	/* Free any superblocks we couldn't link */
 	while (blockcur) {
 		block = TAILQ_NEXT(blockcur, sb_entry);
 		superblock_free(pf, blockcur);
 		blockcur = block;
 	}
 	return (0);
 }
 
 
 /*
  * Compare a rule to a skiplist to see if the rule is a member
  */
 int
 skip_compare(int skipnum, struct pf_skip_step *skiplist,
     struct pf_opt_rule *por)
 {
 	struct pf_rule *a, *b;
 	if (skipnum >= PF_SKIP_COUNT || skipnum < 0)
 		errx(1, "skip_compare() out of bounds");
 	a = &por->por_rule;
 	b = &TAILQ_FIRST(&skiplist->ps_rules)->por_rule;
 
 	return ((skip_comparitors[skipnum])(a, b));
 }
 
 
 /*
  * Add a rule to a skiplist
  */
 void
 skip_append(struct superblock *superblock, int skipnum,
     struct pf_skip_step *skiplist, struct pf_opt_rule *por)
 {
 	struct pf_skip_step *prev;
 
 	skiplist->ps_count++;
 	TAILQ_INSERT_TAIL(&skiplist->ps_rules, por, por_skip_entry[skipnum]);
 
 	/* Keep the list of skiplists sorted by whichever is larger */
 	while ((prev = TAILQ_PREV(skiplist, skiplist, ps_entry)) &&
 	    prev->ps_count < skiplist->ps_count) {
 		TAILQ_REMOVE(&superblock->sb_skipsteps[skipnum],
 		    skiplist, ps_entry);
 		TAILQ_INSERT_BEFORE(prev, skiplist, ps_entry);
 	}
 }
 
 
 /*
  * Remove a rule from the other skiplist calculations.
  */
 void
 remove_from_skipsteps(struct skiplist *head, struct superblock *block,
     struct pf_opt_rule *por, struct pf_skip_step *active_list)
 {
 	struct pf_skip_step *sk, *next;
 	struct pf_opt_rule *p2;
 	int i, found;
 
 	for (i = 0; i < PF_SKIP_COUNT; i++) {
 		sk = TAILQ_FIRST(&block->sb_skipsteps[i]);
 		if (sk == NULL || sk == active_list || sk->ps_count <= 1)
 			continue;
 		found = 0;
 		do {
 			TAILQ_FOREACH(p2, &sk->ps_rules, por_skip_entry[i])
 				if (p2 == por) {
 					TAILQ_REMOVE(&sk->ps_rules, p2,
 					    por_skip_entry[i]);
 					found = 1;
 					sk->ps_count--;
 					break;
 				}
 		} while (!found && (sk = TAILQ_NEXT(sk, ps_entry)));
 		if (found && sk) {
 			/* Does this change the sorting order? */
 			while ((next = TAILQ_NEXT(sk, ps_entry)) &&
 			    next->ps_count > sk->ps_count) {
 				TAILQ_REMOVE(head, sk, ps_entry);
 				TAILQ_INSERT_AFTER(head, next, sk, ps_entry);
 			}
 #ifdef OPT_DEBUG
 			next = TAILQ_NEXT(sk, ps_entry);
 			assert(next == NULL || next->ps_count <= sk->ps_count);
 #endif /* OPT_DEBUG */
 		}
 	}
 }
 
 
 /* Compare two rules' AF field for skiplist construction */
 int
 skip_cmp_af(struct pf_rule *a, struct pf_rule *b)
 {
 	if (a->af != b->af || a->af == 0)
 		return (1);
 	return (0);
 }
 
 /* Compare two rules' DIRECTION field for skiplist construction */
 int
 skip_cmp_dir(struct pf_rule *a, struct pf_rule *b)
 {
 	if (a->direction == 0 || a->direction != b->direction)
 		return (1);
 	return (0);
 }
 
 /* Compare two rules' DST address field for skiplist construction */
 int
 skip_cmp_dst_addr(struct pf_rule *a, struct pf_rule *b)
 {
 	if (a->dst.neg != b->dst.neg ||
 	    a->dst.addr.type != b->dst.addr.type)
 		return (1);
 	/* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0
 	 *    && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP ||
 	 *    a->proto == IPPROTO_ICMP
 	 *	return (1);
 	 */
 	switch (a->dst.addr.type) {
 	case PF_ADDR_ADDRMASK:
 		if (memcmp(&a->dst.addr.v.a.addr, &b->dst.addr.v.a.addr,
 		    sizeof(a->dst.addr.v.a.addr)) ||
 		    memcmp(&a->dst.addr.v.a.mask, &b->dst.addr.v.a.mask,
 		    sizeof(a->dst.addr.v.a.mask)) ||
 		    (a->dst.addr.v.a.addr.addr32[0] == 0 &&
 		    a->dst.addr.v.a.addr.addr32[1] == 0 &&
 		    a->dst.addr.v.a.addr.addr32[2] == 0 &&
 		    a->dst.addr.v.a.addr.addr32[3] == 0))
 			return (1);
 		return (0);
 	case PF_ADDR_DYNIFTL:
 		if (strcmp(a->dst.addr.v.ifname, b->dst.addr.v.ifname) != 0 ||
 		    a->dst.addr.iflags != b->dst.addr.iflags ||
 		    memcmp(&a->dst.addr.v.a.mask, &b->dst.addr.v.a.mask,
 		    sizeof(a->dst.addr.v.a.mask)))
 			return (1);
 		return (0);
 	case PF_ADDR_NOROUTE:
 	case PF_ADDR_URPFFAILED:
 		return (0);
 	case PF_ADDR_TABLE:
 		return (strcmp(a->dst.addr.v.tblname, b->dst.addr.v.tblname));
 	}
 	return (1);
 }
 
 /* Compare two rules' DST port field for skiplist construction */
 int
 skip_cmp_dst_port(struct pf_rule *a, struct pf_rule *b)
 {
 	/* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0
 	 *    && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP ||
 	 *    a->proto == IPPROTO_ICMP
 	 *	return (1);
 	 */
 	if (a->dst.port_op == PF_OP_NONE || a->dst.port_op != b->dst.port_op ||
 	    a->dst.port[0] != b->dst.port[0] ||
 	    a->dst.port[1] != b->dst.port[1])
 		return (1);
 	return (0);
 }
 
 /* Compare two rules' IFP field for skiplist construction */
 int
 skip_cmp_ifp(struct pf_rule *a, struct pf_rule *b)
 {
 	if (strcmp(a->ifname, b->ifname) || a->ifname[0] == '\0')
 		return (1);
 	return (a->ifnot != b->ifnot);
 }
 
 /* Compare two rules' PROTO field for skiplist construction */
 int
 skip_cmp_proto(struct pf_rule *a, struct pf_rule *b)
 {
 	return (a->proto != b->proto || a->proto == 0);
 }
 
 /* Compare two rules' SRC addr field for skiplist construction */
 int
 skip_cmp_src_addr(struct pf_rule *a, struct pf_rule *b)
 {
 	if (a->src.neg != b->src.neg ||
 	    a->src.addr.type != b->src.addr.type)
 		return (1);
 	/* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0
 	 *    && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP ||
 	 *    a->proto == IPPROTO_ICMP
 	 *	return (1);
 	 */
 	switch (a->src.addr.type) {
 	case PF_ADDR_ADDRMASK:
 		if (memcmp(&a->src.addr.v.a.addr, &b->src.addr.v.a.addr,
 		    sizeof(a->src.addr.v.a.addr)) ||
 		    memcmp(&a->src.addr.v.a.mask, &b->src.addr.v.a.mask,
 		    sizeof(a->src.addr.v.a.mask)) ||
 		    (a->src.addr.v.a.addr.addr32[0] == 0 &&
 		    a->src.addr.v.a.addr.addr32[1] == 0 &&
 		    a->src.addr.v.a.addr.addr32[2] == 0 &&
 		    a->src.addr.v.a.addr.addr32[3] == 0))
 			return (1);
 		return (0);
 	case PF_ADDR_DYNIFTL:
 		if (strcmp(a->src.addr.v.ifname, b->src.addr.v.ifname) != 0 ||
 		    a->src.addr.iflags != b->src.addr.iflags ||
 		    memcmp(&a->src.addr.v.a.mask, &b->src.addr.v.a.mask,
 		    sizeof(a->src.addr.v.a.mask)))
 			return (1);
 		return (0);
 	case PF_ADDR_NOROUTE:
 	case PF_ADDR_URPFFAILED:
 		return (0);
 	case PF_ADDR_TABLE:
 		return (strcmp(a->src.addr.v.tblname, b->src.addr.v.tblname));
 	}
 	return (1);
 }
 
 /* Compare two rules' SRC port field for skiplist construction */
 int
 skip_cmp_src_port(struct pf_rule *a, struct pf_rule *b)
 {
 	if (a->src.port_op == PF_OP_NONE || a->src.port_op != b->src.port_op ||
 	    a->src.port[0] != b->src.port[0] ||
 	    a->src.port[1] != b->src.port[1])
 		return (1);
 	/* XXX if (a->proto != b->proto && a->proto != 0 && b->proto != 0
 	 *    && (a->proto == IPPROTO_TCP || a->proto == IPPROTO_UDP ||
 	 *    a->proto == IPPROTO_ICMP
 	 *	return (1);
 	 */
 	return (0);
 }
 
 
 void
 skip_init(void)
 {
 	struct {
 		char *name;
 		int skipnum;
 		int (*func)(struct pf_rule *, struct pf_rule *);
 	} comps[] = PF_SKIP_COMPARITORS;
 	int skipnum, i;
 
 	for (skipnum = 0; skipnum < PF_SKIP_COUNT; skipnum++) {
 		for (i = 0; i < sizeof(comps)/sizeof(*comps); i++)
 			if (comps[i].skipnum == skipnum) {
 				skip_comparitors[skipnum] = comps[i].func;
 				skip_comparitors_names[skipnum] = comps[i].name;
 			}
 	}
 	for (skipnum = 0; skipnum < PF_SKIP_COUNT; skipnum++)
 		if (skip_comparitors[skipnum] == NULL)
 			errx(1, "Need to add skip step comparitor to pfctl?!");
 }
 
 /*
  * Add a host/netmask to a table
  */
 int
 add_opt_table(struct pfctl *pf, struct pf_opt_tbl **tbl, sa_family_t af,
     struct pf_rule_addr *addr)
 {
 #ifdef OPT_DEBUG
 	char buf[128];
 #endif /* OPT_DEBUG */
 	static int tablenum = 0;
 	struct node_host node_host;
 
 	if (*tbl == NULL) {
 		if ((*tbl = calloc(1, sizeof(**tbl))) == NULL ||
 		    ((*tbl)->pt_buf = calloc(1, sizeof(*(*tbl)->pt_buf))) ==
 		    NULL)
 			err(1, "calloc");
 		(*tbl)->pt_buf->pfrb_type = PFRB_ADDRS;
 		SIMPLEQ_INIT(&(*tbl)->pt_nodes);
 
 		/* This is just a temporary table name */
 		snprintf((*tbl)->pt_name, sizeof((*tbl)->pt_name), "%s%d",
 		    PF_OPT_TABLE_PREFIX, tablenum++);
 		DEBUG("creating table <%s>", (*tbl)->pt_name);
 	}
 
 	memset(&node_host, 0, sizeof(node_host));
 	node_host.af = af;
 	node_host.addr = addr->addr;
 
 #ifdef OPT_DEBUG
 	DEBUG("<%s> adding %s/%d", (*tbl)->pt_name, inet_ntop(af,
 	    &node_host.addr.v.a.addr, buf, sizeof(buf)),
 	    unmask(&node_host.addr.v.a.mask, af));
 #endif /* OPT_DEBUG */
 
 	if (append_addr_host((*tbl)->pt_buf, &node_host, 0, 0)) {
 		warn("failed to add host");
 		return (1);
 	}
 	if (pf->opts & PF_OPT_VERBOSE) {
 		struct node_tinit *ti;
 
 		if ((ti = calloc(1, sizeof(*ti))) == NULL)
 			err(1, "calloc");
 		if ((ti->host = malloc(sizeof(*ti->host))) == NULL)
 			err(1, "malloc");
 		memcpy(ti->host, &node_host, sizeof(*ti->host));
 		SIMPLEQ_INSERT_TAIL(&(*tbl)->pt_nodes, ti, entries);
 	}
 
 	(*tbl)->pt_rulecount++;
 	if ((*tbl)->pt_rulecount == TABLE_THRESHOLD)
 		DEBUG("table <%s> now faster than skip steps", (*tbl)->pt_name);
 
 	return (0);
 }
 
 
 /*
  * Do the dirty work of choosing an unused table name and creating it.
  * (be careful with the table name, it might already be used in another anchor)
  */
 int
 pf_opt_create_table(struct pfctl *pf, struct pf_opt_tbl *tbl)
 {
 	static int tablenum;
 	struct pfr_table *t;
 
 	if (table_buffer.pfrb_type == 0) {
 		/* Initialize the list of tables */
 		table_buffer.pfrb_type = PFRB_TABLES;
 		for (;;) {
 			pfr_buf_grow(&table_buffer, table_buffer.pfrb_size);
 			table_buffer.pfrb_size = table_buffer.pfrb_msize;
 			if (pfr_get_tables(NULL, table_buffer.pfrb_caddr,
 			    &table_buffer.pfrb_size, PFR_FLAG_ALLRSETS))
 				err(1, "pfr_get_tables");
 			if (table_buffer.pfrb_size <= table_buffer.pfrb_msize)
 				break;
 		}
 		table_identifier = arc4random();
 	}
 
 	/* XXX would be *really* nice to avoid duplicating identical tables */
 
 	/* Now we have to pick a table name that isn't used */
 again:
 	DEBUG("translating temporary table <%s> to <%s%x_%d>", tbl->pt_name,
 	    PF_OPT_TABLE_PREFIX, table_identifier, tablenum);
 	snprintf(tbl->pt_name, sizeof(tbl->pt_name), "%s%x_%d",
 	    PF_OPT_TABLE_PREFIX, table_identifier, tablenum);
 	PFRB_FOREACH(t, &table_buffer) {
 		if (strcasecmp(t->pfrt_name, tbl->pt_name) == 0) {
 			/* Collision.  Try again */
 			DEBUG("wow, table <%s> in use.  trying again",
 			    tbl->pt_name);
 			table_identifier = arc4random();
 			goto again;
 		}
 	}
 	tablenum++;
 
 
 	if (pfctl_define_table(tbl->pt_name, PFR_TFLAG_CONST, 1,
 	    pf->astack[0]->name, tbl->pt_buf, pf->astack[0]->ruleset.tticket)) {
 		warn("failed to create table %s in %s",
 		    tbl->pt_name, pf->astack[0]->name);
 		return (1);
 	}
 	return (0);
 }
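The naming loop in pf_opt_create_table() can be isolated as follows. This is a hedged sketch, not pfctl code. The array `existing` stands in for the PFRB_TABLES buffer, `fake_random()` is a hypothetical deterministic replacement for arc4random(), and `pick_table_name()` is an illustrative name. As in the original, only the random identifier is rerolled on a collision, and the comparison is case-insensitive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical sketch: build "<prefix><id in hex>_<num>" and, on a
 * case-insensitive collision with any existing name, draw a new id and
 * retry.  next_id stands in for arc4random(); existing[] stands in for
 * the list of tables fetched from the kernel.
 */
static void
pick_table_name(char *buf, size_t len, const char *prefix,
    const char **existing, size_t nexisting,
    unsigned int (*next_id)(void), unsigned int *id, int num)
{
	size_t i;

again:
	snprintf(buf, len, "%s%x_%d", prefix, *id, num);
	for (i = 0; i < nexisting; i++) {
		if (strcasecmp(existing[i], buf) == 0) {
			*id = next_id();	/* collision: reroll the id */
			goto again;
		}
	}
}

/* Deterministic stand-in for arc4random() so the example is repeatable. */
static unsigned int
fake_random(void)
{
	static unsigned int seq = 0xa0;

	return (seq++);
}
```

With a real random source the retry loop terminates with overwhelming probability; the deterministic generator here only exists to make the collision path observable.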
 
 /*
  * Partition the flat ruleset into a list of distinct superblocks
  */
 int
 construct_superblocks(struct pfctl *pf, struct pf_opt_queue *opt_queue,
     struct superblocks *superblocks)
 {
 	struct superblock *block = NULL;
 	struct pf_opt_rule *por;
 	int i;
 
 	while (!TAILQ_EMPTY(opt_queue)) {
 		por = TAILQ_FIRST(opt_queue);
 		TAILQ_REMOVE(opt_queue, por, por_entry);
 		if (block == NULL || !superblock_inclusive(block, por)) {
 			if ((block = calloc(1, sizeof(*block))) == NULL) {
 				warn("calloc");
 				return (1);
 			}
 			TAILQ_INIT(&block->sb_rules);
 			for (i = 0; i < PF_SKIP_COUNT; i++)
 				TAILQ_INIT(&block->sb_skipsteps[i]);
 			TAILQ_INSERT_TAIL(superblocks, block, sb_entry);
 		}
 		TAILQ_INSERT_TAIL(&block->sb_rules, por, por_entry);
 	}
 
 	return (0);
 }
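construct_superblocks() never reorders rules. It only cuts the flat list into runs, starting a new superblock whenever superblock_inclusive() rejects a rule when compared against the first rule of the current block. The following is a minimal sketch of that partitioning. It assumes a hypothetical `struct sb_rule` whose `key` field stands in for the fields compared with memcmp() and whose `barrier` flag stands in for the BARRIER fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rule: only the fields that decide grouping. */
struct sb_rule {
	int	key;		/* fields that must match inside a superblock */
	int	barrier;	/* nonzero acts as a hard break */
};

/* Same shape as superblock_inclusive(): test against the block's first rule. */
static int
sb_inclusive(const struct sb_rule *first, const struct sb_rule *r)
{
	if (r->barrier)
		return (0);
	return (first->key == r->key);
}

/*
 * Assign each rule a superblock number.  A new block starts whenever a
 * rule cannot join the current one; rule order is preserved.
 * Returns the number of blocks.
 */
static int
sb_partition(const struct sb_rule *rules, size_t n, int *block_of)
{
	size_t i, first = 0;
	int nblocks = 0;

	for (i = 0; i < n; i++) {
		if (nblocks == 0 || !sb_inclusive(&rules[first], &rules[i])) {
			first = i;
			nblocks++;
		}
		block_of[i] = nblocks - 1;
	}
	return (nblocks);
}
```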
 
 
 /*
  * Compare two rule addresses
  */
 int
 addrs_equal(struct pf_rule_addr *a, struct pf_rule_addr *b)
 {
 	if (a->neg != b->neg)
 		return (0);
 	return (memcmp(&a->addr, &b->addr, sizeof(a->addr)) == 0);
 }
 
 
 /*
  * The addresses are not equal, but can we combine them into one table?
  */
 int
 addrs_combineable(struct pf_rule_addr *a, struct pf_rule_addr *b)
 {
 	if (a->addr.type != PF_ADDR_ADDRMASK ||
 	    b->addr.type != PF_ADDR_ADDRMASK)
 		return (0);
 	if (a->neg != b->neg || a->port_op != b->port_op ||
 	    a->port[0] != b->port[0] || a->port[1] != b->port[1])
 		return (0);
 	return (1);
 }
 
 
 /*
  * Are we allowed to combine these two rules?
  */
 int
 rules_combineable(struct pf_rule *p1, struct pf_rule *p2)
 {
 	struct pf_rule a, b;
 
 	comparable_rule(&a, p1, COMBINED);
 	comparable_rule(&b, p2, COMBINED);
 	return (memcmp(&a, &b, sizeof(a)) == 0);
 }
 
 
 /*
  * Can a rule be included inside a superblock?
  */
 int
 superblock_inclusive(struct superblock *block, struct pf_opt_rule *por)
 {
 	struct pf_rule a, b;
 	int i, j;
 
 	/* First check for hard breaks */
 	for (i = 0; i < sizeof(pf_rule_desc)/sizeof(*pf_rule_desc); i++) {
 		if (pf_rule_desc[i].prf_type == BARRIER) {
 			for (j = 0; j < pf_rule_desc[i].prf_size; j++)
 				if (((char *)&por->por_rule)[j +
 				    pf_rule_desc[i].prf_offset] != 0)
 					return (0);
 		}
 	}
 
 	/* per-rule src-track is also a hard break */
 	if (por->por_rule.rule_flag & PFRULE_RULESRCTRACK)
 		return (0);
 
 	/*
 	 * Have to handle interface groups separately.  Consider the following
 	 * rules:
 	 *	block on EXTIFS to any port 22
 	 *	pass  on em0 to any port 22
 	 * (where EXTIFS is an arbitrary interface group)
 	 * The optimizer may decide to re-order the pass rule in front of the
 	 * block rule.  But what if EXTIFS includes em0???  Such a reordering
 	 * would change the meaning of the ruleset.
 	 * We can't just look up the EXTIFS group and check if em0 is a member
 	 * because the user is allowed to add interfaces to a group at
 	 * runtime.
 	 * Ergo interface groups become a de facto superblock break :-(
 	 */
 	if (interface_group(por->por_rule.ifname) ||
 	    interface_group(TAILQ_FIRST(&block->sb_rules)->por_rule.ifname)) {
 		if (strcasecmp(por->por_rule.ifname,
 		    TAILQ_FIRST(&block->sb_rules)->por_rule.ifname) != 0)
 			return (0);
 	}
 
 	comparable_rule(&a, &TAILQ_FIRST(&block->sb_rules)->por_rule, NOMERGE);
 	comparable_rule(&b, &por->por_rule, NOMERGE);
 	if (memcmp(&a, &b, sizeof(a)) == 0)
 		return (1);
 
 #ifdef OPT_DEBUG
 	for (i = 0; i < sizeof(por->por_rule); i++) {
 		int closest = -1;
 		if (((u_int8_t *)&a)[i] != ((u_int8_t *)&b)[i]) {
 			for (j = 0; j < sizeof(pf_rule_desc) /
 			    sizeof(*pf_rule_desc); j++) {
 				if (i >= pf_rule_desc[j].prf_offset &&
 				    i < pf_rule_desc[j].prf_offset +
 				    pf_rule_desc[j].prf_size) {
 					DEBUG("superblock break @ %d due to %s",
 					    por->por_rule.nr,
 					    pf_rule_desc[j].prf_name);
 					return (0);
 				}
 				if (i > pf_rule_desc[j].prf_offset) {
 					if (closest == -1 ||
 					    i-pf_rule_desc[j].prf_offset <
 					    i-pf_rule_desc[closest].prf_offset)
 						closest = j;
 				}
 			}
 
 			if (closest >= 0)
 				DEBUG("superblock break @ %d on %s+%xh",
 				    por->por_rule.nr,
 				    pf_rule_desc[closest].prf_name,
 				    i - pf_rule_desc[closest].prf_offset -
 				    pf_rule_desc[closest].prf_size);
 			else
 				DEBUG("superblock break @ %d on field @ %d",
 				    por->por_rule.nr, i);
 			return (0);
 		}
 	}
 #endif /* OPT_DEBUG */
 
 	return (0);
 }
 
 
 /*
  * Figure out if an interface name is an actual interface or actually a
  * group of interfaces.
  */
 int
 interface_group(const char *ifname)
 {
 	if (ifname == NULL || !ifname[0])
 		return (0);
 
 	/* Real interfaces must end in a number, interface groups do not */
 	if (isdigit(ifname[strlen(ifname) - 1]))
 		return (0);
 	else
 		return (1);
 }
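interface_group() relies purely on the naming convention stated in its comment. Here is a small sketch of the same test. Unlike the original, it casts to unsigned char before calling isdigit(), which avoids undefined behavior for negative char values. `looks_like_group()` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same heuristic as interface_group(): a name ending in a digit is a real interface. */
static int
looks_like_group(const char *ifname)
{
	size_t len;

	if (ifname == NULL || ifname[0] == '\0')
		return (0);
	len = strlen(ifname);
	/* the cast keeps isdigit() defined for negative char values */
	return (!isdigit((unsigned char)ifname[len - 1]));
}
```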
 
 
 /*
  * Make a copy of a rule that can be compared directly with memcmp()
  */
 void
 comparable_rule(struct pf_rule *dst, const struct pf_rule *src, int type)
 {
 	int i;
 	/*
 	 * To simplify the comparison, we just zero out the fields that are
 	 * allowed to be different and then do a simple memcmp()
 	 */
 	memcpy(dst, src, sizeof(*dst));
 	for (i = 0; i < sizeof(pf_rule_desc)/sizeof(*pf_rule_desc); i++)
 		if (pf_rule_desc[i].prf_type >= type) {
 #ifdef OPT_DEBUG
 			assert(pf_rule_desc[i].prf_type != NEVER ||
 			    *(((char *)dst) + pf_rule_desc[i].prf_offset) == 0);
 #endif /* OPT_DEBUG */
 			memset(((char *)dst) + pf_rule_desc[i].prf_offset, 0,
 			    pf_rule_desc[i].prf_size);
 		}
 }
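comparable_rule() turns "are these rules equivalent except for certain fields?" into a single memcmp() by zeroing every field whose descriptor class is at or above the requested level. The sketch below reproduces that technique on a hypothetical three-field struct with its own descriptor table (`cmp_desc[]` is illustrative, not pf_rule_desc[]). Because memcmp() also compares padding bytes, the trick assumes the structs involved have no padding or were zero-filled before use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule; stands in for struct pf_rule. */
struct cmp_rule {
	int	proto;		/* must match */
	int	nr;		/* rule number: allowed to differ */
	char	label[8];	/* allowed to differ */
};

/* Field descriptor in the style of pf_rule_desc[]: offset, size, class. */
enum cmp_class { CMP_MUST_MATCH = 0, CMP_MAY_DIFFER = 1 };
struct cmp_desc {
	size_t		offset;
	size_t		size;
	enum cmp_class	type;
};

static const struct cmp_desc cmp_desc[] = {
	{ offsetof(struct cmp_rule, proto), sizeof(int), CMP_MUST_MATCH },
	{ offsetof(struct cmp_rule, nr), sizeof(int), CMP_MAY_DIFFER },
	{ offsetof(struct cmp_rule, label),
	    sizeof(((struct cmp_rule *)0)->label), CMP_MAY_DIFFER },
};

/* Copy src and zero every field whose class is >= type. */
static void
cmp_comparable(struct cmp_rule *dst, const struct cmp_rule *src, int type)
{
	size_t i;

	memcpy(dst, src, sizeof(*dst));
	for (i = 0; i < sizeof(cmp_desc) / sizeof(*cmp_desc); i++)
		if ((int)cmp_desc[i].type >= type)
			memset((char *)dst + cmp_desc[i].offset, 0,
			    cmp_desc[i].size);
}

/* Two rules are equivalent if their masked copies are byte-identical. */
static int
cmp_equivalent(const struct cmp_rule *a, const struct cmp_rule *b)
{
	struct cmp_rule ca, cb;

	cmp_comparable(&ca, a, CMP_MAY_DIFFER);
	cmp_comparable(&cb, b, CMP_MAY_DIFFER);
	return (memcmp(&ca, &cb, sizeof(ca)) == 0);
}
```

The design keeps the list of "ignorable" fields in data rather than in a hand-written comparison function, so adding a field to the struct only requires one new descriptor entry.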
 
 
 /*
  * Remove superset information from two rules so we can directly compare them
  * with memcmp()
  */
 void
 exclude_supersets(struct pf_rule *super, struct pf_rule *sub)
 {
 	if (super->ifname[0] == '\0')
 		memset(sub->ifname, 0, sizeof(sub->ifname));
 	if (super->direction == PF_INOUT)
 		sub->direction = PF_INOUT;
 	if ((super->proto == 0 || super->proto == sub->proto) &&
 	    super->flags == 0 && super->flagset == 0 && (sub->flags ||
 	    sub->flagset)) {
 		sub->flags = super->flags;
 		sub->flagset = super->flagset;
 	}
 	if (super->proto == 0)
 		sub->proto = 0;
 
 	if (super->src.port_op == 0) {
 		sub->src.port_op = 0;
 		sub->src.port[0] = 0;
 		sub->src.port[1] = 0;
 	}
 	if (super->dst.port_op == 0) {
 		sub->dst.port_op = 0;
 		sub->dst.port[0] = 0;
 		sub->dst.port[1] = 0;
 	}
 
 	if (super->src.addr.type == PF_ADDR_ADDRMASK && !super->src.neg &&
 	    !sub->src.neg && super->src.addr.v.a.mask.addr32[0] == 0 &&
 	    super->src.addr.v.a.mask.addr32[1] == 0 &&
 	    super->src.addr.v.a.mask.addr32[2] == 0 &&
 	    super->src.addr.v.a.mask.addr32[3] == 0)
 		memset(&sub->src.addr, 0, sizeof(sub->src.addr));
 	else if (super->src.addr.type == PF_ADDR_ADDRMASK &&
 	    sub->src.addr.type == PF_ADDR_ADDRMASK &&
 	    super->src.neg == sub->src.neg &&
 	    super->af == sub->af &&
 	    unmask(&super->src.addr.v.a.mask, super->af) <
 	    unmask(&sub->src.addr.v.a.mask, sub->af) &&
 	    super->src.addr.v.a.addr.addr32[0] ==
 	    (sub->src.addr.v.a.addr.addr32[0] &
 	    super->src.addr.v.a.mask.addr32[0]) &&
 	    super->src.addr.v.a.addr.addr32[1] ==
 	    (sub->src.addr.v.a.addr.addr32[1] &
 	    super->src.addr.v.a.mask.addr32[1]) &&
 	    super->src.addr.v.a.addr.addr32[2] ==
 	    (sub->src.addr.v.a.addr.addr32[2] &
 	    super->src.addr.v.a.mask.addr32[2]) &&
 	    super->src.addr.v.a.addr.addr32[3] ==
 	    (sub->src.addr.v.a.addr.addr32[3] &
 	    super->src.addr.v.a.mask.addr32[3])) {
 		/* sub->src.addr is a subset of super->src.addr/mask */
 		memcpy(&sub->src.addr, &super->src.addr, sizeof(sub->src.addr));
 	}
 
 	if (super->dst.addr.type == PF_ADDR_ADDRMASK && !super->dst.neg &&
 	    !sub->dst.neg && super->dst.addr.v.a.mask.addr32[0] == 0 &&
 	    super->dst.addr.v.a.mask.addr32[1] == 0 &&
 	    super->dst.addr.v.a.mask.addr32[2] == 0 &&
 	    super->dst.addr.v.a.mask.addr32[3] == 0)
 		memset(&sub->dst.addr, 0, sizeof(sub->dst.addr));
 	else if (super->dst.addr.type == PF_ADDR_ADDRMASK &&
 	    sub->dst.addr.type == PF_ADDR_ADDRMASK &&
 	    super->dst.neg == sub->dst.neg &&
 	    super->af == sub->af &&
 	    unmask(&super->dst.addr.v.a.mask, super->af) <
 	    unmask(&sub->dst.addr.v.a.mask, sub->af) &&
 	    super->dst.addr.v.a.addr.addr32[0] ==
 	    (sub->dst.addr.v.a.addr.addr32[0] &
 	    super->dst.addr.v.a.mask.addr32[0]) &&
 	    super->dst.addr.v.a.addr.addr32[1] ==
 	    (sub->dst.addr.v.a.addr.addr32[1] &
 	    super->dst.addr.v.a.mask.addr32[1]) &&
 	    super->dst.addr.v.a.addr.addr32[2] ==
 	    (sub->dst.addr.v.a.addr.addr32[2] &
 	    super->dst.addr.v.a.mask.addr32[2]) &&
 	    super->dst.addr.v.a.addr.addr32[3] ==
 	    (sub->dst.addr.v.a.addr.addr32[3] &
 	    super->dst.addr.v.a.mask.addr32[3])) {
 		/* sub->dst.addr is a subset of super->dst.addr/mask */
 		memcpy(&sub->dst.addr, &super->dst.addr, sizeof(sub->dst.addr));
 	}
 
 	if (super->af == 0)
 		sub->af = 0;
 }
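The address branch of exclude_supersets() treats sub as covered by super when super has the strictly shorter prefix and when masking sub's address with super's mask yields super's address. The following is a hedged IPv4-only sketch in host byte order. pfctl itself works on the four 32-bit words of struct pf_addr and calls unmask() for the prefix length. The zero-mask ("any") case is handled by an earlier branch in pfctl that clears sub's address instead.

```c
#include <assert.h>
#include <stdint.h>

/* Count leading one bits of a contiguous netmask (host byte order). */
static int
prefix_len(uint32_t mask)
{
	int n = 0;

	while (n < 32 && (mask & (UINT32_C(0x80000000) >> n)))
		n++;
	return (n);
}

/*
 * Nonzero if sub_addr/sub_mask lies strictly inside super_addr/super_mask.
 * super_addr is assumed to be stored already masked, as pf rules are.
 */
static int
ipv4_strict_subset(uint32_t super_addr, uint32_t super_mask,
    uint32_t sub_addr, uint32_t sub_mask)
{
	return (prefix_len(super_mask) < prefix_len(sub_mask) &&
	    super_addr == (sub_addr & super_mask));
}
```

For example, 10.1.2.0/24 is a strict subset of 10.0.0.0/8, while a /24 is never a strict subset of itself.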
 
 
 void
 superblock_free(struct pfctl *pf, struct superblock *block)
 {
 	struct pf_opt_rule *por;
 	while ((por = TAILQ_FIRST(&block->sb_rules))) {
 		TAILQ_REMOVE(&block->sb_rules, por, por_entry);
 		if (por->por_src_tbl) {
 			if (por->por_src_tbl->pt_buf) {
 				pfr_buf_clear(por->por_src_tbl->pt_buf);
 				free(por->por_src_tbl->pt_buf);
 			}
 			free(por->por_src_tbl);
 		}
 		if (por->por_dst_tbl) {
 			if (por->por_dst_tbl->pt_buf) {
 				pfr_buf_clear(por->por_dst_tbl->pt_buf);
 				free(por->por_dst_tbl->pt_buf);
 			}
 			free(por->por_dst_tbl);
 		}
 		free(por);
 	}
 	if (block->sb_profiled_block)
 		superblock_free(pf, block->sb_profiled_block);
 	free(block);
 }
 
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_osfp.c	(revision 303775)
@@ -1,1108 +1,1111 @@
 /*	$OpenBSD: pfctl_osfp.c,v 1.14 2006/04/08 02:13:14 ray Exp $ */
 
 /*
  * Copyright (c) 2003 Mike Frantzen <frantzen@openbsd.org>
  *
  * Permission to use, copy, modify, and distribute this software for any
  * purpose with or without fee is hereby granted, provided that the above
  * copyright notice and this permission notice appear in all copies.
  *
  * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
  * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
  * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
  * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
  * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
  * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */
 
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 
 #include <net/if.h>
 #include <net/pfvar.h>
 
 #include <netinet/in_systm.h>
 #include <netinet/ip.h>
 #include <netinet/ip6.h>
 
 #include <ctype.h>
 #include <err.h>
 #include <errno.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 #ifndef MIN
 # define MIN(a,b)	(((a) < (b)) ? (a) : (b))
 #endif /* MIN */
 #ifndef MAX
 # define MAX(a,b)	(((a) > (b)) ? (a) : (b))
 #endif /* MAX */
 
 
 #if 0
 # define DEBUG(fp, str, v...) \
 	fprintf(stderr, "%s:%s:%s " str "\n", (fp)->fp_os.fp_class_nm, \
 	    (fp)->fp_os.fp_version_nm, (fp)->fp_os.fp_subtype_nm , ## v);
 #else
 # define DEBUG(fp, str, v...) ((void)0)
 #endif
 
 
 struct name_entry;
 LIST_HEAD(name_list, name_entry);
 struct name_entry {
 	LIST_ENTRY(name_entry)	nm_entry;
 	int			nm_num;
 	char			nm_name[PF_OSFP_LEN];
 
 	struct name_list	nm_sublist;
 	int			nm_sublist_num;
 };
-struct name_list classes = LIST_HEAD_INITIALIZER(&classes);
-int class_count;
-int fingerprint_count;
+static struct name_list classes = LIST_HEAD_INITIALIZER(&classes);
+static int class_count;
+static int fingerprint_count;
 
 void			 add_fingerprint(int, int, struct pf_osfp_ioctl *);
 struct name_entry	*fingerprint_name_entry(struct name_list *, char *);
 void			 pfctl_flush_my_fingerprints(struct name_list *);
 char			*get_field(char **, size_t *, int *);
 int			 get_int(char **, size_t *, int *, int *, const char *,
 			     int, int, const char *, int);
 int			 get_str(char **, size_t *, char **, const char *, int,
 			     const char *, int);
 int			 get_tcpopts(const char *, int, const char *,
 			    pf_tcpopts_t *, int *, int *, int *, int *, int *,
 			    int *);
 void			 import_fingerprint(struct pf_osfp_ioctl *);
 const char		*print_ioctl(struct pf_osfp_ioctl *);
 void			 print_name_list(int, struct name_list *, const char *);
 void			 sort_name_list(int, struct name_list *);
 struct name_entry	*lookup_name_list(struct name_list *, const char *);
 
 /* Load fingerprints from a file */
 int
 pfctl_file_fingerprints(int dev, int opts, const char *fp_filename)
 {
 	FILE *in;
 	char *line;
 	size_t len;
 	int i, lineno = 0;
 	int window, w_mod, ttl, df, psize, p_mod, mss, mss_mod, wscale,
 	    wscale_mod, optcnt, ts0;
 	pf_tcpopts_t packed_tcpopts;
 	char *class, *version, *subtype, *desc, *tcpopts;
 	struct pf_osfp_ioctl fp;
 
 	pfctl_flush_my_fingerprints(&classes);
 
 	if ((in = pfctl_fopen(fp_filename, "r")) == NULL) {
 		warn("%s", fp_filename);
 		return (1);
 	}
 	class = version = subtype = desc = tcpopts = NULL;
 
 	if ((opts & PF_OPT_NOACTION) == 0)
 		pfctl_clear_fingerprints(dev, opts);
 
 	while ((line = fgetln(in, &len)) != NULL) {
 		lineno++;
 		if (class)
 			free(class);
 		if (version)
 			free(version);
 		if (subtype)
 			free(subtype);
 		if (desc)
 			free(desc);
 		if (tcpopts)
 			free(tcpopts);
 		class = version = subtype = desc = tcpopts = NULL;
 		memset(&fp, 0, sizeof(fp));
 
 		/* Chop off comment */
 		for (i = 0; i < len; i++)
 			if (line[i] == '#') {
 				len = i;
 				break;
 			}
 		/* Chop off whitespace */
 		while (len > 0 && isspace(line[len - 1]))
 			len--;
 		while (len > 0 && isspace(line[0])) {
 			len--;
 			line++;
 		}
 		if (len == 0)
 			continue;
 
 #define T_DC	0x01	/* Allow don't care */
 #define T_MSS	0x02	/* Allow MSS multiple */
 #define T_MTU	0x04	/* Allow MTU multiple */
 #define T_MOD	0x08	/* Allow modulus */
 
 #define GET_INT(v, mod, n, ty, mx) \
 	get_int(&line, &len, &v, mod, n, ty, mx, fp_filename, lineno)
 #define GET_STR(v, n, mn) \
 	get_str(&line, &len, &v, n, mn, fp_filename, lineno)
 
 		if (GET_INT(window, &w_mod, "window size", T_DC|T_MSS|T_MTU|
 		    T_MOD, 0xffff) ||
 		    GET_INT(ttl, NULL, "ttl", 0, 0xff) ||
 		    GET_INT(df, NULL, "don't fragment flag", 0, 1) ||
 		    GET_INT(psize, &p_mod, "overall packet size", T_MOD|T_DC,
 		    8192) ||
 		    GET_STR(tcpopts, "TCP Options", 1) ||
 		    GET_STR(class, "OS class", 1) ||
 		    GET_STR(version, "OS version", 0) ||
 		    GET_STR(subtype, "OS subtype", 0) ||
 		    GET_STR(desc, "OS description", 2))
 			continue;
 		if (get_tcpopts(fp_filename, lineno, tcpopts, &packed_tcpopts,
 		    &optcnt, &mss, &mss_mod, &wscale, &wscale_mod, &ts0))
 			continue;
 		if (len != 0) {
 			fprintf(stderr, "%s:%d excess field\n", fp_filename,
 			    lineno);
 			continue;
 		}
 
 		fp.fp_ttl = ttl;
 		if (df)
 			fp.fp_flags |= PF_OSFP_DF;
 		switch (w_mod) {
 		case 0:
 			break;
 		case T_DC:
 			fp.fp_flags |= PF_OSFP_WSIZE_DC;
 			break;
 		case T_MSS:
 			fp.fp_flags |= PF_OSFP_WSIZE_MSS;
 			break;
 		case T_MTU:
 			fp.fp_flags |= PF_OSFP_WSIZE_MTU;
 			break;
 		case T_MOD:
 			fp.fp_flags |= PF_OSFP_WSIZE_MOD;
 			break;
 		}
 		fp.fp_wsize = window;
 
 		switch (p_mod) {
 		case T_DC:
 			fp.fp_flags |= PF_OSFP_PSIZE_DC;
 			break;
 		case T_MOD:
 			fp.fp_flags |= PF_OSFP_PSIZE_MOD;
 		}
 		fp.fp_psize = psize;
 
 
 		switch (wscale_mod) {
 		case T_DC:
 			fp.fp_flags |= PF_OSFP_WSCALE_DC;
 			break;
 		case T_MOD:
 			fp.fp_flags |= PF_OSFP_WSCALE_MOD;
 		}
 		fp.fp_wscale = wscale;
 
 		switch (mss_mod) {
 		case T_DC:
 			fp.fp_flags |= PF_OSFP_MSS_DC;
 			break;
 		case T_MOD:
 			fp.fp_flags |= PF_OSFP_MSS_MOD;
 			break;
 		}
 		fp.fp_mss = mss;
 
 		fp.fp_tcpopts = packed_tcpopts;
 		fp.fp_optcnt = optcnt;
 		if (ts0)
 			fp.fp_flags |= PF_OSFP_TS0;
 
 		if (class[0] == '@')
 			fp.fp_os.fp_enflags |= PF_OSFP_GENERIC;
 		if (class[0] == '*')
 			fp.fp_os.fp_enflags |= PF_OSFP_NODETAIL;
 
 		if (class[0] == '@' || class[0] == '*')
 			strlcpy(fp.fp_os.fp_class_nm, class + 1,
 			    sizeof(fp.fp_os.fp_class_nm));
 		else
 			strlcpy(fp.fp_os.fp_class_nm, class,
 			    sizeof(fp.fp_os.fp_class_nm));
 		strlcpy(fp.fp_os.fp_version_nm, version,
 		    sizeof(fp.fp_os.fp_version_nm));
 		strlcpy(fp.fp_os.fp_subtype_nm, subtype,
 		    sizeof(fp.fp_os.fp_subtype_nm));
 
 		add_fingerprint(dev, opts, &fp);
 
 		fp.fp_flags |= (PF_OSFP_DF | PF_OSFP_INET6);
 		fp.fp_psize += sizeof(struct ip6_hdr) - sizeof(struct ip);
 		add_fingerprint(dev, opts, &fp);
 	}
 
 	if (class)
 		free(class);
 	if (version)
 		free(version);
 	if (subtype)
 		free(subtype);
 	if (desc)
 		free(desc);
 	if (tcpopts)
 		free(tcpopts);
 
 	fclose(in);
 
 	if (opts & PF_OPT_VERBOSE2)
 		printf("Loaded %d passive OS fingerprints\n",
 		    fingerprint_count);
 	return (0);
 }
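fgetln() returns a buffer that is not NUL-terminated. For that reason, the fingerprint parser trims comments and whitespace by adjusting the pointer and the length rather than by writing terminators. The same steps are shown below as a standalone helper. `trim_fp_line()` is hypothetical, and the unsigned char casts are an addition for isspace() safety.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Trim a line that need not be NUL-terminated: drop everything from the
 * first '#', then trailing and leading whitespace.  Updates *line and
 * *len in place; *len == 0 afterwards means the line was blank.
 */
static void
trim_fp_line(const char **line, size_t *len)
{
	size_t i;

	for (i = 0; i < *len; i++)
		if ((*line)[i] == '#') {
			*len = i;
			break;
		}
	while (*len > 0 && isspace((unsigned char)(*line)[*len - 1]))
		(*len)--;
	while (*len > 0 && isspace((unsigned char)(*line)[0])) {
		(*len)--;
		(*line)++;
	}
}
```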
 
 /* flush the kernel's fingerprints */
 void
 pfctl_clear_fingerprints(int dev, int opts)
 {
 	if (ioctl(dev, DIOCOSFPFLUSH))
 		err(1, "DIOCOSFPFLUSH");
 }
 
 /* flush pfctl's view of the fingerprints */
 void
 pfctl_flush_my_fingerprints(struct name_list *list)
 {
 	struct name_entry *nm;
 
 	while ((nm = LIST_FIRST(list)) != NULL) {
 		LIST_REMOVE(nm, nm_entry);
 		pfctl_flush_my_fingerprints(&nm->nm_sublist);
 		free(nm);
 	}
 	fingerprint_count = 0;
 	class_count = 0;
 }
 
 /* Fetch the active fingerprints from the kernel */
 int
 pfctl_load_fingerprints(int dev, int opts)
 {
 	struct pf_osfp_ioctl io;
 	int i;
 
 	pfctl_flush_my_fingerprints(&classes);
 
 	for (i = 0; i >= 0; i++) {
 		memset(&io, 0, sizeof(io));
 		io.fp_getnum = i;
 		if (ioctl(dev, DIOCOSFPGET, &io)) {
 			if (errno == EBUSY)
 				break;
 			warn("DIOCOSFPGET");
 			return (1);
 		}
 		import_fingerprint(&io);
 	}
 	return (0);
 }
 
 /* List the fingerprints */
 void
 pfctl_show_fingerprints(int opts)
 {
 	if (LIST_FIRST(&classes) != NULL) {
 		if (opts & PF_OPT_SHOWALL) {
 			pfctl_print_title("OS FINGERPRINTS:");
 			printf("%u fingerprints loaded\n", fingerprint_count);
 		} else {
 			printf("Class\tVersion\tSubtype(subversion)\n");
 			printf("-----\t-------\t-------------------\n");
 			sort_name_list(opts, &classes);
 			print_name_list(opts, &classes, "");
 		}
 	}
 }
 
 /* Lookup a fingerprint */
 pf_osfp_t
 pfctl_get_fingerprint(const char *name)
 {
 	struct name_entry *nm, *class_nm, *version_nm, *subtype_nm;
 	pf_osfp_t ret = PF_OSFP_NOMATCH;
 	int class, version, subtype;
 	int unp_class, unp_version, unp_subtype;
 	int wr_len, version_len, subtype_len;
 	char *ptr, *wr_name;
 
 	if (strcasecmp(name, "unknown") == 0)
 		return (PF_OSFP_UNKNOWN);
 
 	/* Try the most likely case first: no version and no subtype */
 	if ((nm = lookup_name_list(&classes, name))) {
 		class = nm->nm_num;
 		version = PF_OSFP_ANY;
 		subtype = PF_OSFP_ANY;
 		goto found;
 	} else {
 
 		/* Chop it up into class/version/subtype */
 
 		if ((wr_name = strdup(name)) == NULL)
 			err(1, "malloc");
 		if ((ptr = strchr(wr_name, ' ')) == NULL) {
 			free(wr_name);
 			return (PF_OSFP_NOMATCH);
 		}
 		*ptr++ = '\0';
 
 		/* The class is easy to find since it is delimited by a space */
 		if ((class_nm = lookup_name_list(&classes, wr_name)) == NULL) {
 			free(wr_name);
 			return (PF_OSFP_NOMATCH);
 		}
 		class = class_nm->nm_num;
 
 		/* Try no subtype */
 		if ((version_nm = lookup_name_list(&class_nm->nm_sublist, ptr)))
 		{
 			version = version_nm->nm_num;
 			subtype = PF_OSFP_ANY;
 			free(wr_name);
 			goto found;
 		}
 
 
 		/*
 		 * There must be a version and a subtype.
 		 * We'll do some fuzzy matching to pick up things like:
 		 *   Linux 2.2.14 (version=2.2 subtype=14)
 		 *   FreeBSD 4.0-STABLE (version=4.0 subtype=STABLE)
 		 *   Windows 2000 SP2	(version=2000 subtype=SP2)
 		 */
 #define CONNECTOR(x)	((x) == '.' || (x) == ' ' || (x) == '\t' || (x) == '-')
 		wr_len = strlen(ptr);
 		LIST_FOREACH(version_nm, &class_nm->nm_sublist, nm_entry) {
 			version_len = strlen(version_nm->nm_name);
 			if (wr_len < version_len + 2 ||
 			    !CONNECTOR(ptr[version_len]))
 				continue;
 			/* first part of the string must be version */
 			if (strncasecmp(ptr, version_nm->nm_name,
 			    version_len))
 				continue;
 
 			LIST_FOREACH(subtype_nm, &version_nm->nm_sublist,
 			    nm_entry) {
 				subtype_len = strlen(subtype_nm->nm_name);
 				if (wr_len != version_len + subtype_len + 1)
 					continue;
 
 				/* last part of the string must be subtype */
 				if (strcasecmp(&ptr[version_len+1],
 				    subtype_nm->nm_name) != 0)
 					continue;
 
 				/* Found it!! */
 				version = version_nm->nm_num;
 				subtype = subtype_nm->nm_num;
 				free(wr_name);
 				goto found;
 			}
 		}
 
 		free(wr_name);
 		return (PF_OSFP_NOMATCH);
 	}
 
 found:
 	PF_OSFP_PACK(ret, class, version, subtype);
 	if (ret != PF_OSFP_NOMATCH) {
 		PF_OSFP_UNPACK(ret, unp_class, unp_version, unp_subtype);
 		if (class != unp_class) {
 			fprintf(stderr, "warning: fingerprint table overflowed "
 			    "classes\n");
 			return (PF_OSFP_NOMATCH);
 		}
 		if (version != unp_version) {
 			fprintf(stderr, "warning: fingerprint table overflowed "
 			    "versions\n");
 			return (PF_OSFP_NOMATCH);
 		}
 		if (subtype != unp_subtype) {
 			fprintf(stderr, "warning: fingerprint table overflowed "
 			    "subtypes\n");
 			return (PF_OSFP_NOMATCH);
 		}
 	}
 	if (ret == PF_OSFP_ANY) {
 		/* should never happen */
 		fprintf(stderr, "warning: fingerprint packed to 'any'\n");
 		return (PF_OSFP_NOMATCH);
 	}
 
 	return (ret);
 }
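The fuzzy match in pfctl_get_fingerprint() accepts "<version><connector><subtype>" only when three conditions hold. The version must be a case-insensitive prefix, the next character must be one of `.`, space, tab or `-`, and the remainder must equal the subtype exactly. The check is extracted below into a hypothetical predicate, `split_matches()`, and exercised with the examples from the comment in that function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

#define CONNECTOR(x) ((x) == '.' || (x) == ' ' || (x) == '\t' || (x) == '-')

/* Nonzero if str is "<version><connector><subtype>", ignoring case. */
static int
split_matches(const char *str, const char *version, const char *subtype)
{
	size_t wr_len = strlen(str);
	size_t version_len = strlen(version);
	size_t subtype_len = strlen(subtype);

	/* need at least one connector and one subtype character */
	if (wr_len < version_len + 2 || !CONNECTOR(str[version_len]))
		return (0);
	if (strncasecmp(str, version, version_len) != 0)
		return (0);
	if (wr_len != version_len + subtype_len + 1)
		return (0);
	return (strcasecmp(str + version_len + 1, subtype) == 0);
}
```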
 
 /* Lookup a fingerprint name by ID */
 char *
 pfctl_lookup_fingerprint(pf_osfp_t fp, char *buf, size_t len)
 {
 	int class, version, subtype;
 	struct name_list *list;
 	struct name_entry *nm;
 
 	char *class_name, *version_name, *subtype_name;
 	class_name = version_name = subtype_name = NULL;
 
 	if (fp == PF_OSFP_UNKNOWN) {
 		strlcpy(buf, "unknown", len);
 		return (buf);
 	}
 	if (fp == PF_OSFP_ANY) {
 		strlcpy(buf, "any", len);
 		return (buf);
 	}
 
 	PF_OSFP_UNPACK(fp, class, version, subtype);
 	if (class >= (1 << _FP_CLASS_BITS) ||
 	    version >= (1 << _FP_VERSION_BITS) ||
 	    subtype >= (1 << _FP_SUBTYPE_BITS)) {
 		warnx("PF_OSFP_UNPACK(0x%x) failed!!", fp);
 		strlcpy(buf, "nomatch", len);
 		return (buf);
 	}
 
 	LIST_FOREACH(nm, &classes, nm_entry) {
 		if (nm->nm_num == class) {
 			class_name = nm->nm_name;
 			if (version == PF_OSFP_ANY)
 				goto found;
 			list = &nm->nm_sublist;
 			LIST_FOREACH(nm, list, nm_entry) {
 				if (nm->nm_num == version) {
 					version_name = nm->nm_name;
 					if (subtype == PF_OSFP_ANY)
 						goto found;
 					list = &nm->nm_sublist;
 					LIST_FOREACH(nm, list, nm_entry) {
 						if (nm->nm_num == subtype) {
 							subtype_name =
 							    nm->nm_name;
 							goto found;
 						}
 					} /* foreach subtype */
 					strlcpy(buf, "nomatch", len);
 					return (buf);
 				}
 			} /* foreach version */
 			strlcpy(buf, "nomatch", len);
 			return (buf);
 		}
 	} /* foreach class */
 
 	strlcpy(buf, "nomatch", len);
 	return (buf);
 
 found:
 	snprintf(buf, len, "%s", class_name);
 	if (version_name) {
 		strlcat(buf, " ", len);
 		strlcat(buf, version_name, len);
 		if (subtype_name) {
 			if (strchr(version_name, ' '))
 				strlcat(buf, " ", len);
 			else if (strchr(version_name, '.') &&
 			    isdigit(*subtype_name))
 				strlcat(buf, ".", len);
 			else
 				strlcat(buf, " ", len);
 			strlcat(buf, subtype_name, len);
 		}
 	}
 	return (buf);
 }
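pfctl_lookup_fingerprint() chooses the separator between version and subtype so that names read naturally: a dot when the version is dotted (without spaces) and the subtype starts with a digit, and a space otherwise. The same rule is sketched below without strlcat(), which glibc only gained in release 2.38. `format_fp_name()` is hypothetical and assumes a non-empty buffer. Like the original, it truncates silently when the buffer is too small.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Join class, version and subtype; version and subtype may be NULL. */
static void
format_fp_name(char *buf, size_t len, const char *class_name,
    const char *version, const char *subtype)
{
	const char *sep;
	size_t used;

	assert(len > 0);
	snprintf(buf, len, "%s", class_name);
	if (version == NULL)
		return;
	used = strlen(buf);
	snprintf(buf + used, len - used, " %s", version);
	if (subtype == NULL)
		return;
	if (strchr(version, ' ') == NULL && strchr(version, '.') != NULL &&
	    isdigit((unsigned char)subtype[0]))
		sep = ".";		/* "2.2" + "14" -> "2.2.14" */
	else
		sep = " ";		/* "4.0" + "STABLE" -> "4.0 STABLE" */
	used = strlen(buf);
	snprintf(buf + used, len - used, "%s%s", sep, subtype);
}
```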
 
 /* lookup a name in a list */
 struct name_entry *
 lookup_name_list(struct name_list *list, const char *name)
 {
 	struct name_entry *nm;
 	LIST_FOREACH(nm, list, nm_entry)
 		if (strcasecmp(name, nm->nm_name) == 0)
 			return (nm);
 
 	return (NULL);
 }
 
 
 void
 add_fingerprint(int dev, int opts, struct pf_osfp_ioctl *fp)
 {
 	struct pf_osfp_ioctl fptmp;
 	struct name_entry *nm_class, *nm_version, *nm_subtype;
 	int class, version, subtype;
 
 /* We expand #-# or #.#-#.# version/subtypes into multiple fingerprints */
 #define EXPAND(field) do {						\
 	int _dot = -1, _start = -1, _end = -1, _i = 0;			\
 	/* pick major version out of #.# */				\
 	if (isdigit(fp->field[_i]) && fp->field[_i+1] == '.') {		\
 		_dot = fp->field[_i] - '0';				\
 		_i += 2;						\
 	}								\
 	if (isdigit(fp->field[_i]))					\
 		_start = fp->field[_i++] - '0';				\
 	else								\
 		break;							\
 	if (isdigit(fp->field[_i]))					\
 		_start = (_start * 10) + fp->field[_i++] - '0';		\
 	if (fp->field[_i++] != '-')					\
 		break;							\
 	if (isdigit(fp->field[_i]) && fp->field[_i+1] == '.' &&		\
 	    fp->field[_i] - '0' == _dot)				\
 		_i += 2;						\
 	else if (_dot != -1)						\
 		break;							\
 	if (isdigit(fp->field[_i]))					\
 		_end = fp->field[_i++] - '0';				\
 	else								\
 		break;							\
 	if (isdigit(fp->field[_i]))					\
 		_end = (_end * 10) + fp->field[_i++] - '0';		\
 	if (isdigit(fp->field[_i]))					\
 		_end = (_end * 10) + fp->field[_i++] - '0';		\
 	if (fp->field[_i] != '\0')					\
 		break;							\
 	memcpy(&fptmp, fp, sizeof(fptmp));				\
 	for (;_start <= _end; _start++) {				\
 		memset(fptmp.field, 0, sizeof(fptmp.field));		\
 		fptmp.fp_os.fp_enflags |= PF_OSFP_EXPANDED;		\
 		if (_dot == -1)						\
 			snprintf(fptmp.field, sizeof(fptmp.field),	\
 			    "%d", _start);				\
 		else							\
 			snprintf(fptmp.field, sizeof(fptmp.field),	\
 			    "%d.%d", _dot, _start);			\
 		add_fingerprint(dev, opts, &fptmp);			\
 	}								\
 } while(0)
 
 	/* We allow "#-#" as a version or subtype and we'll expand it */
 	EXPAND(fp_os.fp_version_nm);
 	EXPAND(fp_os.fp_subtype_nm);
 
 	if (strcasecmp(fp->fp_os.fp_class_nm, "nomatch") == 0)
 		errx(1, "fingerprint class \"nomatch\" is reserved");
 
 	version = PF_OSFP_ANY;
 	subtype = PF_OSFP_ANY;
 
 	nm_class = fingerprint_name_entry(&classes, fp->fp_os.fp_class_nm);
 	if (nm_class->nm_num == 0)
 		nm_class->nm_num = ++class_count;
 	class = nm_class->nm_num;
 
 	nm_version = fingerprint_name_entry(&nm_class->nm_sublist,
 	    fp->fp_os.fp_version_nm);
 	if (nm_version) {
 		if (nm_version->nm_num == 0)
 			nm_version->nm_num = ++nm_class->nm_sublist_num;
 		version = nm_version->nm_num;
 		nm_subtype = fingerprint_name_entry(&nm_version->nm_sublist,
 		    fp->fp_os.fp_subtype_nm);
 		if (nm_subtype) {
 			if (nm_subtype->nm_num == 0)
 				nm_subtype->nm_num =
 				    ++nm_version->nm_sublist_num;
 			subtype = nm_subtype->nm_num;
 		}
 	}
 
 
 	DEBUG(fp, "\tsignature %d:%d:%d %s", class, version, subtype,
 	    print_ioctl(fp));
 
 	PF_OSFP_PACK(fp->fp_os.fp_os, class, version, subtype);
 	fingerprint_count++;
 
 #ifdef FAKE_PF_KERNEL
 	/* Linked to the sys/net/pf_osfp.c.  Call pf_osfp_add() */
 	if ((errno = pf_osfp_add(fp)))
 #else
 	if ((opts & PF_OPT_NOACTION) == 0 && ioctl(dev, DIOCOSFPADD, fp))
 #endif /* FAKE_PF_KERNEL */
 	{
 		if (errno == EEXIST) {
 			warn("Duplicate signature for %s %s %s",
 				fp->fp_os.fp_class_nm,
 				fp->fp_os.fp_version_nm,
 				fp->fp_os.fp_subtype_nm);
 
 		} else {
 			err(1, "DIOCOSFPADD");
 		}
 	}
 }
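The EXPAND() macro accepts only two range shapes. The first is `#-#` (for example `4-10`). The second is `#.#-#.#` with the same major digit on both ends (for example `2.2-2.4`). Anything else, including `2.0-4`, is left as a literal name. Below is a hedged re-implementation of its parser as a function, so the accepted forms can be tested directly. `parse_range()` and `range_name()` are hypothetical helpers. As in the macro, the start value is limited to two digits and the end value to three.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse "#-#" or "#.#-#.#".  On success store the major digit (-1 if
 * none) and the inclusive range and return 1; otherwise return 0.
 */
static int
parse_range(const char *f, int *dot, int *start, int *end)
{
	int i = 0;

	*dot = *start = *end = -1;
	if (isdigit((unsigned char)f[i]) && f[i + 1] == '.') {
		*dot = f[i] - '0';	/* major version of "#.#" */
		i += 2;
	}
	if (!isdigit((unsigned char)f[i]))
		return (0);
	*start = f[i++] - '0';
	if (isdigit((unsigned char)f[i]))
		*start = *start * 10 + (f[i++] - '0');
	if (f[i++] != '-')
		return (0);
	if (isdigit((unsigned char)f[i]) && f[i + 1] == '.' &&
	    f[i] - '0' == *dot)
		i += 2;			/* same major version on both ends */
	else if (*dot != -1)
		return (0);
	if (!isdigit((unsigned char)f[i]))
		return (0);
	*end = f[i++] - '0';
	if (isdigit((unsigned char)f[i]))
		*end = *end * 10 + (f[i++] - '0');
	if (isdigit((unsigned char)f[i]))
		*end = *end * 10 + (f[i++] - '0');
	return (f[i] == '\0');
}

/* Format one expanded name, as the snprintf() calls in EXPAND() do. */
static void
range_name(char *buf, size_t len, int dot, int value)
{
	if (dot == -1)
		snprintf(buf, len, "%d", value);
	else
		snprintf(buf, len, "%d.%d", dot, value);
}
```

A caller would loop from `start` to `end` inclusive, calling range_name() for each value and registering one fingerprint per generated name.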
 
 /* import a fingerprint from the kernel */
 void
 import_fingerprint(struct pf_osfp_ioctl *fp)
 {
 	struct name_entry *nm_class, *nm_version, *nm_subtype;
 	int class, version, subtype;
 
 	PF_OSFP_UNPACK(fp->fp_os.fp_os, class, version, subtype);
 
 	nm_class = fingerprint_name_entry(&classes, fp->fp_os.fp_class_nm);
 	if (nm_class->nm_num == 0) {
 		nm_class->nm_num = class;
 		class_count = MAX(class_count, class);
 	}
 
 	nm_version = fingerprint_name_entry(&nm_class->nm_sublist,
 	    fp->fp_os.fp_version_nm);
 	if (nm_version) {
 		if (nm_version->nm_num == 0) {
 			nm_version->nm_num = version;
 			nm_class->nm_sublist_num = MAX(nm_class->nm_sublist_num,
 			    version);
 		}
 		nm_subtype = fingerprint_name_entry(&nm_version->nm_sublist,
 		    fp->fp_os.fp_subtype_nm);
 		if (nm_subtype) {
 			if (nm_subtype->nm_num == 0) {
 				nm_subtype->nm_num = subtype;
 				nm_version->nm_sublist_num =
 				    MAX(nm_version->nm_sublist_num, subtype);
 			}
 		}
 	}
 
 
 	fingerprint_count++;
 	DEBUG(fp, "import signature %d:%d:%d", class, version, subtype);
 }
 
 /* Find the entry for a fingerprint's class/version/subtype */
 struct name_entry *
 fingerprint_name_entry(struct name_list *list, char *name)
 {
 	struct name_entry *nm_entry;
 
 	if (name == NULL || strlen(name) == 0)
 		return (NULL);
 
 	LIST_FOREACH(nm_entry, list, nm_entry) {
 		if (strcasecmp(nm_entry->nm_name, name) == 0) {
 			/* We'll move this to the front of the list later */
 			LIST_REMOVE(nm_entry, nm_entry);
 			break;
 		}
 	}
 	if (nm_entry == NULL) {
 		nm_entry = calloc(1, sizeof(*nm_entry));
 		if (nm_entry == NULL)
 			err(1, "calloc");
 		LIST_INIT(&nm_entry->nm_sublist);
 		strlcpy(nm_entry->nm_name, name, sizeof(nm_entry->nm_name));
 	}
 	LIST_INSERT_HEAD(list, nm_entry, nm_entry);
 	return (nm_entry);
 }
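fingerprint_name_entry() performs a move-to-front lookup. A matching entry is unlinked, and every result, whether found or freshly allocated, is reinserted at the head, so recently used class, version and subtype names are found first on the next search. The following is a minimal standalone sketch of that idea. It uses a hand-rolled singly linked list instead of the <sys/queue.h> LIST macros. `struct entry` and `lookup_mtf()` are hypothetical names. snprintf() stands in for strlcpy() so the sketch builds without BSD extensions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <strings.h>

/* Hypothetical stand-in for struct name_entry: a name plus a link. */
struct entry {
	char		 name[32];
	struct entry	*next;
};

/*
 * Hypothetical helper (not part of pfctl): return the entry whose name
 * matches case-insensitively, creating it if needed.  Either way the
 * entry ends up at the head of the list.
 */
struct entry *
lookup_mtf(struct entry **head, const char *name)
{
	struct entry **pp, *e;

	if (name == NULL || *name == '\0')
		return (NULL);

	/* Walk with a pointer-to-link so unlinking needs no prev pointer. */
	for (pp = head; (e = *pp) != NULL; pp = &e->next) {
		if (strcasecmp(e->name, name) == 0) {
			*pp = e->next;	/* unlink; relinked at the head below */
			break;
		}
	}
	if (e == NULL) {
		if ((e = calloc(1, sizeof(*e))) == NULL) {
			perror("calloc");
			exit(1);
		}
		snprintf(e->name, sizeof(e->name), "%s", name);
	}
	e->next = *head;
	*head = e;
	return (e);
}
```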
 
 
 void
 print_name_list(int opts, struct name_list *nml, const char *prefix)
 {
 	char newprefix[32];
 	struct name_entry *nm;
 
 	LIST_FOREACH(nm, nml, nm_entry) {
 		snprintf(newprefix, sizeof(newprefix), "%s%s\t", prefix,
 		    nm->nm_name);
 		printf("%s\n", newprefix);
 		print_name_list(opts, &nm->nm_sublist, newprefix);
 	}
 }
 
 void
 sort_name_list(int opts, struct name_list *nml)
 {
 	struct name_list new;
 	struct name_entry *nm, *nmsearch, *nmlast;
 
 	/* yes yes, it's a very slow sort.  so sue me */
 
 	LIST_INIT(&new);
 
 	while ((nm = LIST_FIRST(nml)) != NULL) {
 		LIST_REMOVE(nm, nm_entry);
 		nmlast = NULL;
 		LIST_FOREACH(nmsearch, &new, nm_entry) {
 			if (strcasecmp(nmsearch->nm_name, nm->nm_name) > 0) {
 				LIST_INSERT_BEFORE(nmsearch, nm, nm_entry);
 				break;
 			}
 			nmlast = nmsearch;
 		}
 		if (nmsearch == NULL) {
 			if (nmlast)
 				LIST_INSERT_AFTER(nmlast, nm, nm_entry);
 			else
 				LIST_INSERT_HEAD(&new, nm, nm_entry);
 		}
 
 		sort_name_list(opts, &nm->nm_sublist);
 	}
 	nmlast = NULL;
 	while ((nm = LIST_FIRST(&new)) != NULL) {
 		LIST_REMOVE(nm, nm_entry);
 		if (nmlast == NULL)
 			LIST_INSERT_HEAD(nml, nm, nm_entry);
 		else
 			LIST_INSERT_AFTER(nmlast, nm, nm_entry);
 		nmlast = nm;
 	}
 }
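sort_name_list() is an insertion sort. It is quadratic, as its own comment concedes. Each entry is inserted before the first already-sorted entry that compares greater under strcasecmp(), or appended at the end if there is none, which keeps equal names in their original order. The sketch below shows the same insertion step using a pointer-to-link cursor. It assumes a hypothetical `struct node` and `sort_names()`. Unlike the original, it does not recurse into sublists.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical node type: a name and a link. */
struct node {
	const char	*name;
	struct node	*next;
};

/*
 * Hypothetical helper (not part of pfctl): stable insertion sort by
 * case-insensitive name.  Each node is spliced in before the first
 * sorted node that compares strictly greater, as sort_name_list() does.
 */
struct node *
sort_names(struct node *head)
{
	struct node *sorted = NULL, *n, **pp;

	while ((n = head) != NULL) {
		head = n->next;
		/* Skip every sorted node that compares <= to n. */
		for (pp = &sorted; *pp != NULL &&
		    strcasecmp((*pp)->name, n->name) <= 0; pp = &(*pp)->next)
			;
		n->next = *pp;
		*pp = n;
	}
	return (sorted);
}
```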
 
 /* parse the next integer in a formatted config file line */
 int
 get_int(char **line, size_t *len, int *var, int *mod,
     const char *name, int flags, int max, const char *filename, int lineno)
 {
 	int fieldlen, i;
 	char *field;
 	long val = 0;
 
 	if (mod)
 		*mod = 0;
 	*var = 0;
 
 	field = get_field(line, len, &fieldlen);
 	if (field == NULL)
 		return (1);
 	if (fieldlen == 0) {
 		fprintf(stderr, "%s:%d empty %s\n", filename, lineno, name);
 		return (1);
 	}
 
 	i = 0;
 	if ((*field == '%' || *field == 'S' || *field == 'T' || *field == '*')
 	    && fieldlen >= 1) {
 		switch (*field) {
 		case 'S':
 			if (mod && (flags & T_MSS))
 				*mod = T_MSS;
 			if (fieldlen == 1)
 				return (0);
 			break;
 		case 'T':
 			if (mod && (flags & T_MTU))
 				*mod = T_MTU;
 			if (fieldlen == 1)
 				return (0);
 			break;
 		case '*':
 			if (fieldlen != 1) {
 				fprintf(stderr, "%s:%d long '%c' %s\n",
 				    filename, lineno, *field, name);
 				return (1);
 			}
 			if (mod && (flags & T_DC)) {
 				*mod = T_DC;
 				return (0);
 			}
 			/* FALLTHROUGH */
 		case '%':
 			if (mod && (flags & T_MOD))
 				*mod = T_MOD;
 			if (fieldlen == 1) {
 				fprintf(stderr, "%s:%d modulus %s must have a "
 				    "value\n", filename, lineno, name);
 				return (1);
 			}
 			break;
 		}
 		if (mod == NULL || *mod == 0) {
 			fprintf(stderr, "%s:%d does not allow '%c' %s\n",
 			    filename, lineno, *field, name);
 			return (1);
 		}
 		i++;
 	}
 
 	for (; i < fieldlen; i++) {
 		if (field[i] < '0' || field[i] > '9') {
 			fprintf(stderr, "%s:%d non-digit character in %s\n",
 			    filename, lineno, name);
 			return (1);
 		}
 		val = val * 10 + field[i] - '0';
 		if (val < 0) {
 			fprintf(stderr, "%s:%d %s overflowed\n", filename,
 			    lineno, name);
 			return (1);
 		}
 	}
 
 	if (val > max) {
 		fprintf(stderr, "%s:%d %s value %ld > %d\n", filename, lineno,
 		    name, val, max);
 		return (1);
 	}
 	*var = (int)val;
 
 	return (0);
 }
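get_int() accumulates digits into a `long` and detects overflow by testing for a negative result afterwards. That test relies on signed wraparound, but in standard C, signed overflow is undefined behaviour. One alternative rejects a value before it can exceed `max`. The sketch below assumes that only the digit loop matters and ignores the `S`, `T`, `%` and `*` prefixes. `parse_bounded()` is a hypothetical helper, not part of pfctl.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of pfctl): parse exactly len decimal
 * digits from s into *out.  Returns 0 on success, or 1 on an empty
 * field, a non-digit, or a value above max.  The bound is checked
 * before each multiply, so val never exceeds max and cannot overflow.
 */
int
parse_bounded(const char *s, size_t len, int max, int *out)
{
	long val = 0;
	size_t i;

	if (len == 0 || max < 0)
		return (1);
	for (i = 0; i < len; i++) {
		int digit;

		if (s[i] < '0' || s[i] > '9')
			return (1);		/* non-digit character */
		digit = s[i] - '0';
		/* Reject if val * 10 + digit would be greater than max. */
		if (digit > max || val > (max - digit) / 10)
			return (1);
		val = val * 10 + digit;
	}
	*out = (int)val;
	return (0);
}
```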
 
 /* parse the next string in a formatted config file line */
 int
 get_str(char **line, size_t *len, char **v, const char *name, int minlen,
     const char *filename, int lineno)
 {
 	int fieldlen;
 	char *ptr;
 
 	ptr = get_field(line, len, &fieldlen);
 	if (ptr == NULL)
 		return (1);
 	if (fieldlen < minlen) {
 		fprintf(stderr, "%s:%d too short %s\n", filename, lineno, name);
 		return (1);
 	}
 	if ((*v = malloc(fieldlen + 1)) == NULL) {
 		perror("malloc()");
 		return (1);
 	}
 	memcpy(*v, ptr, fieldlen);
 	(*v)[fieldlen] = '\0';
 
 	return (0);
 }
 
 /* Parse out the TCP opts */
 int
 get_tcpopts(const char *filename, int lineno, const char *tcpopts,
     pf_tcpopts_t *packed, int *optcnt, int *mss, int *mss_mod, int *wscale,
     int *wscale_mod, int *ts0)
 {
 	int i, opt;
 
 	*packed = 0;
 	*optcnt = 0;
 	*wscale = 0;
 	*wscale_mod = T_DC;
 	*mss = 0;
 	*mss_mod = T_DC;
 	*ts0 = 0;
 	if (strcmp(tcpopts, ".") == 0)
 		return (0);
 
 	for (i = 0; tcpopts[i] && *optcnt < PF_OSFP_MAX_OPTS;) {
 		switch ((opt = toupper(tcpopts[i++]))) {
 		case 'N':	/* FALLTHROUGH */
 		case 'S':
 			*packed = (*packed << PF_OSFP_TCPOPT_BITS) |
 			    (opt == 'N' ? PF_OSFP_TCPOPT_NOP :
 			    PF_OSFP_TCPOPT_SACK);
 			break;
 		case 'W':	/* FALLTHROUGH */
 		case 'M': {
 			int *this_mod, *this;
 
 			if (opt == 'W') {
 				this = wscale;
 				this_mod = wscale_mod;
 			} else {
 				this = mss;
 				this_mod = mss_mod;
 			}
 			*this = 0;
 			*this_mod = 0;
 
 			*packed = (*packed << PF_OSFP_TCPOPT_BITS) |
 			    (opt == 'W' ? PF_OSFP_TCPOPT_WSCALE :
 			    PF_OSFP_TCPOPT_MSS);
 			if (tcpopts[i] == '*' && (tcpopts[i + 1] == '\0' ||
 			    tcpopts[i + 1] == ',')) {
 				*this_mod = T_DC;
 				i++;
 				break;
 			}
 
 			if (tcpopts[i] == '%') {
 				*this_mod = T_MOD;
 				i++;
 			}
 			do {
 				if (!isdigit(tcpopts[i])) {
 					fprintf(stderr, "%s:%d unknown "
 					    "character '%c' in %c TCP opt\n",
 					    filename, lineno, tcpopts[i], opt);
 					return (1);
 				}
 				*this = (*this * 10) + tcpopts[i++] - '0';
 			} while (tcpopts[i] != ',' && tcpopts[i] != '\0');
 			break;
 		}
 		case 'T':
 			if (tcpopts[i] == '0') {
 				*ts0 = 1;
 				i++;
 			}
 			*packed = (*packed << PF_OSFP_TCPOPT_BITS) |
 			    PF_OSFP_TCPOPT_TS;
 			break;
 		}
 		(*optcnt)++;
 		if (tcpopts[i] == '\0')
 			break;
 		if (tcpopts[i] != ',') {
 			fprintf(stderr, "%s:%d unknown option to %c TCP opt\n",
 			    filename, lineno, opt);
 			return (1);
 		}
 		i++;
 	}
 
 	return (0);
 }
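get_tcpopts() records the option order by shifting `packed` left by PF_OSFP_TCPOPT_BITS and OR-ing in each option's code. As a result, the first option ends up in the most significant group. print_ioctl() walks the groups back from `fp_optcnt - 1` down to 0 to print them in their original order. The sketch below shows only that round trip for bare option letters and ignores the MSS and window-scale values and their modifiers. The 3-bit width and the code values are assumptions standing in for the constants in net/pfvar.h. `pack_opts()` and `unpack_opts()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the PF_OSFP_TCPOPT_* constants in net/pfvar.h. */
#define OPT_BITS	3

/* Index == assumed option code: N=1 (NOP), W=2, M=3, S=4 (SACK), T=5. */
static const char opt_letters[] = "?NWMST";

/* Hypothetical: pack option letters (e.g. "MNWT"); returns count or -1. */
int
pack_opts(const char *letters, uint64_t *packed)
{
	const char *p;
	int n = 0;

	*packed = 0;
	for (; *letters != '\0'; letters++, n++) {
		p = strchr(opt_letters + 1, *letters);
		if (p == NULL || n >= 64 / OPT_BITS)
			return (-1);	/* unknown letter or no room left */
		/* Earlier options migrate toward the more significant bits. */
		*packed = (*packed << OPT_BITS) | (uint64_t)(p - opt_letters);
	}
	return (n);
}

/* Hypothetical: unpack cnt options into letters, oldest group first. */
void
unpack_opts(uint64_t packed, int cnt, char *out)
{
	int i;

	for (i = cnt - 1; i >= 0; i--) {
		unsigned opt = (packed >> (i * OPT_BITS)) &
		    ((1u << OPT_BITS) - 1);

		*out++ = opt < sizeof(opt_letters) - 1 ? opt_letters[opt] : '?';
	}
	*out = '\0';
}
```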
 
 /* rip the next field out of a formatted config file line */
 char *
 get_field(char **line, size_t *len, int *fieldlen)
 {
 	char *ret, *ptr = *line;
 	size_t plen = *len;
 
 
 	while (plen && isspace(*ptr)) {
 		plen--;
 		ptr++;
 	}
 	ret = ptr;
 	*fieldlen = 0;
 
 	for (; plen > 0 && *ptr != ':'; plen--, ptr++)
 		(*fieldlen)++;
 	if (plen) {
 		*line = ptr + 1;
 		*len = plen - 1;
 	} else {
 		*len = 0;
 	}
 	while (*fieldlen && isspace(ret[*fieldlen - 1]))
 		(*fieldlen)--;
 	return (ret);
 }
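get_field() returns a pointer into the line buffer rather than a copy. It skips leading whitespace, counts bytes up to the next `:`, advances the caller's cursor past the separator, and trims trailing whitespace from the reported length. A self-contained sketch of the same splitter follows. `next_field()` is a hypothetical name. The sketch casts to `unsigned char` before calling isspace(), which the original omits; passing a negative `char` to isspace() is undefined.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of pfctl): return the next ':'-separated
 * field of *line (at most *len bytes) with surrounding whitespace trimmed,
 * and advance *line and *len past the separator.  The field is not
 * NUL-terminated; its length is stored in *fieldlen.
 */
const char *
next_field(const char **line, size_t *len, size_t *fieldlen)
{
	const char *p = *line, *start;
	size_t plen = *len;

	while (plen > 0 && isspace((unsigned char)*p)) {	/* leading blanks */
		plen--;
		p++;
	}
	start = p;
	*fieldlen = 0;
	for (; plen > 0 && *p != ':'; plen--, p++)
		(*fieldlen)++;
	if (plen > 0) {		/* step over the ':' itself */
		*line = p + 1;
		*len = plen - 1;
	} else {		/* last field: nothing left to consume */
		*line = p;
		*len = 0;
	}
	while (*fieldlen > 0 && isspace((unsigned char)start[*fieldlen - 1]))
		(*fieldlen)--;	/* trailing blanks */
	return (start);
}
```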
 
 
 const char *
 print_ioctl(struct pf_osfp_ioctl *fp)
 {
 	static char buf[1024];
 	char tmp[32];
 	int i, opt;
 
 	*buf = '\0';
 	if (fp->fp_flags & PF_OSFP_WSIZE_DC)
 		strlcat(buf, "*", sizeof(buf));
 	else if (fp->fp_flags & PF_OSFP_WSIZE_MSS)
 		strlcat(buf, "S", sizeof(buf));
 	else if (fp->fp_flags & PF_OSFP_WSIZE_MTU)
 		strlcat(buf, "T", sizeof(buf));
 	else {
 		if (fp->fp_flags & PF_OSFP_WSIZE_MOD)
 			strlcat(buf, "%", sizeof(buf));
 		snprintf(tmp, sizeof(tmp), "%d", fp->fp_wsize);
 		strlcat(buf, tmp, sizeof(buf));
 	}
 	strlcat(buf, ":", sizeof(buf));
 
 	snprintf(tmp, sizeof(tmp), "%d", fp->fp_ttl);
 	strlcat(buf, tmp, sizeof(buf));
 	strlcat(buf, ":", sizeof(buf));
 
 	if (fp->fp_flags & PF_OSFP_DF)
 		strlcat(buf, "1", sizeof(buf));
 	else
 		strlcat(buf, "0", sizeof(buf));
 	strlcat(buf, ":", sizeof(buf));
 
 	if (fp->fp_flags & PF_OSFP_PSIZE_DC)
 		strlcat(buf, "*", sizeof(buf));
 	else {
 		if (fp->fp_flags & PF_OSFP_PSIZE_MOD)
 			strlcat(buf, "%", sizeof(buf));
 		snprintf(tmp, sizeof(tmp), "%d", fp->fp_psize);
 		strlcat(buf, tmp, sizeof(buf));
 	}
 	strlcat(buf, ":", sizeof(buf));
 
 	if (fp->fp_optcnt == 0)
 		strlcat(buf, ".", sizeof(buf));
 	for (i = fp->fp_optcnt - 1; i >= 0; i--) {
 		opt = fp->fp_tcpopts >> (i * PF_OSFP_TCPOPT_BITS);
 		opt &= (1 << PF_OSFP_TCPOPT_BITS) - 1;
 		switch (opt) {
 		case PF_OSFP_TCPOPT_NOP:
 			strlcat(buf, "N", sizeof(buf));
 			break;
 		case PF_OSFP_TCPOPT_SACK:
 			strlcat(buf, "S", sizeof(buf));
 			break;
 		case PF_OSFP_TCPOPT_TS:
 			strlcat(buf, "T", sizeof(buf));
 			if (fp->fp_flags & PF_OSFP_TS0)
 				strlcat(buf, "0", sizeof(buf));
 			break;
 		case PF_OSFP_TCPOPT_MSS:
 			strlcat(buf, "M", sizeof(buf));
 			if (fp->fp_flags & PF_OSFP_MSS_DC)
 				strlcat(buf, "*", sizeof(buf));
 			else {
 				if (fp->fp_flags & PF_OSFP_MSS_MOD)
 					strlcat(buf, "%", sizeof(buf));
 				snprintf(tmp, sizeof(tmp), "%d", fp->fp_mss);
 				strlcat(buf, tmp, sizeof(buf));
 			}
 			break;
 		case PF_OSFP_TCPOPT_WSCALE:
 			strlcat(buf, "W", sizeof(buf));
 			if (fp->fp_flags & PF_OSFP_WSCALE_DC)
 				strlcat(buf, "*", sizeof(buf));
 			else {
 				if (fp->fp_flags & PF_OSFP_WSCALE_MOD)
 					strlcat(buf, "%", sizeof(buf));
 				snprintf(tmp, sizeof(tmp), "%d", fp->fp_wscale);
 				strlcat(buf, tmp, sizeof(buf));
 			}
 			break;
 		}
 
 		if (i != 0)
 			strlcat(buf, ",", sizeof(buf));
 	}
 	strlcat(buf, ":", sizeof(buf));
 
 	strlcat(buf, fp->fp_os.fp_class_nm, sizeof(buf));
 	strlcat(buf, ":", sizeof(buf));
 	strlcat(buf, fp->fp_os.fp_version_nm, sizeof(buf));
 	strlcat(buf, ":", sizeof(buf));
 	strlcat(buf, fp->fp_os.fp_subtype_nm, sizeof(buf));
 	strlcat(buf, ":", sizeof(buf));
 
 	snprintf(tmp, sizeof(tmp), "TcpOpts %d 0x%llx", fp->fp_optcnt,
 	    (long long int)fp->fp_tcpopts);
 	strlcat(buf, tmp, sizeof(buf));
 
 	return (buf);
 }
Index: user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c
===================================================================
--- user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sbin/pfctl/pfctl_parser.c	(revision 303775)
@@ -1,1768 +1,1768 @@
 /*	$OpenBSD: pfctl_parser.c,v 1.240 2008/06/10 20:55:02 mcbride Exp $ */
 
 /*
  * Copyright (c) 2001 Daniel Hartmeier
  * Copyright (c) 2002,2003 Henning Brauer
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  *
  *    - Redistributions of source code must retain the above copyright
  *      notice, this list of conditions and the following disclaimer.
  *    - Redistributions in binary form must reproduce the above
  *      copyright notice, this list of conditions and the following
  *      disclaimer in the documentation and/or other materials provided
  *      with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
  * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
  * COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
  * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
  * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
  * ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  *
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include <sys/types.h>
 #include <sys/ioctl.h>
 #include <sys/socket.h>
 #include <sys/param.h>
 #include <sys/proc.h>
 #include <net/if.h>
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
 #include <netinet/ip.h>
 #include <netinet/ip_icmp.h>
 #include <netinet/icmp6.h>
 #include <net/pfvar.h>
 #include <arpa/inet.h>
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <ctype.h>
 #include <netdb.h>
 #include <stdarg.h>
 #include <errno.h>
 #include <err.h>
 #include <ifaddrs.h>
 #include <unistd.h>
 
 #include "pfctl_parser.h"
 #include "pfctl.h"
 
 void		 print_op (u_int8_t, const char *, const char *);
 void		 print_port (u_int8_t, u_int16_t, u_int16_t, const char *, int);
 void		 print_ugid (u_int8_t, unsigned, unsigned, const char *, unsigned);
 void		 print_flags (u_int8_t);
 void		 print_fromto(struct pf_rule_addr *, pf_osfp_t,
 		    struct pf_rule_addr *, u_int8_t, u_int8_t, int, int);
 int		 ifa_skip_if(const char *filter, struct node_host *p);
 
 struct node_host	*ifa_grouplookup(const char *, int);
 struct node_host	*host_if(const char *, int);
 struct node_host	*host_v4(const char *, int);
 struct node_host	*host_v6(const char *, int);
 struct node_host	*host_dns(const char *, int, int);
 
 const char * const tcpflags = "FSRPAUEW";
 
 static const struct icmptypeent icmp_type[] = {
 	{ "echoreq",	ICMP_ECHO },
 	{ "echorep",	ICMP_ECHOREPLY },
 	{ "unreach",	ICMP_UNREACH },
 	{ "squench",	ICMP_SOURCEQUENCH },
 	{ "redir",	ICMP_REDIRECT },
 	{ "althost",	ICMP_ALTHOSTADDR },
 	{ "routeradv",	ICMP_ROUTERADVERT },
 	{ "routersol",	ICMP_ROUTERSOLICIT },
 	{ "timex",	ICMP_TIMXCEED },
 	{ "paramprob",	ICMP_PARAMPROB },
 	{ "timereq",	ICMP_TSTAMP },
 	{ "timerep",	ICMP_TSTAMPREPLY },
 	{ "inforeq",	ICMP_IREQ },
 	{ "inforep",	ICMP_IREQREPLY },
 	{ "maskreq",	ICMP_MASKREQ },
 	{ "maskrep",	ICMP_MASKREPLY },
 	{ "trace",	ICMP_TRACEROUTE },
 	{ "dataconv",	ICMP_DATACONVERR },
 	{ "mobredir",	ICMP_MOBILE_REDIRECT },
 	{ "ipv6-where",	ICMP_IPV6_WHEREAREYOU },
 	{ "ipv6-here",	ICMP_IPV6_IAMHERE },
 	{ "mobregreq",	ICMP_MOBILE_REGREQUEST },
 	{ "mobregrep",	ICMP_MOBILE_REGREPLY },
 	{ "skip",	ICMP_SKIP },
 	{ "photuris",	ICMP_PHOTURIS }
 };
 
 static const struct icmptypeent icmp6_type[] = {
 	{ "unreach",	ICMP6_DST_UNREACH },
 	{ "toobig",	ICMP6_PACKET_TOO_BIG },
 	{ "timex",	ICMP6_TIME_EXCEEDED },
 	{ "paramprob",	ICMP6_PARAM_PROB },
 	{ "echoreq",	ICMP6_ECHO_REQUEST },
 	{ "echorep",	ICMP6_ECHO_REPLY },
 	{ "groupqry",	ICMP6_MEMBERSHIP_QUERY },
 	{ "listqry",	MLD_LISTENER_QUERY },
 	{ "grouprep",	ICMP6_MEMBERSHIP_REPORT },
 	{ "listenrep",	MLD_LISTENER_REPORT },
 	{ "groupterm",	ICMP6_MEMBERSHIP_REDUCTION },
 	{ "listendone", MLD_LISTENER_DONE },
 	{ "routersol",	ND_ROUTER_SOLICIT },
 	{ "routeradv",	ND_ROUTER_ADVERT },
 	{ "neighbrsol", ND_NEIGHBOR_SOLICIT },
 	{ "neighbradv", ND_NEIGHBOR_ADVERT },
 	{ "redir",	ND_REDIRECT },
 	{ "routrrenum", ICMP6_ROUTER_RENUMBERING },
 	{ "wrureq",	ICMP6_WRUREQUEST },
 	{ "wrurep",	ICMP6_WRUREPLY },
 	{ "fqdnreq",	ICMP6_FQDN_QUERY },
 	{ "fqdnrep",	ICMP6_FQDN_REPLY },
 	{ "niqry",	ICMP6_NI_QUERY },
 	{ "nirep",	ICMP6_NI_REPLY },
 	{ "mtraceresp",	MLD_MTRACE_RESP },
 	{ "mtrace",	MLD_MTRACE }
 };
 
 static const struct icmpcodeent icmp_code[] = {
 	{ "net-unr",		ICMP_UNREACH,	ICMP_UNREACH_NET },
 	{ "host-unr",		ICMP_UNREACH,	ICMP_UNREACH_HOST },
 	{ "proto-unr",		ICMP_UNREACH,	ICMP_UNREACH_PROTOCOL },
 	{ "port-unr",		ICMP_UNREACH,	ICMP_UNREACH_PORT },
 	{ "needfrag",		ICMP_UNREACH,	ICMP_UNREACH_NEEDFRAG },
 	{ "srcfail",		ICMP_UNREACH,	ICMP_UNREACH_SRCFAIL },
 	{ "net-unk",		ICMP_UNREACH,	ICMP_UNREACH_NET_UNKNOWN },
 	{ "host-unk",		ICMP_UNREACH,	ICMP_UNREACH_HOST_UNKNOWN },
 	{ "isolate",		ICMP_UNREACH,	ICMP_UNREACH_ISOLATED },
 	{ "net-prohib",		ICMP_UNREACH,	ICMP_UNREACH_NET_PROHIB },
 	{ "host-prohib",	ICMP_UNREACH,	ICMP_UNREACH_HOST_PROHIB },
 	{ "net-tos",		ICMP_UNREACH,	ICMP_UNREACH_TOSNET },
 	{ "host-tos",		ICMP_UNREACH,	ICMP_UNREACH_TOSHOST },
 	{ "filter-prohib",	ICMP_UNREACH,	ICMP_UNREACH_FILTER_PROHIB },
 	{ "host-preced",	ICMP_UNREACH,	ICMP_UNREACH_HOST_PRECEDENCE },
 	{ "cutoff-preced",	ICMP_UNREACH,	ICMP_UNREACH_PRECEDENCE_CUTOFF },
 	{ "redir-net",		ICMP_REDIRECT,	ICMP_REDIRECT_NET },
 	{ "redir-host",		ICMP_REDIRECT,	ICMP_REDIRECT_HOST },
 	{ "redir-tos-net",	ICMP_REDIRECT,	ICMP_REDIRECT_TOSNET },
 	{ "redir-tos-host",	ICMP_REDIRECT,	ICMP_REDIRECT_TOSHOST },
 	{ "normal-adv",		ICMP_ROUTERADVERT, ICMP_ROUTERADVERT_NORMAL },
 	{ "common-adv",		ICMP_ROUTERADVERT, ICMP_ROUTERADVERT_NOROUTE_COMMON },
 	{ "transit",		ICMP_TIMXCEED,	ICMP_TIMXCEED_INTRANS },
 	{ "reassemb",		ICMP_TIMXCEED,	ICMP_TIMXCEED_REASS },
 	{ "badhead",		ICMP_PARAMPROB,	ICMP_PARAMPROB_ERRATPTR },
 	{ "optmiss",		ICMP_PARAMPROB,	ICMP_PARAMPROB_OPTABSENT },
 	{ "badlen",		ICMP_PARAMPROB,	ICMP_PARAMPROB_LENGTH },
 	{ "unknown-ind",	ICMP_PHOTURIS,	ICMP_PHOTURIS_UNKNOWN_INDEX },
 	{ "auth-fail",		ICMP_PHOTURIS,	ICMP_PHOTURIS_AUTH_FAILED },
 	{ "decrypt-fail",	ICMP_PHOTURIS,	ICMP_PHOTURIS_DECRYPT_FAILED }
 };
 
 static const struct icmpcodeent icmp6_code[] = {
 	{ "admin-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_ADMIN },
 	{ "noroute-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOROUTE },
 	{ "notnbr-unr",	ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOTNEIGHBOR },
 	{ "beyond-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_BEYONDSCOPE },
 	{ "addr-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_ADDR },
 	{ "port-unr", ICMP6_DST_UNREACH, ICMP6_DST_UNREACH_NOPORT },
 	{ "transit", ICMP6_TIME_EXCEEDED, ICMP6_TIME_EXCEED_TRANSIT },
 	{ "reassemb", ICMP6_TIME_EXCEEDED, ICMP6_TIME_EXCEED_REASSEMBLY },
 	{ "badhead", ICMP6_PARAM_PROB, ICMP6_PARAMPROB_HEADER },
 	{ "nxthdr", ICMP6_PARAM_PROB, ICMP6_PARAMPROB_NEXTHEADER },
 	{ "redironlink", ND_REDIRECT, ND_REDIRECT_ONLINK },
 	{ "redirrouter", ND_REDIRECT, ND_REDIRECT_ROUTER }
 };
 
 const struct pf_timeout pf_timeouts[] = {
 	{ "tcp.first",		PFTM_TCP_FIRST_PACKET },
 	{ "tcp.opening",	PFTM_TCP_OPENING },
 	{ "tcp.established",	PFTM_TCP_ESTABLISHED },
 	{ "tcp.closing",	PFTM_TCP_CLOSING },
 	{ "tcp.finwait",	PFTM_TCP_FIN_WAIT },
 	{ "tcp.closed",		PFTM_TCP_CLOSED },
 	{ "tcp.tsdiff",		PFTM_TS_DIFF },
 	{ "udp.first",		PFTM_UDP_FIRST_PACKET },
 	{ "udp.single",		PFTM_UDP_SINGLE },
 	{ "udp.multiple",	PFTM_UDP_MULTIPLE },
 	{ "icmp.first",		PFTM_ICMP_FIRST_PACKET },
 	{ "icmp.error",		PFTM_ICMP_ERROR_REPLY },
 	{ "other.first",	PFTM_OTHER_FIRST_PACKET },
 	{ "other.single",	PFTM_OTHER_SINGLE },
 	{ "other.multiple",	PFTM_OTHER_MULTIPLE },
 	{ "frag",		PFTM_FRAG },
 	{ "interval",		PFTM_INTERVAL },
 	{ "adaptive.start",	PFTM_ADAPTIVE_START },
 	{ "adaptive.end",	PFTM_ADAPTIVE_END },
 	{ "src.track",		PFTM_SRC_NODE },
 	{ NULL,			0 }
 };
 
 const struct icmptypeent *
 geticmptypebynumber(u_int8_t type, sa_family_t af)
 {
 	unsigned int	i;
 
 	if (af != AF_INET6) {
 		for (i=0; i < nitems(icmp_type); i++) {
 			if (type == icmp_type[i].type)
 				return (&icmp_type[i]);
 		}
 	} else {
 		for (i=0; i < nitems(icmp6_type); i++) {
 			if (type == icmp6_type[i].type)
 				return (&icmp6_type[i]);
 		}
 	}
 	return (NULL);
 }
 
 const struct icmptypeent *
 geticmptypebyname(char *w, sa_family_t af)
 {
 	unsigned int	i;
 
 	if (af != AF_INET6) {
 		for (i=0; i < nitems(icmp_type); i++) {
 			if (!strcmp(w, icmp_type[i].name))
 				return (&icmp_type[i]);
 		}
 	} else {
 		for (i=0; i < nitems(icmp6_type); i++) {
 			if (!strcmp(w, icmp6_type[i].name))
 				return (&icmp6_type[i]);
 		}
 	}
 	return (NULL);
 }
 
 const struct icmpcodeent *
 geticmpcodebynumber(u_int8_t type, u_int8_t code, sa_family_t af)
 {
 	unsigned int	i;
 
 	if (af != AF_INET6) {
 		for (i=0; i < nitems(icmp_code); i++) {
 			if (type == icmp_code[i].type &&
 			    code == icmp_code[i].code)
 				return (&icmp_code[i]);
 		}
 	} else {
 		for (i=0; i < nitems(icmp6_code); i++) {
 			if (type == icmp6_code[i].type &&
 			    code == icmp6_code[i].code)
 				return (&icmp6_code[i]);
 		}
 	}
 	return (NULL);
 }
 
 const struct icmpcodeent *
 geticmpcodebyname(u_long type, char *w, sa_family_t af)
 {
 	unsigned int	i;
 
 	if (af != AF_INET6) {
 		for (i=0; i < nitems(icmp_code); i++) {
 			if (type == icmp_code[i].type &&
 			    !strcmp(w, icmp_code[i].name))
 				return (&icmp_code[i]);
 		}
 	} else {
 		for (i=0; i < nitems(icmp6_code); i++) {
 			if (type == icmp6_code[i].type &&
 			    !strcmp(w, icmp6_code[i].name))
 				return (&icmp6_code[i]);
 		}
 	}
 	return (NULL);
 }
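The four geticmp*() helpers are linear scans over small static tables. Each loop is bounded with nitems(), so the limit follows the table size automatically. The following is a reduced sketch of the pattern. It assumes a hypothetical `struct typeent` table of a few IPv4 ICMP types (RFC 792 numbering) and a local NITEMS macro equivalent to FreeBSD's nitems().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local equivalent of the nitems() macro from FreeBSD's <sys/param.h>. */
#define NITEMS(x)	(sizeof(x) / sizeof((x)[0]))

/* Hypothetical table entry type, modelled on struct icmptypeent. */
struct typeent {
	const char	*name;
	unsigned char	 type;
};

/* A few IPv4 ICMP types (RFC 792 numbering), in no particular order. */
static const struct typeent types[] = {
	{ "echoreq",	8 },
	{ "echorep",	0 },
	{ "unreach",	3 },
	{ "timex",	11 },
};

/* Hypothetical: linear scan by number; NULL if the type is unknown. */
const struct typeent *
type_by_number(unsigned char type)
{
	size_t i;

	for (i = 0; i < NITEMS(types); i++)
		if (types[i].type == type)
			return (&types[i]);
	return (NULL);
}

/* Hypothetical: linear scan by exact name; NULL if the name is unknown. */
const struct typeent *
type_by_name(const char *name)
{
	size_t i;

	for (i = 0; i < NITEMS(types); i++)
		if (strcmp(types[i].name, name) == 0)
			return (&types[i]);
	return (NULL);
}
```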
 
 void
 print_op(u_int8_t op, const char *a1, const char *a2)
 {
 	if (op == PF_OP_IRG)
 		printf(" %s >< %s", a1, a2);
 	else if (op == PF_OP_XRG)
 		printf(" %s <> %s", a1, a2);
 	else if (op == PF_OP_EQ)
 		printf(" = %s", a1);
 	else if (op == PF_OP_NE)
 		printf(" != %s", a1);
 	else if (op == PF_OP_LT)
 		printf(" < %s", a1);
 	else if (op == PF_OP_LE)
 		printf(" <= %s", a1);
 	else if (op == PF_OP_GT)
 		printf(" > %s", a1);
 	else if (op == PF_OP_GE)
 		printf(" >= %s", a1);
 	else if (op == PF_OP_RRG)
 		printf(" %s:%s", a1, a2);
 }
 
 void
 print_port(u_int8_t op, u_int16_t p1, u_int16_t p2, const char *proto, int numeric)
 {
 	char		 a1[6], a2[6];
 	struct servent	*s;
 
 	if (!numeric)
 		s = getservbyport(p1, proto);
 	else
 		s = NULL;
 	p1 = ntohs(p1);
 	p2 = ntohs(p2);
 	snprintf(a1, sizeof(a1), "%u", p1);
 	snprintf(a2, sizeof(a2), "%u", p2);
 	printf(" port");
 	if (s != NULL && (op == PF_OP_EQ || op == PF_OP_NE))
 		print_op(op, s->s_name, a2);
 	else
 		print_op(op, a1, a2);
 }
 
 void
 print_ugid(u_int8_t op, unsigned u1, unsigned u2, const char *t, unsigned umax)
 {
 	char	a1[11], a2[11];
 
 	snprintf(a1, sizeof(a1), "%u", u1);
 	snprintf(a2, sizeof(a2), "%u", u2);
 	printf(" %s", t);
 	if (u1 == umax && (op == PF_OP_EQ || op == PF_OP_NE))
 		print_op(op, "unknown", a2);
 	else
 		print_op(op, a1, a2);
 }
 
 void
 print_flags(u_int8_t f)
 {
 	int	i;
 
 	for (i = 0; tcpflags[i]; ++i)
 		if (f & (1 << i))
 			printf("%c", tcpflags[i]);
 }
 
 void
 print_fromto(struct pf_rule_addr *src, pf_osfp_t osfp, struct pf_rule_addr *dst,
     sa_family_t af, u_int8_t proto, int verbose, int numeric)
 {
 	char buf[PF_OSFP_LEN*3];
 	if (src->addr.type == PF_ADDR_ADDRMASK &&
 	    dst->addr.type == PF_ADDR_ADDRMASK &&
 	    PF_AZERO(&src->addr.v.a.addr, AF_INET6) &&
 	    PF_AZERO(&src->addr.v.a.mask, AF_INET6) &&
 	    PF_AZERO(&dst->addr.v.a.addr, AF_INET6) &&
 	    PF_AZERO(&dst->addr.v.a.mask, AF_INET6) &&
 	    !src->neg && !dst->neg &&
 	    !src->port_op && !dst->port_op &&
 	    osfp == PF_OSFP_ANY)
 		printf(" all");
 	else {
 		printf(" from ");
 		if (src->neg)
 			printf("! ");
 		print_addr(&src->addr, af, verbose);
 		if (src->port_op)
 			print_port(src->port_op, src->port[0],
 			    src->port[1],
 			    proto == IPPROTO_TCP ? "tcp" : "udp",
 			    numeric);
 		if (osfp != PF_OSFP_ANY)
 			printf(" os \"%s\"", pfctl_lookup_fingerprint(osfp, buf,
 			    sizeof(buf)));
 
 		printf(" to ");
 		if (dst->neg)
 			printf("! ");
 		print_addr(&dst->addr, af, verbose);
 		if (dst->port_op)
 			print_port(dst->port_op, dst->port[0],
 			    dst->port[1],
 			    proto == IPPROTO_TCP ? "tcp" : "udp",
 			    numeric);
 	}
 }
 
 void
 print_pool(struct pf_pool *pool, u_int16_t p1, u_int16_t p2,
     sa_family_t af, int id)
 {
 	struct pf_pooladdr	*pooladdr;
 
 	if ((TAILQ_FIRST(&pool->list) != NULL) &&
 	    TAILQ_NEXT(TAILQ_FIRST(&pool->list), entries) != NULL)
 		printf("{ ");
 	TAILQ_FOREACH(pooladdr, &pool->list, entries) {
 		switch (id) {
 		case PF_NAT:
 		case PF_RDR:
 		case PF_BINAT:
 			print_addr(&pooladdr->addr, af, 0);
 			break;
 		case PF_PASS:
 			if (PF_AZERO(&pooladdr->addr.v.a.addr, af))
 				printf("%s", pooladdr->ifname);
 			else {
 				printf("(%s ", pooladdr->ifname);
 				print_addr(&pooladdr->addr, af, 0);
 				printf(")");
 			}
 			break;
 		default:
 			break;
 		}
 		if (TAILQ_NEXT(pooladdr, entries) != NULL)
 			printf(", ");
 		else if (TAILQ_NEXT(TAILQ_FIRST(&pool->list), entries) != NULL)
 			printf(" }");
 	}
 	switch (id) {
 	case PF_NAT:
 		if ((p1 != PF_NAT_PROXY_PORT_LOW ||
 		    p2 != PF_NAT_PROXY_PORT_HIGH) && (p1 != 0 || p2 != 0)) {
 			if (p1 == p2)
 				printf(" port %u", p1);
 			else
 				printf(" port %u:%u", p1, p2);
 		}
 		break;
 	case PF_RDR:
 		if (p1) {
 			printf(" port %u", p1);
 			if (p2 && (p2 != p1))
 				printf(":%u", p2);
 		}
 		break;
 	default:
 		break;
 	}
 	switch (pool->opts & PF_POOL_TYPEMASK) {
 	case PF_POOL_NONE:
 		break;
 	case PF_POOL_BITMASK:
 		printf(" bitmask");
 		break;
 	case PF_POOL_RANDOM:
 		printf(" random");
 		break;
 	case PF_POOL_SRCHASH:
 		printf(" source-hash 0x%08x%08x%08x%08x",
 		    pool->key.key32[0], pool->key.key32[1],
 		    pool->key.key32[2], pool->key.key32[3]);
 		break;
 	case PF_POOL_ROUNDROBIN:
 		printf(" round-robin");
 		break;
 	}
 	if (pool->opts & PF_POOL_STICKYADDR)
 		printf(" sticky-address");
 	if (id == PF_NAT && p1 == 0 && p2 == 0)
 		printf(" static-port");
 }
 
 const char	* const pf_reasons[PFRES_MAX+1] = PFRES_NAMES;
 const char	* const pf_lcounters[LCNT_MAX+1] = LCNT_NAMES;
 const char	* const pf_fcounters[FCNT_MAX+1] = FCNT_NAMES;
 const char	* const pf_scounters[FCNT_MAX+1] = FCNT_NAMES;
 
 void
 print_status(struct pf_status *s, int opts)
 {
 	char			statline[80], *running;
 	time_t			runtime;
 	int			i;
 	char			buf[PF_MD5_DIGEST_LENGTH * 2 + 1];
 	static const char	hex[] = "0123456789abcdef";
 
 	runtime = time(NULL) - s->since;
 	running = s->running ? "Enabled" : "Disabled";
 
 	if (s->since) {
 		unsigned int	sec, min, hrs, day = runtime;
 
 		sec = day % 60;
 		day /= 60;
 		min = day % 60;
 		day /= 60;
 		hrs = day % 24;
 		day /= 24;
 		snprintf(statline, sizeof(statline),
 		    "Status: %s for %u days %.2u:%.2u:%.2u",
 		    running, day, hrs, min, sec);
 	} else
 		snprintf(statline, sizeof(statline), "Status: %s", running);
 	printf("%-44s", statline);
 	switch (s->debug) {
 	case PF_DEBUG_NONE:
 		printf("%15s\n\n", "Debug: None");
 		break;
 	case PF_DEBUG_URGENT:
 		printf("%15s\n\n", "Debug: Urgent");
 		break;
 	case PF_DEBUG_MISC:
 		printf("%15s\n\n", "Debug: Misc");
 		break;
 	case PF_DEBUG_NOISY:
 		printf("%15s\n\n", "Debug: Loud");
 		break;
 	}
 
 	if (opts & PF_OPT_VERBOSE) {
 		printf("Hostid:   0x%08x\n", ntohl(s->hostid));
 
 		for (i = 0; i < PF_MD5_DIGEST_LENGTH; i++) {
 			buf[i + i] = hex[s->pf_chksum[i] >> 4];
 			buf[i + i + 1] = hex[s->pf_chksum[i] & 0x0f];
 		}
 		buf[i + i] = '\0';
 		printf("Checksum: 0x%s\n\n", buf);
 	}
 
 	if (s->ifname[0] != 0) {
 		printf("Interface Stats for %-16s %5s %16s\n",
 		    s->ifname, "IPv4", "IPv6");
 		printf("  %-25s %14llu %16llu\n", "Bytes In",
 		    (unsigned long long)s->bcounters[0][0],
 		    (unsigned long long)s->bcounters[1][0]);
 		printf("  %-25s %14llu %16llu\n", "Bytes Out",
 		    (unsigned long long)s->bcounters[0][1],
 		    (unsigned long long)s->bcounters[1][1]);
 		printf("  Packets In\n");
 		printf("    %-23s %14llu %16llu\n", "Passed",
 		    (unsigned long long)s->pcounters[0][0][PF_PASS],
 		    (unsigned long long)s->pcounters[1][0][PF_PASS]);
 		printf("    %-23s %14llu %16llu\n", "Blocked",
 		    (unsigned long long)s->pcounters[0][0][PF_DROP],
 		    (unsigned long long)s->pcounters[1][0][PF_DROP]);
 		printf("  Packets Out\n");
 		printf("    %-23s %14llu %16llu\n", "Passed",
 		    (unsigned long long)s->pcounters[0][1][PF_PASS],
 		    (unsigned long long)s->pcounters[1][1][PF_PASS]);
 		printf("    %-23s %14llu %16llu\n\n", "Blocked",
 		    (unsigned long long)s->pcounters[0][1][PF_DROP],
 		    (unsigned long long)s->pcounters[1][1][PF_DROP]);
 	}
 	printf("%-27s %14s %16s\n", "State Table", "Total", "Rate");
 	printf("  %-25s %14u %14s\n", "current entries", s->states, "");
 	for (i = 0; i < FCNT_MAX; i++) {
 		printf("  %-25s %14llu ", pf_fcounters[i],
 			    (unsigned long long)s->fcounters[i]);
 		if (runtime > 0)
 			printf("%14.1f/s\n",
 			    (double)s->fcounters[i] / (double)runtime);
 		else
 			printf("%14s\n", "");
 	}
 	if (opts & PF_OPT_VERBOSE) {
 		printf("Source Tracking Table\n");
 		printf("  %-25s %14u %14s\n", "current entries",
 		    s->src_nodes, "");
 		for (i = 0; i < SCNT_MAX; i++) {
 			printf("  %-25s %14lld ", pf_scounters[i],
 #ifdef __FreeBSD__
 				    (long long)s->scounters[i]);
 #else
 				    s->scounters[i]);
 #endif
 			if (runtime > 0)
 				printf("%14.1f/s\n",
 				    (double)s->scounters[i] / (double)runtime);
 			else
 				printf("%14s\n", "");
 		}
 	}
 	printf("Counters\n");
 	for (i = 0; i < PFRES_MAX; i++) {
 		printf("  %-25s %14llu ", pf_reasons[i],
 		    (unsigned long long)s->counters[i]);
 		if (runtime > 0)
 			printf("%14.1f/s\n",
 			    (double)s->counters[i] / (double)runtime);
 		else
 			printf("%14s\n", "");
 	}
 	if (opts & PF_OPT_VERBOSE) {
 		printf("Limit Counters\n");
 		for (i = 0; i < LCNT_MAX; i++) {
 			printf("  %-25s %14llu ", pf_lcounters[i],
 #ifdef __FreeBSD__
 				    (unsigned long long)s->lcounters[i]);
 #else
 				    s->lcounters[i]);
 #endif
 			if (runtime > 0)
 				printf("%14.1f/s\n",
 				    (double)s->lcounters[i] / (double)runtime);
 			else
 				printf("%14s\n", "");
 		}
 	}
 }
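print_status() derives the uptime by repeated division: it peels off seconds, then minutes, then hours, and keeps the remainder as days. It renders the MD5 checksum one nibble at a time through a lookup string. Both steps are extracted below into hypothetical standalone helpers, `format_uptime()` and `hex_encode()`. This is a sketch, not pfctl's own API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: split seconds into "D days HH:MM:SS" like print_status(). */
void
format_uptime(unsigned int runtime, char *buf, size_t len)
{
	unsigned int sec, min, hrs, day = runtime;

	sec = day % 60;
	day /= 60;
	min = day % 60;
	day /= 60;
	hrs = day % 24;
	day /= 24;
	/* %.2u pads each component to at least two digits. */
	snprintf(buf, len, "%u days %.2u:%.2u:%.2u", day, hrs, min, sec);
}

/* Hypothetical: hex-encode n bytes into out, which holds 2 * n + 1 chars. */
void
hex_encode(const unsigned char *in, size_t n, char *out)
{
	static const char hex[] = "0123456789abcdef";
	size_t i;

	for (i = 0; i < n; i++) {
		out[i + i] = hex[in[i] >> 4];		/* high nibble */
		out[i + i + 1] = hex[in[i] & 0x0f];	/* low nibble */
	}
	out[i + i] = '\0';
}
```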
 
 void
 print_src_node(struct pf_src_node *sn, int opts)
 {
 	struct pf_addr_wrap aw;
 	int min, sec;
 
 	memset(&aw, 0, sizeof(aw));
 	if (sn->af == AF_INET)
 		aw.v.a.mask.addr32[0] = 0xffffffff;
 	else
 		memset(&aw.v.a.mask, 0xff, sizeof(aw.v.a.mask));
 
 	aw.v.a.addr = sn->addr;
 	print_addr(&aw, sn->af, opts & PF_OPT_VERBOSE2);
 	printf(" -> ");
 	aw.v.a.addr = sn->raddr;
 	print_addr(&aw, sn->af, opts & PF_OPT_VERBOSE2);
 	printf(" ( states %u, connections %u, rate %u.%u/%us )\n", sn->states,
 	    sn->conn, sn->conn_rate.count / 1000,
 	    (sn->conn_rate.count % 1000) / 100, sn->conn_rate.seconds);
 	if (opts & PF_OPT_VERBOSE) {
 		sec = sn->creation % 60;
 		sn->creation /= 60;
 		min = sn->creation % 60;
 		sn->creation /= 60;
 		printf("   age %.2u:%.2u:%.2u", sn->creation, min, sec);
 		if (sn->states == 0) {
 			sec = sn->expire % 60;
 			sn->expire /= 60;
 			min = sn->expire % 60;
 			sn->expire /= 60;
 			printf(", expires in %.2u:%.2u:%.2u",
 			    sn->expire, min, sec);
 		}
 		printf(", %llu pkts, %llu bytes",
 #ifdef __FreeBSD__
 		    (unsigned long long)(sn->packets[0] + sn->packets[1]),
 		    (unsigned long long)(sn->bytes[0] + sn->bytes[1]));
 #else
 		    sn->packets[0] + sn->packets[1],
 		    sn->bytes[0] + sn->bytes[1]);
 #endif
 		switch (sn->ruletype) {
 		case PF_NAT:
 			if (sn->rule.nr != -1)
 				printf(", nat rule %u", sn->rule.nr);
 			break;
 		case PF_RDR:
 			if (sn->rule.nr != -1)
 				printf(", rdr rule %u", sn->rule.nr);
 			break;
 		case PF_PASS:
 			if (sn->rule.nr != -1)
 				printf(", filter rule %u", sn->rule.nr);
 			break;
 		}
 		printf("\n");
 	}
 }
 
 void
 print_rule(struct pf_rule *r, const char *anchor_call, int verbose, int numeric)
 {
 	static const char *actiontypes[] = { "pass", "block", "scrub",
 	    "no scrub", "nat", "no nat", "binat", "no binat", "rdr", "no rdr" };
 	static const char *anchortypes[] = { "anchor", "anchor", "anchor",
 	    "anchor", "nat-anchor", "nat-anchor", "binat-anchor",
 	    "binat-anchor", "rdr-anchor", "rdr-anchor" };
 	int	i, opts;
 
 	if (verbose)
 		printf("@%d ", r->nr);
 	if (r->action > PF_NORDR)
 		printf("action(%d)", r->action);
 	else if (anchor_call[0]) {
 		if (anchor_call[0] == '_') {
 			printf("%s", anchortypes[r->action]);
 		} else
 			printf("%s \"%s\"", anchortypes[r->action],
 			    anchor_call);
 	} else {
 		printf("%s", actiontypes[r->action]);
 		if (r->natpass)
 			printf(" pass");
 	}
 	if (r->action == PF_DROP) {
 		if (r->rule_flag & PFRULE_RETURN)
 			printf(" return");
 		else if (r->rule_flag & PFRULE_RETURNRST) {
 			if (!r->return_ttl)
 				printf(" return-rst");
 			else
 				printf(" return-rst(ttl %d)", r->return_ttl);
 		} else if (r->rule_flag & PFRULE_RETURNICMP) {
 			const struct icmpcodeent	*ic, *ic6;
 
 			ic = geticmpcodebynumber(r->return_icmp >> 8,
 			    r->return_icmp & 255, AF_INET);
 			ic6 = geticmpcodebynumber(r->return_icmp6 >> 8,
 			    r->return_icmp6 & 255, AF_INET6);
 
 			switch (r->af) {
 			case AF_INET:
 				printf(" return-icmp");
 				if (ic == NULL)
 					printf("(%u)", r->return_icmp & 255);
 				else
 					printf("(%s)", ic->name);
 				break;
 			case AF_INET6:
 				printf(" return-icmp6");
 				if (ic6 == NULL)
 					printf("(%u)", r->return_icmp6 & 255);
 				else
 					printf("(%s)", ic6->name);
 				break;
 			default:
 				printf(" return-icmp");
 				if (ic == NULL)
 					printf("(%u, ", r->return_icmp & 255);
 				else
 					printf("(%s, ", ic->name);
 				if (ic6 == NULL)
 					printf("%u)", r->return_icmp6 & 255);
 				else
 					printf("%s)", ic6->name);
 				break;
 			}
 		} else
 			printf(" drop");
 	}
 	if (r->direction == PF_IN)
 		printf(" in");
 	else if (r->direction == PF_OUT)
 		printf(" out");
 	if (r->log) {
 		printf(" log");
 		if (r->log & ~PF_LOG || r->logif) {
 			int count = 0;
 
 			printf(" (");
 			if (r->log & PF_LOG_ALL)
 				printf("%sall", count++ ? ", " : "");
 			if (r->log & PF_LOG_SOCKET_LOOKUP)
 				printf("%suser", count++ ? ", " : "");
 			if (r->logif)
 				printf("%sto pflog%u", count++ ? ", " : "",
 				    r->logif);
 			printf(")");
 		}
 	}
 	if (r->quick)
 		printf(" quick");
 	if (r->ifname[0]) {
 		if (r->ifnot)
 			printf(" on ! %s", r->ifname);
 		else
 			printf(" on %s", r->ifname);
 	}
 	if (r->rt) {
 		if (r->rt == PF_ROUTETO)
 			printf(" route-to");
 		else if (r->rt == PF_REPLYTO)
 			printf(" reply-to");
 		else if (r->rt == PF_DUPTO)
 			printf(" dup-to");
 		else if (r->rt == PF_FASTROUTE)
 			printf(" fastroute");
 		if (r->rt != PF_FASTROUTE) {
 			printf(" ");
 			print_pool(&r->rpool, 0, 0, r->af, PF_PASS);
 		}
 	}
 	if (r->af) {
 		if (r->af == AF_INET)
 			printf(" inet");
 		else
 			printf(" inet6");
 	}
 	if (r->proto) {
 		struct protoent	*p;
 
 		if ((p = getprotobynumber(r->proto)) != NULL)
 			printf(" proto %s", p->p_name);
 		else
 			printf(" proto %u", r->proto);
 	}
 	print_fromto(&r->src, r->os_fingerprint, &r->dst, r->af, r->proto,
 	    verbose, numeric);
 	if (r->uid.op)
 		print_ugid(r->uid.op, r->uid.uid[0], r->uid.uid[1], "user",
 		    UID_MAX);
 	if (r->gid.op)
 		print_ugid(r->gid.op, r->gid.gid[0], r->gid.gid[1], "group",
 		    GID_MAX);
 	if (r->flags || r->flagset) {
 		printf(" flags ");
 		print_flags(r->flags);
 		printf("/");
 		print_flags(r->flagset);
 	} else if (r->action == PF_PASS &&
 	    (!r->proto || r->proto == IPPROTO_TCP) &&
 	    !(r->rule_flag & PFRULE_FRAGMENT) &&
 	    !anchor_call[0] && r->keep_state)
 		printf(" flags any");
 	if (r->type) {
 		const struct icmptypeent	*it;
 
 		it = geticmptypebynumber(r->type-1, r->af);
 		if (r->af != AF_INET6)
 			printf(" icmp-type");
 		else
 			printf(" icmp6-type");
 		if (it != NULL)
 			printf(" %s", it->name);
 		else
 			printf(" %u", r->type-1);
 		if (r->code) {
 			const struct icmpcodeent	*ic;
 
 			ic = geticmpcodebynumber(r->type-1, r->code-1, r->af);
 			if (ic != NULL)
 				printf(" code %s", ic->name);
 			else
 				printf(" code %u", r->code-1);
 		}
 	}
 	if (r->tos)
 		printf(" tos 0x%2.2x", r->tos);
 	if (r->prio)
 		printf(" prio %u", r->prio == PF_PRIO_ZERO ? 0 : r->prio);
 	if (r->scrub_flags & PFSTATE_SETMASK) {
 		char *comma = "";
 		printf(" set (");
 		if (r->scrub_flags & PFSTATE_SETPRIO) {
 			if (r->set_prio[0] == r->set_prio[1])
 				printf("%s prio %u", comma, r->set_prio[0]);
 			else
 				printf("%s prio(%u, %u)", comma, r->set_prio[0],
 				    r->set_prio[1]);
 			comma = ",";
 		}
 		printf(" )");
 	}
 	if (!r->keep_state && r->action == PF_PASS && !anchor_call[0])
 		printf(" no state");
 	else if (r->keep_state == PF_STATE_NORMAL)
 		printf(" keep state");
 	else if (r->keep_state == PF_STATE_MODULATE)
 		printf(" modulate state");
 	else if (r->keep_state == PF_STATE_SYNPROXY)
 		printf(" synproxy state");
 	if (r->prob) {
 		char	buf[20];
 
 		snprintf(buf, sizeof(buf), "%f", r->prob*100.0/(UINT_MAX+1.0));
 		for (i = strlen(buf)-1; i > 0; i--) {
 			if (buf[i] == '0')
 				buf[i] = '\0';
 			else {
 				if (buf[i] == '.')
 					buf[i] = '\0';
 				break;
 			}
 		}
 		printf(" probability %s%%", buf);
 	}
 	opts = 0;
 	if (r->max_states || r->max_src_nodes || r->max_src_states)
 		opts = 1;
 	if (r->rule_flag & PFRULE_NOSYNC)
 		opts = 1;
 	if (r->rule_flag & PFRULE_SRCTRACK)
 		opts = 1;
 	if (r->rule_flag & PFRULE_IFBOUND)
 		opts = 1;
 	if (r->rule_flag & PFRULE_STATESLOPPY)
 		opts = 1;
 	for (i = 0; !opts && i < PFTM_MAX; ++i)
 		if (r->timeout[i])
 			opts = 1;
 	if (opts) {
 		printf(" (");
 		if (r->max_states) {
 			printf("max %u", r->max_states);
 			opts = 0;
 		}
 		if (r->rule_flag & PFRULE_NOSYNC) {
 			if (!opts)
 				printf(", ");
 			printf("no-sync");
 			opts = 0;
 		}
 		if (r->rule_flag & PFRULE_SRCTRACK) {
 			if (!opts)
 				printf(", ");
 			printf("source-track");
 			if (r->rule_flag & PFRULE_RULESRCTRACK)
 				printf(" rule");
 			else
 				printf(" global");
 			opts = 0;
 		}
 		if (r->max_src_states) {
 			if (!opts)
 				printf(", ");
 			printf("max-src-states %u", r->max_src_states);
 			opts = 0;
 		}
 		if (r->max_src_conn) {
 			if (!opts)
 				printf(", ");
 			printf("max-src-conn %u", r->max_src_conn);
 			opts = 0;
 		}
 		if (r->max_src_conn_rate.limit) {
 			if (!opts)
 				printf(", ");
 			printf("max-src-conn-rate %u/%u",
 			    r->max_src_conn_rate.limit,
 			    r->max_src_conn_rate.seconds);
 			opts = 0;
 		}
 		if (r->max_src_nodes) {
 			if (!opts)
 				printf(", ");
 			printf("max-src-nodes %u", r->max_src_nodes);
 			opts = 0;
 		}
 		if (r->overload_tblname[0]) {
 			if (!opts)
 				printf(", ");
 			printf("overload <%s>", r->overload_tblname);
 			if (r->flush)
 				printf(" flush");
 			if (r->flush & PF_FLUSH_GLOBAL)
 				printf(" global");
 			opts = 0;
 		}
 		if (r->rule_flag & PFRULE_IFBOUND) {
 			if (!opts)
 				printf(", ");
 			printf("if-bound");
 			opts = 0;
 		}
 		if (r->rule_flag & PFRULE_STATESLOPPY) {
 			if (!opts)
 				printf(", ");
 			printf("sloppy");
 			opts = 0;
 		}
 		for (i = 0; i < PFTM_MAX; ++i)
 			if (r->timeout[i]) {
 				int j;
 
 				if (!opts)
 					printf(", ");
 				opts = 0;
 				for (j = 0; pf_timeouts[j].name != NULL;
 				    ++j)
 					if (pf_timeouts[j].timeout == i)
 						break;
 				printf("%s %u", pf_timeouts[j].name == NULL ?
 				    "inv.timeout" : pf_timeouts[j].name,
 				    r->timeout[i]);
 			}
 		printf(")");
 	}
 	if (r->rule_flag & PFRULE_FRAGMENT)
 		printf(" fragment");
 	if (r->rule_flag & PFRULE_NODF)
 		printf(" no-df");
 	if (r->rule_flag & PFRULE_RANDOMID)
 		printf(" random-id");
 	if (r->min_ttl)
 		printf(" min-ttl %d", r->min_ttl);
 	if (r->max_mss)
 		printf(" max-mss %d", r->max_mss);
 	if (r->rule_flag & PFRULE_SET_TOS)
 		printf(" set-tos 0x%2.2x", r->set_tos);
 	if (r->allow_opts)
 		printf(" allow-opts");
 	if (r->action == PF_SCRUB) {
 		if (r->rule_flag & PFRULE_REASSEMBLE_TCP)
 			printf(" reassemble tcp");
 
 		printf(" fragment reassemble");
 	}
 	if (r->label[0])
 		printf(" label \"%s\"", r->label);
 	if (r->qname[0] && r->pqname[0])
 		printf(" queue(%s, %s)", r->qname, r->pqname);
 	else if (r->qname[0])
 		printf(" queue %s", r->qname);
 	if (r->tagname[0])
 		printf(" tag %s", r->tagname);
 	if (r->match_tagname[0]) {
 		if (r->match_tag_not)
 			printf(" !");
 		printf(" tagged %s", r->match_tagname);
 	}
 	if (r->rtableid != -1)
 		printf(" rtable %u", r->rtableid);
 	if (r->divert.port) {
 #ifdef __FreeBSD__
 		printf(" divert-to %u", ntohs(r->divert.port));
 #else
 		if (PF_AZERO(&r->divert.addr, r->af)) {
 			printf(" divert-reply");
 		} else {
 			/* XXX cut&paste from print_addr */
 			char buf[48];
 
 			printf(" divert-to ");
 			if (inet_ntop(r->af, &r->divert.addr, buf,
 			    sizeof(buf)) == NULL)
 				printf("?");
 			else
 				printf("%s", buf);
 			printf(" port %u", ntohs(r->divert.port));
 		}
 #endif
 	}
 	if (!anchor_call[0] && (r->action == PF_NAT ||
 	    r->action == PF_BINAT || r->action == PF_RDR)) {
 		printf(" -> ");
 		print_pool(&r->rpool, r->rpool.proxy_port[0],
 		    r->rpool.proxy_port[1], r->af, r->action);
 	}
 }
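The `probability` branch of print_rule() stores the probability as a 32-bit fraction of 2^32, prints it as a percentage with `%f`, and then trims trailing zeros and a dangling decimal point. The standalone sketch below reproduces that formatting step. `format_probability` is a hypothetical helper, not part of pfctl.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a probability stored as a fraction of
 * 2^32 (as in r->prob) as a percentage string. Trailing zeros are
 * removed, and then a trailing '.' is removed, mirroring print_rule().
 * `buf` should hold at least 20 bytes.
 */
static void
format_probability(uint32_t prob, char *buf, size_t len)
{
	assert(len >= 20);
	snprintf(buf, len, "%f", prob * 100.0 / (UINT_MAX + 1.0));
	for (size_t i = strlen(buf) - 1; i > 0; i--) {
		if (buf[i] == '0')
			buf[i] = '\0';		/* drop a trailing zero */
		else {
			if (buf[i] == '.')
				buf[i] = '\0';	/* no fractional digits left */
			break;
		}
	}
}
```

For example, a stored value of `0x80000000` formats as `50` and `0x20000000` as `12.5`. The integer digits are never trimmed, because the scan stops at the first non-zero character or at the decimal point.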
 
 void
 print_tabledef(const char *name, int flags, int addrs,
     struct node_tinithead *nodes)
 {
 	struct node_tinit	*ti, *nti;
 	struct node_host	*h;
 
 	printf("table <%s>", name);
 	if (flags & PFR_TFLAG_CONST)
 		printf(" const");
 	if (flags & PFR_TFLAG_PERSIST)
 		printf(" persist");
 	if (flags & PFR_TFLAG_COUNTERS)
 		printf(" counters");
 	SIMPLEQ_FOREACH(ti, nodes, entries) {
 		if (ti->file) {
 			printf(" file \"%s\"", ti->file);
 			continue;
 		}
 		printf(" {");
 		for (;;) {
 			for (h = ti->host; h != NULL; h = h->next) {
 				printf(h->not ? " !" : " ");
 				print_addr(&h->addr, h->af, 0);
 			}
 			nti = SIMPLEQ_NEXT(ti, entries);
 			if (nti != NULL && nti->file == NULL)
 				ti = nti;	/* merge lists */
 			else
 				break;
 		}
 		printf(" }");
 	}
 	if (addrs && SIMPLEQ_EMPTY(nodes))
 		printf(" { }");
 	printf("\n");
 }
 
 int
 parse_flags(char *s)
 {
 	char		*p, *q;
 	u_int8_t	 f = 0;
 
 	for (p = s; *p; p++) {
 		if ((q = strchr(tcpflags, *p)) == NULL)
 			return -1;
 		else
 			f |= 1 << (q - tcpflags);
 	}
 	return (f ? f : PF_TH_ALL);
 }
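parse_flags() maps each character of a flags string to a bit according to its position in the `tcpflags` alphabet, and it treats an empty string as "all flags". That alphabet is defined outside this excerpt. The sketch below assumes it is `"FSRPAUEW"` and that `PF_TH_ALL` is `0xFF`. Both values are stated as assumptions, and the names `tcp_flag_chars`, `TH_ALL_SKETCH` and `parse_tcp_flags` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Assumed flag alphabet: character i maps to bit (1 << i), i.e.
 * F=FIN 0x01, S=SYN 0x02, R=RST 0x04, P=PUSH 0x08, A=ACK 0x10,
 * U=URG 0x20, E=ECE 0x40, W=CWR 0x80.
 */
static const char tcp_flag_chars[] = "FSRPAUEW";
#define TH_ALL_SKETCH	0xFF	/* assumed value of PF_TH_ALL */

/*
 * Return the bitmask for a string such as "SA", TH_ALL_SKETCH for an
 * empty string, or -1 if the string contains an unknown character.
 */
static int
parse_tcp_flags(const char *s)
{
	uint8_t f = 0;

	for (const char *p = s; *p != '\0'; p++) {
		const char *q = strchr(tcp_flag_chars, *p);

		if (q == NULL)
			return (-1);
		f |= (uint8_t)(1U << (q - tcp_flag_chars));
	}
	return (f ? f : TH_ALL_SKETCH);
}
```

With these assumptions, `"S"` yields `0x02` and `"SA"` yields `0x12`, which corresponds to the `flags S/SA` idiom in pf rules.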
 
 void
 set_ipmask(struct node_host *h, u_int8_t b)
 {
 	struct pf_addr	*m, *n;
 	int		 i, j = 0;
 
 	m = &h->addr.v.a.mask;
 	memset(m, 0, sizeof(*m));
 
 	while (b >= 32) {
 		m->addr32[j++] = 0xffffffff;
 		b -= 32;
 	}
 	for (i = 31; i > 31-b; --i)
 		m->addr32[j] |= (1 << i);
 	if (b)
 		m->addr32[j] = htonl(m->addr32[j]);
 
 	/* Mask off bits of the address that will never be used. */
 	n = &h->addr.v.a.addr;
 	if (h->addr.type == PF_ADDR_ADDRMASK)
 		for (i = 0; i < 4; i++)
 			n->addr32[i] = n->addr32[i] & m->addr32[i];
 }
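set_ipmask() builds a netmask from a prefix length. It fills whole 32-bit words with ones, sets the top bits of the next word, converts that partial word to network byte order, and then ANDs the address with the mask. The sketch below restates that approach on a hypothetical `struct mask128` rather than pf's `struct pf_addr`. It shifts an unsigned constant instead of looping over `1 << i`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical stand-in for struct pf_addr: four 32-bit words in
 * network byte order. */
struct mask128 {
	uint32_t w[4];
};

/* Build a mask with `bits` leading one bits (0..128). */
static void
prefix_to_mask(struct mask128 *m, unsigned bits)
{
	unsigned j = 0;

	assert(bits <= 128);
	memset(m, 0, sizeof(*m));
	while (bits >= 32) {
		m->w[j++] = 0xffffffffU;	/* all ones: byte order irrelevant */
		bits -= 32;
	}
	if (bits > 0)
		/* top `bits` bits in host order, then to network order */
		m->w[j] = htonl(0xffffffffU << (32 - bits));
}

/* Clear the address bits that the mask does not cover. */
static void
apply_mask(struct mask128 *addr, const struct mask128 *m)
{
	for (int i = 0; i < 4; i++)
		addr->w[i] &= m->w[i];
}
```

Only the partial word needs `htonl()`. A full word of ones is identical in either byte order, which is why set_ipmask() calls `htonl()` only when `b` is non-zero after the loop.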
 
 int
 check_netmask(struct node_host *h, sa_family_t af)
 {
 	struct node_host	*n = NULL;
 	struct pf_addr	*m;
 
 	for (n = h; n != NULL; n = n->next) {
 		if (n->addr.type == PF_ADDR_TABLE)
 			continue;
 		m = &n->addr.v.a.mask;
 		/* fix up netmask for dynaddr */
 		if (af == AF_INET && n->addr.type == PF_ADDR_DYNIFTL &&
 		    unmask(m, AF_INET6) > 32)
 			set_ipmask(n, 32);
 		/* netmasks > 32 bit are invalid on v4 */
 		if (af == AF_INET &&
 		    (m->addr32[1] || m->addr32[2] || m->addr32[3])) {
 			fprintf(stderr, "netmask %u invalid for IPv4 address\n",
 			    unmask(m, AF_INET6));
 			return (1);
 		}
 	}
 	return (0);
 }
 
 /* interface lookup routines */
 
-struct node_host	*iftab;
+static struct node_host	*iftab;
 
 void
 ifa_load(void)
 {
 	struct ifaddrs		*ifap, *ifa;
 	struct node_host	*n = NULL, *h = NULL;
 
 	if (getifaddrs(&ifap) < 0)
 		err(1, "getifaddrs");
 
 	for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
 		if (!(ifa->ifa_addr->sa_family == AF_INET ||
 		    ifa->ifa_addr->sa_family == AF_INET6 ||
 		    ifa->ifa_addr->sa_family == AF_LINK))
 			continue;
 		n = calloc(1, sizeof(struct node_host));
 		if (n == NULL)
 			err(1, "address: calloc");
 		n->af = ifa->ifa_addr->sa_family;
 		n->ifa_flags = ifa->ifa_flags;
 #ifdef __KAME__
 		if (n->af == AF_INET6 &&
 		    IN6_IS_ADDR_LINKLOCAL(&((struct sockaddr_in6 *)
 		    ifa->ifa_addr)->sin6_addr) &&
 		    ((struct sockaddr_in6 *)ifa->ifa_addr)->sin6_scope_id ==
 		    0) {
 			struct sockaddr_in6	*sin6;
 
 			sin6 = (struct sockaddr_in6 *)ifa->ifa_addr;
 			sin6->sin6_scope_id = sin6->sin6_addr.s6_addr[2] << 8 |
 			    sin6->sin6_addr.s6_addr[3];
 			sin6->sin6_addr.s6_addr[2] = 0;
 			sin6->sin6_addr.s6_addr[3] = 0;
 		}
 #endif
 		n->ifindex = 0;
 		if (n->af == AF_INET) {
 			memcpy(&n->addr.v.a.addr, &((struct sockaddr_in *)
 			    ifa->ifa_addr)->sin_addr.s_addr,
 			    sizeof(struct in_addr));
 			memcpy(&n->addr.v.a.mask, &((struct sockaddr_in *)
 			    ifa->ifa_netmask)->sin_addr.s_addr,
 			    sizeof(struct in_addr));
 			if (ifa->ifa_broadaddr != NULL)
 				memcpy(&n->bcast, &((struct sockaddr_in *)
 				    ifa->ifa_broadaddr)->sin_addr.s_addr,
 				    sizeof(struct in_addr));
 			if (ifa->ifa_dstaddr != NULL)
 				memcpy(&n->peer, &((struct sockaddr_in *)
 				    ifa->ifa_dstaddr)->sin_addr.s_addr,
 				    sizeof(struct in_addr));
 		} else if (n->af == AF_INET6) {
 			memcpy(&n->addr.v.a.addr, &((struct sockaddr_in6 *)
 			    ifa->ifa_addr)->sin6_addr.s6_addr,
 			    sizeof(struct in6_addr));
 			memcpy(&n->addr.v.a.mask, &((struct sockaddr_in6 *)
 			    ifa->ifa_netmask)->sin6_addr.s6_addr,
 			    sizeof(struct in6_addr));
 			if (ifa->ifa_broadaddr != NULL)
 				memcpy(&n->bcast, &((struct sockaddr_in6 *)
 				    ifa->ifa_broadaddr)->sin6_addr.s6_addr,
 				    sizeof(struct in6_addr));
 			if (ifa->ifa_dstaddr != NULL)
 				memcpy(&n->peer, &((struct sockaddr_in6 *)
 				    ifa->ifa_dstaddr)->sin6_addr.s6_addr,
 				    sizeof(struct in6_addr));
 			n->ifindex = ((struct sockaddr_in6 *)
 			    ifa->ifa_addr)->sin6_scope_id;
 		}
 		if ((n->ifname = strdup(ifa->ifa_name)) == NULL)
 			err(1, "ifa_load: strdup");
 		n->next = NULL;
 		n->tail = n;
 		if (h == NULL)
 			h = n;
 		else {
 			h->tail->next = n;
 			h->tail = n;
 		}
 	}
 
 	iftab = h;
 	freeifaddrs(ifap);
 }
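ifa_load(), ifa_lookup() and host_dns() append to `node_host` lists in constant time by keeping a `tail` pointer in the head node only. ifa_grouplookup() uses the same fields to splice a whole sublist, with `h->tail = n->tail`. The sketch below isolates that pattern. `struct node` and `list_append` are hypothetical names, and the struct keeps only the `next` and `tail` fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: only the head's `tail` field is kept accurate. */
struct node {
	int value;
	struct node *next;
	struct node *tail;
};

/*
 * Append list `n` (a single node with n->tail == n, or a chain whose
 * head has a valid tail) to the list headed by `h`. Return the head.
 */
static struct node *
list_append(struct node *h, struct node *n)
{
	if (n == NULL)
		return (h);
	assert(n->tail != NULL);
	if (h == NULL)
		return (n);		/* first element becomes the head */
	h->tail->next = n;		/* link after the current last node */
	h->tail = n->tail;		/* head now tracks the new last node */
	return (h);
}
```

Because only the head's `tail` is maintained, code that walks the list must follow `next` pointers and must not rely on `tail` in interior nodes.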
 
 int
 get_socket_domain(void)
 {
 	int sdom;
 
 	sdom = AF_UNSPEC;
 #ifdef WITH_INET6
 	if (sdom == AF_UNSPEC && feature_present("inet6"))
 		sdom = AF_INET6;
 #endif
 #ifdef WITH_INET
 	if (sdom == AF_UNSPEC && feature_present("inet"))
 		sdom = AF_INET;
 #endif
 	if (sdom == AF_UNSPEC)
 		sdom = AF_LINK;
 
 	return (sdom);
 }
 
 struct node_host *
 ifa_exists(const char *ifa_name)
 {
 	struct node_host	*n;
 	struct ifgroupreq	ifgr;
 	int			s;
 
 	if (iftab == NULL)
 		ifa_load();
 
 	/* check whether this is a group */
 	if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) == -1)
 		err(1, "socket");
 	bzero(&ifgr, sizeof(ifgr));
 	strlcpy(ifgr.ifgr_name, ifa_name, sizeof(ifgr.ifgr_name));
 	if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == 0) {
 		/* fake a node_host */
 		if ((n = calloc(1, sizeof(*n))) == NULL)
 			err(1, "calloc");
 		if ((n->ifname = strdup(ifa_name)) == NULL)
 			err(1, "strdup");
 		close(s);
 		return (n);
 	}
 	close(s);
 
 	for (n = iftab; n; n = n->next) {
 		if (n->af == AF_LINK && !strncmp(n->ifname, ifa_name, IFNAMSIZ))
 			return (n);
 	}
 
 	return (NULL);
 }
 
 struct node_host *
 ifa_grouplookup(const char *ifa_name, int flags)
 {
 	struct ifg_req		*ifg;
 	struct ifgroupreq	 ifgr;
 	int			 s, len;
 	struct node_host	*n, *h = NULL;
 
 	if ((s = socket(get_socket_domain(), SOCK_DGRAM, 0)) == -1)
 		err(1, "socket");
 	bzero(&ifgr, sizeof(ifgr));
 	strlcpy(ifgr.ifgr_name, ifa_name, sizeof(ifgr.ifgr_name));
 	if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == -1) {
 		close(s);
 		return (NULL);
 	}
 
 	len = ifgr.ifgr_len;
 	if ((ifgr.ifgr_groups = calloc(1, len)) == NULL)
 		err(1, "calloc");
 	if (ioctl(s, SIOCGIFGMEMB, (caddr_t)&ifgr) == -1)
 		err(1, "SIOCGIFGMEMB");
 
 	for (ifg = ifgr.ifgr_groups; ifg && len >= sizeof(struct ifg_req);
 	    ifg++) {
 		len -= sizeof(struct ifg_req);
 		if ((n = ifa_lookup(ifg->ifgrq_member, flags)) == NULL)
 			continue;
 		if (h == NULL)
 			h = n;
 		else {
 			h->tail->next = n;
 			h->tail = n->tail;
 		}
 	}
 	free(ifgr.ifgr_groups);
 	close(s);
 
 	return (h);
 }
 
 struct node_host *
 ifa_lookup(const char *ifa_name, int flags)
 {
 	struct node_host	*p = NULL, *h = NULL, *n = NULL;
 	int			 got4 = 0, got6 = 0;
 	const char		 *last_if = NULL;
 
 	if ((h = ifa_grouplookup(ifa_name, flags)) != NULL)
 		return (h);
 
 	if (!strncmp(ifa_name, "self", IFNAMSIZ))
 		ifa_name = NULL;
 
 	if (iftab == NULL)
 		ifa_load();
 
 	for (p = iftab; p; p = p->next) {
 		if (ifa_skip_if(ifa_name, p))
 			continue;
 		if ((flags & PFI_AFLAG_BROADCAST) && p->af != AF_INET)
 			continue;
 		if ((flags & PFI_AFLAG_BROADCAST) &&
 		    !(p->ifa_flags & IFF_BROADCAST))
 			continue;
 		if ((flags & PFI_AFLAG_PEER) &&
 		    !(p->ifa_flags & IFF_POINTOPOINT))
 			continue;
 		if ((flags & PFI_AFLAG_NETWORK) && p->ifindex > 0)
 			continue;
 		if (last_if == NULL || strcmp(last_if, p->ifname))
 			got4 = got6 = 0;
 		last_if = p->ifname;
 		if ((flags & PFI_AFLAG_NOALIAS) && p->af == AF_INET && got4)
 			continue;
 		if ((flags & PFI_AFLAG_NOALIAS) && p->af == AF_INET6 && got6)
 			continue;
 		if (p->af == AF_INET)
 			got4 = 1;
 		else
 			got6 = 1;
 		n = calloc(1, sizeof(struct node_host));
 		if (n == NULL)
 			err(1, "address: calloc");
 		n->af = p->af;
 		if (flags & PFI_AFLAG_BROADCAST)
 			memcpy(&n->addr.v.a.addr, &p->bcast,
 			    sizeof(struct pf_addr));
 		else if (flags & PFI_AFLAG_PEER)
 			memcpy(&n->addr.v.a.addr, &p->peer,
 			    sizeof(struct pf_addr));
 		else
 			memcpy(&n->addr.v.a.addr, &p->addr.v.a.addr,
 			    sizeof(struct pf_addr));
 		if (flags & PFI_AFLAG_NETWORK)
 			set_ipmask(n, unmask(&p->addr.v.a.mask, n->af));
 		else {
 			if (n->af == AF_INET) {
 				if (p->ifa_flags & IFF_LOOPBACK &&
 				    p->ifa_flags & IFF_LINK1)
 					memcpy(&n->addr.v.a.mask,
 					    &p->addr.v.a.mask,
 					    sizeof(struct pf_addr));
 				else
 					set_ipmask(n, 32);
 			} else
 				set_ipmask(n, 128);
 		}
 		n->ifindex = p->ifindex;
 
 		n->next = NULL;
 		n->tail = n;
 		if (h == NULL)
 			h = n;
 		else {
 			h->tail->next = n;
 			h->tail = n;
 		}
 	}
 	return (h);
 }
 
 int
 ifa_skip_if(const char *filter, struct node_host *p)
 {
 	int	n;
 
 	if (p->af != AF_INET && p->af != AF_INET6)
 		return (1);
 	if (filter == NULL || !*filter)
 		return (0);
 	if (!strcmp(p->ifname, filter))
 		return (0);	/* exact match */
 	n = strlen(filter);
 	if (n < 1 || n >= IFNAMSIZ)
 		return (1);	/* sanity check */
 	if (filter[n-1] >= '0' && filter[n-1] <= '9')
 		return (1);	/* only do exact match in that case */
 	if (strncmp(p->ifname, filter, n))
 		return (1);	/* prefix doesn't match */
 	return (p->ifname[n] < '0' || p->ifname[n] > '9');
 }
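ifa_skip_if() applies four name-matching rules. An empty filter matches every interface, and an exact name always matches. A filter that ends in a digit must match exactly. Otherwise the filter is treated as a driver prefix that must be followed by a unit number. The sketch below keeps only those name rules and omits the address-family check. `skip_ifname` is hypothetical, and `IFNAMSIZ_SKETCH` assumes FreeBSD's `IFNAMSIZ` of 16.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define IFNAMSIZ_SKETCH	16	/* assumed to match IFNAMSIZ */

/* Return true if interface `ifname` should be skipped for `filter`. */
static bool
skip_ifname(const char *filter, const char *ifname)
{
	size_t n;

	if (filter == NULL || filter[0] == '\0')
		return (false);			/* no filter: keep everything */
	if (strcmp(ifname, filter) == 0)
		return (false);			/* exact match */
	n = strlen(filter);
	if (n >= IFNAMSIZ_SKETCH)
		return (true);			/* sanity check */
	if (filter[n - 1] >= '0' && filter[n - 1] <= '9')
		return (true);			/* "em0" only matches "em0" */
	if (strncmp(ifname, filter, n) != 0)
		return (true);			/* prefix differs */
	/* "em" matches "em0" or "em10", but not "emx0" */
	return (ifname[n] < '0' || ifname[n] > '9');
}
```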
 
 
 struct node_host *
 host(const char *s)
 {
 	struct node_host	*h = NULL;
 	int			 mask, v4mask, v6mask, cont = 1;
 	char			*p, *q, *ps;
 
 	if ((p = strrchr(s, '/')) != NULL) {
 		mask = strtol(p+1, &q, 0);
 		if (!q || *q || mask > 128 || q == (p+1)) {
 			fprintf(stderr, "invalid netmask '%s'\n", p);
 			return (NULL);
 		}
 		if ((ps = malloc(strlen(s) - strlen(p) + 1)) == NULL)
 			err(1, "host: malloc");
 		strlcpy(ps, s, strlen(s) - strlen(p) + 1);
 		v4mask = v6mask = mask;
 	} else {
 		if ((ps = strdup(s)) == NULL)
 			err(1, "host: strdup");
 		v4mask = 32;
 		v6mask = 128;
 		mask = -1;
 	}
 
 	/* interface with this name exists? */
 	if (cont && (h = host_if(ps, mask)) != NULL)
 		cont = 0;
 
 	/* IPv4 address? */
 	if (cont && (h = host_v4(s, mask)) != NULL)
 		cont = 0;
 
 	/* IPv6 address? */
 	if (cont && (h = host_v6(ps, v6mask)) != NULL)
 		cont = 0;
 
 	/* dns lookup */
 	if (cont && (h = host_dns(ps, v4mask, v6mask)) != NULL)
 		cont = 0;
 	free(ps);
 
 	if (h == NULL || cont == 1) {
 		fprintf(stderr, "no IP address found for %s\n", s);
 		return (NULL);
 	}
 	return (h);
 }
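host() splits an optional `/len` suffix off its argument with `strrchr()` and `strtol()` before trying the interface, IPv4, IPv6 and DNS lookups in turn. The sketch below shows only that splitting step. `split_cidr` and its return codes are hypothetical. Unlike host(), it also rejects a negative length, because host() only checks the upper bound of 128. Like host(), it passes base 0 to `strtol()`, so a suffix such as `0x18` is read as hexadecimal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy the address part of `s` into `out` and return the prefix
 * length, -1 if there is no '/' suffix, or -2 if the suffix is empty,
 * non-numeric, negative or larger than 128 (or `out` is too small).
 */
static int
split_cidr(const char *s, char *out, size_t outlen)
{
	const char *p = strrchr(s, '/');
	char *q;
	long mask;
	size_t addrlen = (p != NULL) ? (size_t)(p - s) : strlen(s);

	if (addrlen >= outlen)
		return (-2);
	memcpy(out, s, addrlen);
	out[addrlen] = '\0';
	if (p == NULL)
		return (-1);
	mask = strtol(p + 1, &q, 0);	/* base 0, as in host() */
	if (*q != '\0' || q == p + 1 || mask < 0 || mask > 128)
		return (-2);
	return ((int)mask);
}
```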
 
 struct node_host *
 host_if(const char *s, int mask)
 {
 	struct node_host	*n, *h = NULL;
 	char			*p, *ps;
 	int			 flags = 0;
 
 	if ((ps = strdup(s)) == NULL)
 		err(1, "host_if: strdup");
 	while ((p = strrchr(ps, ':')) != NULL) {
 		if (!strcmp(p+1, "network"))
 			flags |= PFI_AFLAG_NETWORK;
 		else if (!strcmp(p+1, "broadcast"))
 			flags |= PFI_AFLAG_BROADCAST;
 		else if (!strcmp(p+1, "peer"))
 			flags |= PFI_AFLAG_PEER;
 		else if (!strcmp(p+1, "0"))
 			flags |= PFI_AFLAG_NOALIAS;
 		else {
 			free(ps);
 			return (NULL);
 		}
 		*p = '\0';
 	}
 	if (flags & (flags - 1) & PFI_AFLAG_MODEMASK) { /* Yep! */
 		fprintf(stderr, "illegal combination of interface modifiers\n");
 		free(ps);
 		return (NULL);
 	}
 	if ((flags & (PFI_AFLAG_NETWORK|PFI_AFLAG_BROADCAST)) && mask > -1) {
 		fprintf(stderr, "network or broadcast lookup, but "
 		    "extra netmask given\n");
 		free(ps);
 		return (NULL);
 	}
 	if (ifa_exists(ps) || !strncmp(ps, "self", IFNAMSIZ)) {
 		/* interface with this name exists */
 		h = ifa_lookup(ps, flags);
 		for (n = h; n != NULL && mask > -1; n = n->next)
 			set_ipmask(n, mask);
 	}
 
 	free(ps);
 	return (h);
 }
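host_if() strips `:network`, `:broadcast`, `:peer` and `:0` suffixes from the right and collects them as flag bits. It then rejects requests for more than one address mode with `flags & (flags - 1) & PFI_AFLAG_MODEMASK`. The sketch below reproduces that logic. The flag values are assumptions (the `PFI_AFLAG_*` definitions are not shown here), and the `MOD_*` names and `parse_if_modifiers` are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Assumed values mirroring the PFI_AFLAG_* bits. */
#define MOD_NETWORK	0x01
#define MOD_BROADCAST	0x02
#define MOD_PEER	0x04
#define MOD_MODEMASK	0x07
#define MOD_NOALIAS	0x08

/*
 * Strip modifier suffixes from the writable string `name`. Return the
 * flag bits, or -1 for an unknown suffix or conflicting address modes.
 */
static int
parse_if_modifiers(char *name)
{
	char *p;
	int flags = 0;

	while ((p = strrchr(name, ':')) != NULL) {
		if (strcmp(p + 1, "network") == 0)
			flags |= MOD_NETWORK;
		else if (strcmp(p + 1, "broadcast") == 0)
			flags |= MOD_BROADCAST;
		else if (strcmp(p + 1, "peer") == 0)
			flags |= MOD_PEER;
		else if (strcmp(p + 1, "0") == 0)
			flags |= MOD_NOALIAS;
		else
			return (-1);
		*p = '\0';	/* drop this suffix and look for another */
	}
	/* flags & (flags - 1) clears the lowest set bit; a mode bit that
	 * survives means two address modes were requested. */
	if (flags & (flags - 1) & MOD_MODEMASK)
		return (-1);
	return (flags);
}
```

The check still accepts `:0` combined with one mode. `MOD_NOALIAS` is the highest bit, so clearing the lowest set bit never leaves a mode bit behind in that case.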
 
 struct node_host *
 host_v4(const char *s, int mask)
 {
 	struct node_host	*h = NULL;
 	struct in_addr		 ina;
 	int			 bits = 32;
 
 	memset(&ina, 0, sizeof(struct in_addr));
 	if (strrchr(s, '/') != NULL) {
 		if ((bits = inet_net_pton(AF_INET, s, &ina, sizeof(ina))) == -1)
 			return (NULL);
 	} else {
 		if (inet_pton(AF_INET, s, &ina) != 1)
 			return (NULL);
 	}
 
 	h = calloc(1, sizeof(struct node_host));
 	if (h == NULL)
 		err(1, "address: calloc");
 	h->ifname = NULL;
 	h->af = AF_INET;
 	h->addr.v.a.addr.addr32[0] = ina.s_addr;
 	set_ipmask(h, bits);
 	h->next = NULL;
 	h->tail = h;
 
 	return (h);
 }
 
 struct node_host *
 host_v6(const char *s, int mask)
 {
 	struct addrinfo		 hints, *res;
 	struct node_host	*h = NULL;
 
 	memset(&hints, 0, sizeof(hints));
 	hints.ai_family = AF_INET6;
 	hints.ai_socktype = SOCK_DGRAM; /*dummy*/
 	hints.ai_flags = AI_NUMERICHOST;
 	if (getaddrinfo(s, "0", &hints, &res) == 0) {
 		h = calloc(1, sizeof(struct node_host));
 		if (h == NULL)
 			err(1, "address: calloc");
 		h->ifname = NULL;
 		h->af = AF_INET6;
 		memcpy(&h->addr.v.a.addr,
 		    &((struct sockaddr_in6 *)res->ai_addr)->sin6_addr,
 		    sizeof(h->addr.v.a.addr));
 		h->ifindex =
 		    ((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id;
 		set_ipmask(h, mask);
 		freeaddrinfo(res);
 		h->next = NULL;
 		h->tail = h;
 	}
 
 	return (h);
 }
 
 struct node_host *
 host_dns(const char *s, int v4mask, int v6mask)
 {
 	struct addrinfo		 hints, *res0, *res;
 	struct node_host	*n, *h = NULL;
 	int			 error, noalias = 0;
 	int			 got4 = 0, got6 = 0;
 	char			*p, *ps;
 
 	if ((ps = strdup(s)) == NULL)
 		err(1, "host_dns: strdup");
 	if ((p = strrchr(ps, ':')) != NULL && !strcmp(p, ":0")) {
 		noalias = 1;
 		*p = '\0';
 	}
 	memset(&hints, 0, sizeof(hints));
 	hints.ai_family = PF_UNSPEC;
 	hints.ai_socktype = SOCK_STREAM; /* DUMMY */
 	error = getaddrinfo(ps, NULL, &hints, &res0);
 	if (error) {
 		free(ps);
 		return (h);
 	}
 
 	for (res = res0; res; res = res->ai_next) {
 		if (res->ai_family != AF_INET &&
 		    res->ai_family != AF_INET6)
 			continue;
 		if (noalias) {
 			if (res->ai_family == AF_INET) {
 				if (got4)
 					continue;
 				got4 = 1;
 			} else {
 				if (got6)
 					continue;
 				got6 = 1;
 			}
 		}
 		n = calloc(1, sizeof(struct node_host));
 		if (n == NULL)
 			err(1, "host_dns: calloc");
 		n->ifname = NULL;
 		n->af = res->ai_family;
 		if (res->ai_family == AF_INET) {
 			memcpy(&n->addr.v.a.addr,
 			    &((struct sockaddr_in *)
 			    res->ai_addr)->sin_addr.s_addr,
 			    sizeof(struct in_addr));
 			set_ipmask(n, v4mask);
 		} else {
 			memcpy(&n->addr.v.a.addr,
 			    &((struct sockaddr_in6 *)
 			    res->ai_addr)->sin6_addr.s6_addr,
 			    sizeof(struct in6_addr));
 			n->ifindex =
 			    ((struct sockaddr_in6 *)
 			    res->ai_addr)->sin6_scope_id;
 			set_ipmask(n, v6mask);
 		}
 		n->next = NULL;
 		n->tail = n;
 		if (h == NULL)
 			h = n;
 		else {
 			h->tail->next = n;
 			h->tail = n;
 		}
 	}
 	freeaddrinfo(res0);
 	free(ps);
 
 	return (h);
 }
 
 /*
  * convert a hostname to a list of addresses and put them in the given buffer.
  * test:
  *	if set to 1, only simple addresses are accepted (no netblock, no "!").
  */
 int
 append_addr(struct pfr_buffer *b, char *s, int test)
 {
 	char			 *r;
 	struct node_host	*h, *n;
 	int			 rv, not = 0;
 
 	for (r = s; *r == '!'; r++)
 		not = !not;
 	if ((n = host(r)) == NULL) {
 		errno = 0;
 		return (-1);
 	}
 	rv = append_addr_host(b, n, test, not);
 	do {
 		h = n;
 		n = n->next;
 		free(h);
 	} while (n != NULL);
 	return (rv);
 }
 
 /*
  * same as previous function, but with a pre-parsed input and the ability
  * to "negate" the result. Does not free the node_host list.
  * not:
  *      setting it to 1 is equivalent to adding "!" in front of parameter s.
  */
 int
 append_addr_host(struct pfr_buffer *b, struct node_host *n, int test, int not)
 {
 	int			 bits;
 	struct pfr_addr		 addr;
 
 	do {
 		bzero(&addr, sizeof(addr));
 		addr.pfra_not = n->not ^ not;
 		addr.pfra_af = n->af;
 		addr.pfra_net = unmask(&n->addr.v.a.mask, n->af);
 		switch (n->af) {
 		case AF_INET:
 			addr.pfra_ip4addr.s_addr = n->addr.v.a.addr.addr32[0];
 			bits = 32;
 			break;
 		case AF_INET6:
 			memcpy(&addr.pfra_ip6addr, &n->addr.v.a.addr.v6,
 			    sizeof(struct in6_addr));
 			bits = 128;
 			break;
 		default:
 			errno = EINVAL;
 			return (-1);
 		}
 		if ((test && (not || addr.pfra_net != bits)) ||
 		    addr.pfra_net > bits) {
 			errno = EINVAL;
 			return (-1);
 		}
 		if (pfr_buf_add(b, &addr))
 			return (-1);
 	} while ((n = n->next) != NULL);
 
 	return (0);
 }
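append_addr_host() rejects an entry when `test` is set and the caller negated the address or supplied a network rather than a single host. It also rejects any entry whose prefix exceeds the family width (32 or 128 bits). The sketch below isolates that predicate. `table_entry_ok` is a hypothetical helper. Its `negated` argument corresponds to the function's `not` parameter, not to `n->not`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if an entry with prefix length `net`, for an address
 * family `bits` wide, passes the check in append_addr_host().
 */
static bool
table_entry_ok(int net, int bits, bool test, bool negated)
{
	assert(bits == 32 || bits == 128);
	if (test && (negated || net != bits))
		return (false);		/* test mode: plain host addresses only */
	return (net <= bits);		/* prefix cannot exceed the family width */
}
```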
 
 int
 pfctl_add_trans(struct pfr_buffer *buf, int rs_num, const char *anchor)
 {
 	struct pfioc_trans_e trans;
 
 	bzero(&trans, sizeof(trans));
 	trans.rs_num = rs_num;
 	if (strlcpy(trans.anchor, anchor,
 	    sizeof(trans.anchor)) >= sizeof(trans.anchor))
 		errx(1, "pfctl_add_trans: strlcpy");
 
 	return pfr_buf_add(buf, &trans);
 }
 
 u_int32_t
 pfctl_get_ticket(struct pfr_buffer *buf, int rs_num, const char *anchor)
 {
 	struct pfioc_trans_e *p;
 
 	PFRB_FOREACH(p, buf)
 		if (rs_num == p->rs_num && !strcmp(anchor, p->anchor))
 			return (p->ticket);
 	errx(1, "pfctl_get_ticket: assertion failed");
 }
 
 int
 pfctl_trans(int dev, struct pfr_buffer *buf, u_long cmd, int from)
 {
 	struct pfioc_trans trans;
 
 	bzero(&trans, sizeof(trans));
 	trans.size = buf->pfrb_size - from;
 	trans.esize = sizeof(struct pfioc_trans_e);
 	trans.array = ((struct pfioc_trans_e *)buf->pfrb_caddr) + from;
 	return ioctl(dev, cmd, &trans);
 }
Index: user/alc/PQ_LAUNDRY/share/man/man9/Makefile
===================================================================
--- user/alc/PQ_LAUNDRY/share/man/man9/Makefile	(revision 303774)
+++ user/alc/PQ_LAUNDRY/share/man/man9/Makefile	(revision 303775)
@@ -1,1944 +1,1943 @@
 # $FreeBSD$
 
 .include <src.opts.mk>
 
 PACKAGE=runtime-manuals
 
 MAN=	accept_filter.9 \
 	accf_data.9 \
 	accf_dns.9 \
 	accf_http.9 \
 	acl.9 \
 	alq.9 \
 	altq.9 \
 	atomic.9 \
 	bios.9 \
 	bitset.9 \
 	boot.9 \
 	bpf.9 \
 	buf.9 \
 	buf_ring.9 \
 	BUF_ISLOCKED.9 \
 	BUF_LOCK.9 \
 	BUF_LOCKFREE.9 \
 	BUF_LOCKINIT.9 \
 	BUF_RECURSED.9 \
 	BUF_TIMELOCK.9 \
 	BUF_UNLOCK.9 \
 	bus_activate_resource.9 \
 	BUS_ADD_CHILD.9 \
 	bus_adjust_resource.9 \
 	bus_alloc_resource.9 \
 	BUS_BIND_INTR.9 \
 	bus_child_present.9 \
 	BUS_CHILD_DELETED.9 \
 	BUS_CHILD_DETACHED.9 \
 	BUS_CONFIG_INTR.9 \
 	BUS_DESCRIBE_INTR.9 \
 	bus_dma.9 \
 	bus_generic_attach.9 \
 	bus_generic_detach.9 \
 	bus_generic_new_pass.9 \
 	bus_generic_print_child.9 \
 	bus_generic_read_ivar.9 \
 	bus_generic_shutdown.9 \
 	BUS_GET_CPUS.9 \
 	bus_get_resource.9 \
 	bus_map_resource.9 \
 	BUS_NEW_PASS.9 \
 	BUS_PRINT_CHILD.9 \
 	BUS_READ_IVAR.9 \
 	BUS_RESCAN.9 \
 	bus_release_resource.9 \
 	bus_set_pass.9 \
 	bus_set_resource.9 \
 	BUS_SETUP_INTR.9 \
 	bus_space.9 \
 	byteorder.9 \
 	casuword.9 \
 	cd.9 \
 	condvar.9 \
 	config_intrhook.9 \
 	contigmalloc.9 \
 	copy.9 \
 	counter.9 \
 	cpuset.9 \
 	cr_cansee.9 \
 	critical_enter.9 \
 	cr_seeothergids.9 \
 	cr_seeotheruids.9 \
 	crypto.9 \
 	CTASSERT.9 \
 	DB_COMMAND.9 \
 	DECLARE_GEOM_CLASS.9 \
 	DECLARE_MODULE.9 \
 	DELAY.9 \
 	devclass.9 \
 	devclass_find.9 \
 	devclass_get_device.9 \
 	devclass_get_devices.9 \
 	devclass_get_drivers.9 \
 	devclass_get_maxunit.9 \
 	devclass_get_name.9 \
 	devclass_get_softc.9 \
 	dev_clone.9 \
 	devfs_set_cdevpriv.9 \
 	device.9 \
 	device_add_child.9 \
 	DEVICE_ATTACH.9 \
 	device_delete_child.9 \
 	DEVICE_DETACH.9 \
 	device_enable.9 \
 	device_find_child.9 \
 	device_get_children.9 \
 	device_get_devclass.9 \
 	device_get_driver.9 \
 	device_get_ivars.9 \
 	device_get_name.9 \
 	device_get_parent.9 \
 	device_get_softc.9 \
 	device_get_state.9 \
 	device_get_sysctl.9 \
 	device_get_unit.9 \
 	DEVICE_IDENTIFY.9 \
 	device_printf.9 \
 	DEVICE_PROBE.9 \
 	device_probe_and_attach.9 \
 	device_quiet.9 \
 	device_set_desc.9 \
 	device_set_driver.9 \
 	device_set_flags.9 \
 	DEVICE_SHUTDOWN.9 \
 	DEV_MODULE.9 \
 	devstat.9 \
 	devtoname.9 \
 	disk.9 \
 	domain.9 \
 	drbr.9 \
 	driver.9 \
 	DRIVER_MODULE.9 \
 	EVENTHANDLER.9 \
 	eventtimers.9 \
 	extattr.9 \
 	fail.9 \
 	fetch.9 \
 	firmware.9 \
 	fpu_kern.9 \
 	g_access.9 \
 	g_attach.9 \
 	g_bio.9 \
 	g_consumer.9 \
 	g_data.9 \
 	get_cyclecount.9 \
 	getenv.9 \
 	getnewvnode.9 \
 	g_event.9 \
 	g_geom.9 \
 	g_provider.9 \
 	g_provider_by_name.9 \
 	groupmember.9 \
 	g_wither_geom.9 \
 	hash.9 \
 	hashinit.9 \
 	hexdump.9 \
 	hhook.9 \
 	ieee80211.9 \
 	ieee80211_amrr.9 \
 	ieee80211_beacon.9 \
 	ieee80211_bmiss.9 \
 	ieee80211_crypto.9 \
 	ieee80211_ddb.9 \
 	ieee80211_input.9 \
 	ieee80211_node.9 \
 	ieee80211_output.9 \
 	ieee80211_proto.9 \
 	ieee80211_radiotap.9 \
 	ieee80211_regdomain.9 \
 	ieee80211_scan.9 \
 	ieee80211_vap.9 \
 	ifnet.9 \
 	inittodr.9 \
 	insmntque.9 \
 	intro.9 \
 	ithread.9 \
 	KASSERT.9 \
 	kern_testfrwk.9 \
 	kernacc.9 \
 	kernel_mount.9 \
 	khelp.9 \
 	kobj.9 \
 	kproc.9 \
 	kqueue.9 \
 	kthread.9 \
 	ktr.9 \
 	lock.9 \
 	locking.9 \
 	LOCK_PROFILING.9 \
 	mac.9 \
 	make_dev.9 \
 	malloc.9 \
 	mbchain.9 \
 	mbpool.9 \
 	mbuf.9 \
 	mbuf_tags.9 \
 	MD5.9 \
 	mdchain.9 \
 	memcchr.9 \
 	memguard.9 \
 	microseq.9 \
 	microtime.9 \
 	microuptime.9 \
 	mi_switch.9 \
 	mod_cc.9 \
 	module.9 \
 	MODULE_DEPEND.9 \
 	MODULE_VERSION.9 \
 	mtx_pool.9 \
 	mutex.9 \
 	namei.9 \
 	netisr.9 \
 	nv.9 \
 	osd.9 \
 	owll.9 \
 	own.9 \
 	panic.9 \
 	pbuf.9 \
 	PCBGROUP.9 \
 	p_candebug.9 \
 	p_cansee.9 \
 	pci.9 \
 	PCI_IOV_ADD_VF.9 \
 	PCI_IOV_INIT.9 \
 	pci_iov_schema.9 \
 	PCI_IOV_UNINIT.9 \
 	pfil.9 \
 	pfind.9 \
 	pget.9 \
 	pgfind.9 \
 	PHOLD.9 \
 	physio.9 \
 	pmap.9 \
 	pmap_activate.9 \
 	pmap_clear_modify.9 \
 	pmap_copy.9 \
 	pmap_enter.9 \
 	pmap_extract.9 \
 	pmap_growkernel.9 \
 	pmap_init.9 \
 	pmap_is_modified.9 \
 	pmap_is_prefaultable.9 \
 	pmap_map.9 \
 	pmap_mincore.9 \
 	pmap_object_init_pt.9 \
 	pmap_page_exists_quick.9 \
 	pmap_page_init.9 \
 	pmap_pinit.9 \
 	pmap_protect.9 \
 	pmap_qenter.9 \
 	pmap_quick_enter_page.9 \
 	pmap_release.9 \
 	pmap_remove.9 \
 	pmap_resident_count.9 \
 	pmap_unwire.9 \
 	pmap_zero_page.9 \
 	printf.9 \
 	prison_check.9 \
 	priv.9 \
 	proc_rwmem.9 \
 	pseudofs.9 \
 	psignal.9 \
 	random.9 \
 	random_harvest.9 \
 	redzone.9 \
 	refcount.9 \
 	resettodr.9 \
 	resource_int_value.9 \
 	rijndael.9 \
 	rman.9 \
 	rmlock.9 \
 	rtalloc.9 \
 	rtentry.9 \
 	runqueue.9 \
 	rwlock.9 \
 	sbuf.9 \
 	scheduler.9 \
 	SDT.9 \
 	securelevel_gt.9 \
 	selrecord.9 \
 	sema.9 \
 	sf_buf.9 \
 	sglist.9 \
 	shm_map.9 \
 	signal.9 \
 	sleep.9 \
 	sleepqueue.9 \
 	socket.9 \
 	stack.9 \
 	store.9 \
 	style.9 \
 	swi.9 \
 	sx.9 \
 	SYSCALL_MODULE.9 \
 	sysctl.9 \
 	sysctl_add_oid.9 \
 	sysctl_ctx_init.9 \
 	SYSINIT.9 \
 	taskqueue.9 \
 	tcp_functions.9 \
 	thread_exit.9 \
 	time.9 \
 	timeout.9 \
 	tvtohz.9 \
 	ucred.9 \
 	uidinfo.9 \
 	uio.9 \
 	unr.9 \
 	utopia.9 \
 	vaccess.9 \
 	vaccess_acl_nfs4.9 \
 	vaccess_acl_posix1e.9 \
 	vcount.9 \
 	vflush.9 \
 	VFS.9 \
 	vfs_busy.9 \
 	VFS_CHECKEXP.9 \
 	vfsconf.9 \
 	VFS_FHTOVP.9 \
 	vfs_getnewfsid.9 \
 	vfs_getopt.9 \
 	vfs_getvfs.9 \
 	VFS_MOUNT.9 \
 	vfs_mountedfrom.9 \
 	VFS_QUOTACTL.9 \
 	VFS_ROOT.9 \
 	vfs_rootmountalloc.9 \
 	VFS_SET.9 \
 	VFS_STATFS.9 \
 	vfs_suser.9 \
 	VFS_SYNC.9 \
 	vfs_timestamp.9 \
 	vfs_unbusy.9 \
 	VFS_UNMOUNT.9 \
 	vfs_unmountall.9 \
 	VFS_VGET.9 \
 	vget.9 \
 	vgone.9 \
 	vhold.9 \
 	vinvalbuf.9 \
 	vm_fault_prefault.9 \
 	vm_map.9 \
 	vm_map_check_protection.9 \
 	vm_map_create.9 \
 	vm_map_delete.9 \
 	vm_map_entry_resize_free.9 \
 	vm_map_find.9 \
 	vm_map_findspace.9 \
 	vm_map_inherit.9 \
 	vm_map_init.9 \
 	vm_map_insert.9 \
 	vm_map_lock.9 \
 	vm_map_lookup.9 \
 	vm_map_madvise.9 \
 	vm_map_max.9 \
 	vm_map_protect.9 \
 	vm_map_remove.9 \
 	vm_map_simplify_entry.9 \
 	vm_map_stack.9 \
 	vm_map_submap.9 \
 	vm_map_sync.9 \
 	vm_map_wire.9 \
 	vm_page_alloc.9 \
 	vm_page_bits.9 \
 	vm_page_busy.9 \
 	vm_page_cache.9 \
 	vm_page_deactivate.9 \
 	vm_page_dontneed.9 \
 	vm_page_aflag.9 \
 	vm_page_free.9 \
 	vm_page_grab.9 \
 	vm_page_hold.9 \
 	vm_page_insert.9 \
 	vm_page_lookup.9 \
 	vm_page_rename.9 \
 	vm_page_wire.9 \
 	vm_set_page_size.9 \
 	vmem.9 \
 	vn_fullpath.9 \
 	vn_isdisk.9 \
 	vnet.9 \
 	vnode.9 \
 	VOP_ACCESS.9 \
 	VOP_ACLCHECK.9 \
 	VOP_ADVISE.9 \
 	VOP_ADVLOCK.9 \
 	VOP_ALLOCATE.9 \
 	VOP_ATTRIB.9 \
 	VOP_BWRITE.9 \
 	VOP_CREATE.9 \
 	VOP_FSYNC.9 \
 	VOP_GETACL.9 \
 	VOP_GETEXTATTR.9 \
 	VOP_GETPAGES.9 \
 	VOP_INACTIVE.9 \
 	VOP_IOCTL.9 \
 	VOP_LINK.9 \
 	VOP_LISTEXTATTR.9 \
 	VOP_LOCK.9 \
 	VOP_LOOKUP.9 \
 	VOP_OPENCLOSE.9 \
 	VOP_PATHCONF.9 \
 	VOP_PRINT.9 \
 	VOP_RDWR.9 \
 	VOP_READDIR.9 \
 	VOP_READLINK.9 \
 	VOP_REALLOCBLKS.9 \
 	VOP_REMOVE.9 \
 	VOP_RENAME.9 \
 	VOP_REVOKE.9 \
 	VOP_SETACL.9 \
 	VOP_SETEXTATTR.9 \
 	VOP_STRATEGY.9 \
 	VOP_VPTOCNP.9 \
 	VOP_VPTOFH.9 \
 	vref.9 \
 	vrefcnt.9 \
 	vrele.9 \
 	vslock.9 \
 	watchdog.9 \
 	zone.9
 
 MLINKS=	unr.9 alloc_unr.9 \
 	unr.9 alloc_unrl.9 \
 	unr.9 alloc_unr_specific.9 \
 	unr.9 delete_unrhdr.9 \
 	unr.9 free_unr.9 \
 	unr.9 new_unrhdr.9
 MLINKS+=accept_filter.9 accept_filt_add.9 \
 	accept_filter.9 accept_filt_del.9 \
 	accept_filter.9 accept_filt_generic_mod_event.9 \
 	accept_filter.9 accept_filt_get.9
 MLINKS+=alq.9 ALQ.9 \
 	alq.9 alq_close.9 \
 	alq.9 alq_flush.9 \
 	alq.9 alq_get.9 \
 	alq.9 alq_getn.9 \
 	alq.9 alq_open.9 \
 	alq.9 alq_open_flags.9 \
 	alq.9 alq_post.9 \
 	alq.9 alq_post_flags.9 \
 	alq.9 alq_write.9 \
 	alq.9 alq_writen.9
 MLINKS+=altq.9 ALTQ.9
 MLINKS+=atomic.9 atomic_add.9 \
 	atomic.9 atomic_clear.9 \
 	atomic.9 atomic_cmpset.9 \
 	atomic.9 atomic_fetchadd.9 \
 	atomic.9 atomic_load.9 \
 	atomic.9 atomic_readandclear.9 \
 	atomic.9 atomic_set.9 \
 	atomic.9 atomic_store.9 \
 	atomic.9 atomic_subtract.9 \
 	atomic.9 atomic_swap.9 \
 	atomic.9 atomic_testandset.9
 MLINKS+=bitset.9 BITSET_DEFINE.9 \
 	bitset.9 BITSET_T_INITIALIZER.9 \
 	bitset.9 BITSET_FSET.9 \
 	bitset.9 BIT_CLR.9 \
 	bitset.9 BIT_COPY.9 \
 	bitset.9 BIT_ISSET.9 \
 	bitset.9 BIT_SET.9 \
 	bitset.9 BIT_ZERO.9 \
 	bitset.9 BIT_FILL.9 \
 	bitset.9 BIT_SETOF.9 \
 	bitset.9 BIT_EMPTY.9 \
 	bitset.9 BIT_ISFULLSET.9 \
 	bitset.9 BIT_FFS.9 \
 	bitset.9 BIT_COUNT.9 \
 	bitset.9 BIT_SUBSET.9 \
 	bitset.9 BIT_OVERLAP.9 \
 	bitset.9 BIT_CMP.9 \
 	bitset.9 BIT_OR.9 \
 	bitset.9 BIT_AND.9 \
 	bitset.9 BIT_NAND.9 \
 	bitset.9 BIT_CLR_ATOMIC.9 \
 	bitset.9 BIT_SET_ATOMIC.9 \
 	bitset.9 BIT_SET_ATOMIC_ACQ.9 \
 	bitset.9 BIT_AND_ATOMIC.9 \
 	bitset.9 BIT_OR_ATOMIC.9 \
 	bitset.9 BIT_COPY_STORE_REL.9
 MLINKS+=bpf.9 bpfattach.9 \
 	bpf.9 bpfattach2.9 \
 	bpf.9 bpfdetach.9 \
 	bpf.9 bpf_filter.9 \
 	bpf.9 bpf_mtap.9 \
 	bpf.9 bpf_mtap2.9 \
 	bpf.9 bpf_tap.9 \
 	bpf.9 bpf_validate.9
 MLINKS+=buf.9 bp.9
 MLINKS+=buf_ring.9 buf_ring_alloc.9 \
 	buf_ring.9 buf_ring_free.9 \
 	buf_ring.9 buf_ring_enqueue.9 \
 	buf_ring.9 buf_ring_enqueue_bytes.9 \
 	buf_ring.9 buf_ring_dequeue_mc.9 \
 	buf_ring.9 buf_ring_dequeue_sc.9 \
 	buf_ring.9 buf_ring_count.9 \
 	buf_ring.9 buf_ring_empty.9 \
 	buf_ring.9 buf_ring_full.9 \
 	buf_ring.9 buf_ring_peek.9
 MLINKS+=bus_activate_resource.9 bus_deactivate_resource.9
 MLINKS+=bus_alloc_resource.9 bus_alloc_resource_any.9
 MLINKS+=BUS_BIND_INTR.9 bus_bind_intr.9
 MLINKS+=BUS_DESCRIBE_INTR.9 bus_describe_intr.9
 MLINKS+=bus_dma.9 busdma.9 \
 	bus_dma.9 bus_dmamap_create.9 \
 	bus_dma.9 bus_dmamap_destroy.9 \
 	bus_dma.9 bus_dmamap_load.9 \
 	bus_dma.9 bus_dmamap_load_bio.9 \
 	bus_dma.9 bus_dmamap_load_ccb.9 \
 	bus_dma.9 bus_dmamap_load_mbuf.9 \
 	bus_dma.9 bus_dmamap_load_mbuf_sg.9 \
 	bus_dma.9 bus_dmamap_load_uio.9 \
 	bus_dma.9 bus_dmamap_sync.9 \
 	bus_dma.9 bus_dmamap_unload.9 \
 	bus_dma.9 bus_dmamem_alloc.9 \
 	bus_dma.9 bus_dmamem_free.9 \
 	bus_dma.9 bus_dma_tag_create.9 \
 	bus_dma.9 bus_dma_tag_destroy.9
 MLINKS+=bus_generic_read_ivar.9 bus_generic_write_ivar.9
 MLINKS+=BUS_GET_CPUS.9 bus_get_cpus.9
 MLINKS+=bus_map_resource.9 bus_unmap_resource.9 \
 	bus_map_resource.9 resource_init_map_request.9
 MLINKS+=BUS_READ_IVAR.9 BUS_WRITE_IVAR.9
 MLINKS+=BUS_SETUP_INTR.9 bus_setup_intr.9 \
 	BUS_SETUP_INTR.9 BUS_TEARDOWN_INTR.9 \
 	BUS_SETUP_INTR.9 bus_teardown_intr.9
 MLINKS+=bus_space.9 bus_space_alloc.9 \
 	bus_space.9 bus_space_barrier.9 \
 	bus_space.9 bus_space_copy_region_1.9 \
 	bus_space.9 bus_space_copy_region_2.9 \
 	bus_space.9 bus_space_copy_region_4.9 \
 	bus_space.9 bus_space_copy_region_8.9 \
 	bus_space.9 bus_space_copy_region_stream_1.9 \
 	bus_space.9 bus_space_copy_region_stream_2.9 \
 	bus_space.9 bus_space_copy_region_stream_4.9 \
 	bus_space.9 bus_space_copy_region_stream_8.9 \
 	bus_space.9 bus_space_free.9 \
 	bus_space.9 bus_space_map.9 \
 	bus_space.9 bus_space_read_1.9 \
 	bus_space.9 bus_space_read_2.9 \
 	bus_space.9 bus_space_read_4.9 \
 	bus_space.9 bus_space_read_8.9 \
 	bus_space.9 bus_space_read_multi_1.9 \
 	bus_space.9 bus_space_read_multi_2.9 \
 	bus_space.9 bus_space_read_multi_4.9 \
 	bus_space.9 bus_space_read_multi_8.9 \
 	bus_space.9 bus_space_read_multi_stream_1.9 \
 	bus_space.9 bus_space_read_multi_stream_2.9 \
 	bus_space.9 bus_space_read_multi_stream_4.9 \
 	bus_space.9 bus_space_read_multi_stream_8.9 \
 	bus_space.9 bus_space_read_region_1.9 \
 	bus_space.9 bus_space_read_region_2.9 \
 	bus_space.9 bus_space_read_region_4.9 \
 	bus_space.9 bus_space_read_region_8.9 \
 	bus_space.9 bus_space_read_region_stream_1.9 \
 	bus_space.9 bus_space_read_region_stream_2.9 \
 	bus_space.9 bus_space_read_region_stream_4.9 \
 	bus_space.9 bus_space_read_region_stream_8.9 \
 	bus_space.9 bus_space_read_stream_1.9 \
 	bus_space.9 bus_space_read_stream_2.9 \
 	bus_space.9 bus_space_read_stream_4.9 \
 	bus_space.9 bus_space_read_stream_8.9 \
 	bus_space.9 bus_space_set_multi_1.9 \
 	bus_space.9 bus_space_set_multi_2.9 \
 	bus_space.9 bus_space_set_multi_4.9 \
 	bus_space.9 bus_space_set_multi_8.9 \
 	bus_space.9 bus_space_set_multi_stream_1.9 \
 	bus_space.9 bus_space_set_multi_stream_2.9 \
 	bus_space.9 bus_space_set_multi_stream_4.9 \
 	bus_space.9 bus_space_set_multi_stream_8.9 \
 	bus_space.9 bus_space_set_region_1.9 \
 	bus_space.9 bus_space_set_region_2.9 \
 	bus_space.9 bus_space_set_region_4.9 \
 	bus_space.9 bus_space_set_region_8.9 \
 	bus_space.9 bus_space_set_region_stream_1.9 \
 	bus_space.9 bus_space_set_region_stream_2.9 \
 	bus_space.9 bus_space_set_region_stream_4.9 \
 	bus_space.9 bus_space_set_region_stream_8.9 \
 	bus_space.9 bus_space_subregion.9 \
 	bus_space.9 bus_space_unmap.9 \
 	bus_space.9 bus_space_write_1.9 \
 	bus_space.9 bus_space_write_2.9 \
 	bus_space.9 bus_space_write_4.9 \
 	bus_space.9 bus_space_write_8.9 \
 	bus_space.9 bus_space_write_multi_1.9 \
 	bus_space.9 bus_space_write_multi_2.9 \
 	bus_space.9 bus_space_write_multi_4.9 \
 	bus_space.9 bus_space_write_multi_8.9 \
 	bus_space.9 bus_space_write_multi_stream_1.9 \
 	bus_space.9 bus_space_write_multi_stream_2.9 \
 	bus_space.9 bus_space_write_multi_stream_4.9 \
 	bus_space.9 bus_space_write_multi_stream_8.9 \
 	bus_space.9 bus_space_write_region_1.9 \
 	bus_space.9 bus_space_write_region_2.9 \
 	bus_space.9 bus_space_write_region_4.9 \
 	bus_space.9 bus_space_write_region_8.9 \
 	bus_space.9 bus_space_write_region_stream_1.9 \
 	bus_space.9 bus_space_write_region_stream_2.9 \
 	bus_space.9 bus_space_write_region_stream_4.9 \
 	bus_space.9 bus_space_write_region_stream_8.9 \
 	bus_space.9 bus_space_write_stream_1.9 \
 	bus_space.9 bus_space_write_stream_2.9 \
 	bus_space.9 bus_space_write_stream_4.9 \
 	bus_space.9 bus_space_write_stream_8.9
 MLINKS+=byteorder.9 be16dec.9 \
 	byteorder.9 be16enc.9 \
 	byteorder.9 be16toh.9 \
 	byteorder.9 be32dec.9 \
 	byteorder.9 be32enc.9 \
 	byteorder.9 be32toh.9 \
 	byteorder.9 be64dec.9 \
 	byteorder.9 be64enc.9 \
 	byteorder.9 be64toh.9 \
 	byteorder.9 bswap16.9 \
 	byteorder.9 bswap32.9 \
 	byteorder.9 bswap64.9 \
 	byteorder.9 htobe16.9 \
 	byteorder.9 htobe32.9 \
 	byteorder.9 htobe64.9 \
 	byteorder.9 htole16.9 \
 	byteorder.9 htole32.9 \
 	byteorder.9 htole64.9 \
 	byteorder.9 le16dec.9 \
 	byteorder.9 le16enc.9 \
 	byteorder.9 le16toh.9 \
 	byteorder.9 le32dec.9 \
 	byteorder.9 le32enc.9 \
 	byteorder.9 le32toh.9 \
 	byteorder.9 le64dec.9 \
 	byteorder.9 le64enc.9 \
 	byteorder.9 le64toh.9
 MLINKS+=condvar.9 cv_broadcast.9 \
 	condvar.9 cv_broadcastpri.9 \
 	condvar.9 cv_destroy.9 \
 	condvar.9 cv_init.9 \
 	condvar.9 cv_signal.9 \
 	condvar.9 cv_timedwait.9 \
 	condvar.9 cv_timedwait_sig.9 \
 	condvar.9 cv_timedwait_sig_sbt.9 \
 	condvar.9 cv_wait.9 \
 	condvar.9 cv_wait_sig.9 \
 	condvar.9 cv_wait_unlock.9 \
 	condvar.9 cv_wmesg.9
 MLINKS+=config_intrhook.9 config_intrhook_disestablish.9 \
 	config_intrhook.9 config_intrhook_establish.9
 MLINKS+=contigmalloc.9 contigfree.9
 MLINKS+=casuword.9 casueword.9 \
 	casuword.9 casueword32.9 \
 	casuword.9 casuword32.9
 MLINKS+=copy.9 copyin.9 \
 	copy.9 copyin_nofault.9 \
 	copy.9 copyinstr.9 \
 	copy.9 copyout.9 \
 	copy.9 copyout_nofault.9 \
 	copy.9 copystr.9
 MLINKS+=counter.9 counter_u64_alloc.9 \
 	counter.9 counter_u64_free.9 \
 	counter.9 counter_u64_add.9 \
 	counter.9 counter_enter.9 \
 	counter.9 counter_exit.9 \
 	counter.9 counter_u64_add_protected.9 \
 	counter.9 counter_u64_fetch.9 \
 	counter.9 counter_u64_zero.9 \
 	counter.9 SYSCTL_COUNTER_U64.9 \
 	counter.9 SYSCTL_ADD_COUNTER_U64.9 \
 	counter.9 SYSCTL_COUNTER_U64_ARRAY.9 \
 	counter.9 SYSCTL_ADD_COUNTER_U64_ARRAY.9
 MLINKS+=cpuset.9 CPUSET_T_INITIALIZER.9 \
 	cpuset.9 CPUSET_FSET.9 \
 	cpuset.9 CPU_CLR.9 \
 	cpuset.9 CPU_COPY.9 \
 	cpuset.9 CPU_ISSET.9 \
 	cpuset.9 CPU_SET.9 \
 	cpuset.9 CPU_ZERO.9 \
 	cpuset.9 CPU_FILL.9 \
 	cpuset.9 CPU_SETOF.9 \
 	cpuset.9 CPU_EMPTY.9 \
 	cpuset.9 CPU_ISFULLSET.9 \
 	cpuset.9 CPU_FFS.9 \
 	cpuset.9 CPU_COUNT.9 \
 	cpuset.9 CPU_SUBSET.9 \
 	cpuset.9 CPU_OVERLAP.9 \
 	cpuset.9 CPU_CMP.9 \
 	cpuset.9 CPU_OR.9 \
 	cpuset.9 CPU_AND.9 \
 	cpuset.9 CPU_NAND.9 \
 	cpuset.9 CPU_CLR_ATOMIC.9 \
 	cpuset.9 CPU_SET_ATOMIC.9 \
 	cpuset.9 CPU_SET_ATOMIC_ACQ.9 \
 	cpuset.9 CPU_AND_ATOMIC.9 \
 	cpuset.9 CPU_OR_ATOMIC.9 \
 	cpuset.9 CPU_COPY_STORE_REL.9
 MLINKS+=critical_enter.9 critical.9 \
 	critical_enter.9 critical_exit.9
 MLINKS+=crypto.9 crypto_dispatch.9 \
 	crypto.9 crypto_done.9 \
 	crypto.9 crypto_freereq.9 \
 	crypto.9 crypto_freesession.9 \
 	crypto.9 crypto_get_driverid.9 \
 	crypto.9 crypto_getreq.9 \
 	crypto.9 crypto_kdispatch.9 \
 	crypto.9 crypto_kdone.9 \
 	crypto.9 crypto_kregister.9 \
 	crypto.9 crypto_newsession.9 \
 	crypto.9 crypto_register.9 \
 	crypto.9 crypto_unblock.9 \
 	crypto.9 crypto_unregister.9 \
 	crypto.9 crypto_unregister_all.9
 MLINKS+=DB_COMMAND.9 DB_SHOW_ALL_COMMAND.9 \
 	DB_COMMAND.9 DB_SHOW_COMMAND.9
 MLINKS+=dev_clone.9 drain_dev_clone_events.9
 MLINKS+=devfs_set_cdevpriv.9 devfs_clear_cdevpriv.9 \
 	devfs_set_cdevpriv.9 devfs_get_cdevpriv.9
 MLINKS+=device_add_child.9 device_add_child_ordered.9
 MLINKS+=device_enable.9 device_disable.9 \
 	device_enable.9 device_is_enabled.9
 MLINKS+=device_get_ivars.9 device_set_ivars.9
 MLINKS+=device_get_name.9 device_get_nameunit.9
 MLINKS+=device_get_state.9 device_busy.9 \
 	device_get_state.9 device_is_alive.9 \
 	device_get_state.9 device_is_attached.9 \
 	device_get_state.9 device_unbusy.9
 MLINKS+=device_get_sysctl.9 device_get_sysctl_ctx.9 \
 	device_get_sysctl.9 device_get_sysctl_tree.9
 MLINKS+=device_quiet.9 device_is_quiet.9 \
 	device_quiet.9 device_verbose.9
 MLINKS+=device_set_desc.9 device_get_desc.9 \
 	device_set_desc.9 device_set_desc_copy.9
 MLINKS+=device_set_flags.9 device_get_flags.9
 MLINKS+=devstat.9 devicestat.9 \
 	devstat.9 devstat_add_entry.9 \
 	devstat.9 devstat_end_transaction.9 \
 	devstat.9 devstat_remove_entry.9 \
 	devstat.9 devstat_start_transaction.9
 MLINKS+=disk.9 disk_alloc.9 \
 	disk.9 disk_create.9 \
 	disk.9 disk_destroy.9 \
 	disk.9 disk_gone.9 \
 	disk.9 disk_resize.9
 MLINKS+=domain.9 DOMAIN_SET.9 \
 	domain.9 domain_add.9 \
 	domain.9 pfctlinput.9 \
 	domain.9 pfctlinput2.9 \
 	domain.9 pffinddomain.9 \
 	domain.9 pffindproto.9 \
 	domain.9 pffindtype.9
 MLINKS+=drbr.9 drbr_free.9 \
 	drbr.9 drbr_enqueue.9 \
 	drbr.9 drbr_dequeue.9 \
 	drbr.9 drbr_dequeue_cond.9 \
 	drbr.9 drbr_flush.9 \
 	drbr.9 drbr_empty.9 \
 	drbr.9 drbr_inuse.9 \
 	drbr.9 drbr_stats_update.9
 MLINKS+=DRIVER_MODULE.9 DRIVER_MODULE_ORDERED.9 \
 	DRIVER_MODULE.9 EARLY_DRIVER_MODULE.9 \
 	DRIVER_MODULE.9 EARLY_DRIVER_MODULE_ORDERED.9
 MLINKS+=EVENTHANDLER.9 EVENTHANDLER_DECLARE.9 \
 	EVENTHANDLER.9 EVENTHANDLER_DEREGISTER.9 \
 	EVENTHANDLER.9 eventhandler_deregister.9 \
 	EVENTHANDLER.9 eventhandler_find_list.9 \
 	EVENTHANDLER.9 EVENTHANDLER_INVOKE.9 \
 	EVENTHANDLER.9 eventhandler_prune_list.9 \
 	EVENTHANDLER.9 EVENTHANDLER_REGISTER.9 \
 	EVENTHANDLER.9 eventhandler_register.9
 MLINKS+=eventtimers.9 et_register.9 \
 	eventtimers.9 et_deregister.9 \
 	eventtimers.9 et_ban.9 \
 	eventtimers.9 et_find.9 \
 	eventtimers.9 et_free.9 \
 	eventtimers.9 et_init.9 \
 	eventtimers.9 ET_LOCK.9 \
 	eventtimers.9 ET_UNLOCK.9 \
 	eventtimers.9 et_start.9 \
 	eventtimers.9 et_stop.9
 MLINKS+=fail.9 KFAIL_POINT_CODE.9 \
 	fail.9 KFAIL_POINT_ERROR.9 \
 	fail.9 KFAIL_POINT_GOTO.9 \
 	fail.9 KFAIL_POINT_RETURN.9 \
 	fail.9 KFAIL_POINT_RETURN_VOID.9
 MLINKS+=fetch.9 fubyte.9 \
 	fetch.9 fuswintr.9 \
 	fetch.9 fuword.9 \
 	fetch.9 fuword16.9 \
 	fetch.9 fuword32.9 \
 	fetch.9 fuword64.9 \
 	fetch.9 fueword.9 \
 	fetch.9 fueword32.9 \
 	fetch.9 fueword64.9
 MLINKS+=firmware.9 firmware_get.9 \
 	firmware.9 firmware_put.9 \
 	firmware.9 firmware_register.9 \
 	firmware.9 firmware_unregister.9
 MLINKS+=fpu_kern.9 fpu_kern_alloc_ctx.9 \
 	fpu_kern.9 fpu_kern_free_ctx.9 \
 	fpu_kern.9 fpu_kern_enter.9 \
 	fpu_kern.9 fpu_kern_leave.9 \
 	fpu_kern.9 fpu_kern_thread.9 \
 	fpu_kern.9 is_fpu_kern_thread.9
 MLINKS+=g_attach.9 g_detach.9
 MLINKS+=g_bio.9 g_alloc_bio.9 \
 	g_bio.9 g_clone_bio.9 \
 	g_bio.9 g_destroy_bio.9 \
 	g_bio.9 g_duplicate_bio.9 \
 	g_bio.9 g_new_bio.9 \
 	g_bio.9 g_print_bio.9 \
 	g_bio.9 g_reset_bio.9
 MLINKS+=g_consumer.9 g_destroy_consumer.9 \
 	g_consumer.9 g_new_consumer.9
 MLINKS+=g_data.9 g_read_data.9 \
 	g_data.9 g_write_data.9
 MLINKS+=getenv.9 freeenv.9 \
 	getenv.9 getenv_int.9 \
 	getenv.9 getenv_long.9 \
 	getenv.9 getenv_string.9 \
 	getenv.9 getenv_quad.9 \
 	getenv.9 getenv_uint.9 \
 	getenv.9 getenv_ulong.9 \
 	getenv.9 setenv.9 \
 	getenv.9 testenv.9 \
 	getenv.9 unsetenv.9
 MLINKS+=g_event.9 g_cancel_event.9 \
 	g_event.9 g_post_event.9 \
 	g_event.9 g_waitfor_event.9
 MLINKS+=g_geom.9 g_destroy_geom.9 \
 	g_geom.9 g_new_geomf.9
 MLINKS+=g_provider.9 g_destroy_provider.9 \
 	g_provider.9 g_error_provider.9 \
 	g_provider.9 g_new_providerf.9
 MLINKS+=hash.9 hash32.9 \
 	hash.9 hash32_buf.9 \
 	hash.9 hash32_str.9 \
 	hash.9 hash32_stre.9 \
 	hash.9 hash32_strn.9 \
 	hash.9 hash32_strne.9 \
 	hash.9 jenkins_hash.9 \
 	hash.9 jenkins_hash32.9
 MLINKS+=hashinit.9 hashdestroy.9 \
 	hashinit.9 hashinit_flags.9 \
 	hashinit.9 phashinit.9
 MLINKS+=hhook.9 hhook_head_register.9 \
 	hhook.9 hhook_head_deregister.9 \
 	hhook.9 hhook_head_deregister_lookup.9 \
 	hhook.9 hhook_run_hooks.9 \
 	hhook.9 HHOOKS_RUN_IF.9 \
 	hhook.9 HHOOKS_RUN_LOOKUP_IF.9
 MLINKS+=ieee80211.9 ieee80211_ifattach.9 \
 	ieee80211.9 ieee80211_ifdetach.9
 MLINKS+=ieee80211_amrr.9 ieee80211_amrr_choose.9 \
 	ieee80211_amrr.9 ieee80211_amrr_cleanup.9 \
 	ieee80211_amrr.9 ieee80211_amrr_init.9 \
 	ieee80211_amrr.9 ieee80211_amrr_node_init.9 \
 	ieee80211_amrr.9 ieee80211_amrr_setinterval.9 \
 	ieee80211_amrr.9 ieee80211_amrr_tx_complete.9 \
 	ieee80211_amrr.9 ieee80211_amrr_tx_update.9
 MLINKS+=ieee80211_beacon.9 ieee80211_beacon_alloc.9 \
 	ieee80211_beacon.9 ieee80211_beacon_notify.9 \
 	ieee80211_beacon.9 ieee80211_beacon_update.9
 MLINKS+=ieee80211_bmiss.9 ieee80211_beacon_miss.9
 MLINKS+=ieee80211_crypto.9 ieee80211_crypto_available.9 \
 	ieee80211_crypto.9 ieee80211_crypto_decap.9 \
 	ieee80211_crypto.9 ieee80211_crypto_delglobalkeys.9 \
 	ieee80211_crypto.9 ieee80211_crypto_delkey.9 \
 	ieee80211_crypto.9 ieee80211_crypto_demic.9 \
 	ieee80211_crypto.9 ieee80211_crypto_encap.9 \
 	ieee80211_crypto.9 ieee80211_crypto_enmic.9 \
 	ieee80211_crypto.9 ieee80211_crypto_newkey.9 \
 	ieee80211_crypto.9 ieee80211_crypto_register.9 \
 	ieee80211_crypto.9 ieee80211_crypto_reload_keys.9 \
 	ieee80211_crypto.9 ieee80211_crypto_setkey.9 \
 	ieee80211_crypto.9 ieee80211_crypto_unregister.9 \
 	ieee80211_crypto.9 ieee80211_key_update_begin.9 \
 	ieee80211_crypto.9 ieee80211_key_update_end.9 \
 	ieee80211_crypto.9 ieee80211_notify_michael_failure.9 \
 	ieee80211_crypto.9 ieee80211_notify_replay_failure.9
 MLINKS+=ieee80211_input.9 ieee80211_input_all.9
 MLINKS+=ieee80211_node.9 ieee80211_dump_node.9 \
 	ieee80211_node.9 ieee80211_dump_nodes.9 \
 	ieee80211_node.9 ieee80211_find_rxnode.9 \
 	ieee80211_node.9 ieee80211_find_rxnode_withkey.9 \
 	ieee80211_node.9 ieee80211_free_node.9 \
 	ieee80211_node.9 ieee80211_iterate_nodes.9 \
 	ieee80211_node.9 ieee80211_ref_node.9 \
 	ieee80211_node.9 ieee80211_unref_node.9
 MLINKS+=ieee80211_output.9 ieee80211_process_callback.9 \
 	ieee80211_output.9 M_SEQNO_GET.9 \
 	ieee80211_output.9 M_WME_GETAC.9
 MLINKS+=ieee80211_proto.9 ieee80211_new_state.9 \
 	ieee80211_proto.9 ieee80211_resume_all.9 \
 	ieee80211_proto.9 ieee80211_start_all.9 \
 	ieee80211_proto.9 ieee80211_stop_all.9 \
 	ieee80211_proto.9 ieee80211_suspend_all.9 \
 	ieee80211_proto.9 ieee80211_waitfor_parent.9
 MLINKS+=ieee80211_radiotap.9 ieee80211_radiotap_active.9 \
 	ieee80211_radiotap.9 ieee80211_radiotap_active_vap.9 \
 	ieee80211_radiotap.9 ieee80211_radiotap_attach.9 \
 	ieee80211_radiotap.9 ieee80211_radiotap_tx.9 \
 	ieee80211_radiotap.9 radiotap.9
 MLINKS+=ieee80211_regdomain.9 ieee80211_alloc_countryie.9 \
 	ieee80211_regdomain.9 ieee80211_init_channels.9 \
 	ieee80211_regdomain.9 ieee80211_sort_channels.9
 MLINKS+=ieee80211_scan.9 ieee80211_add_scan.9 \
 	ieee80211_scan.9 ieee80211_bg_scan.9 \
 	ieee80211_scan.9 ieee80211_cancel_scan.9 \
 	ieee80211_scan.9 ieee80211_cancel_scan_any.9 \
 	ieee80211_scan.9 ieee80211_check_scan.9 \
 	ieee80211_scan.9 ieee80211_check_scan_current.9 \
 	ieee80211_scan.9 ieee80211_flush.9 \
 	ieee80211_scan.9 ieee80211_probe_curchan.9 \
 	ieee80211_scan.9 ieee80211_scan_assoc_fail.9 \
 	ieee80211_scan.9 ieee80211_scan_done.9 \
 	ieee80211_scan.9 ieee80211_scan_dump_channels.9 \
 	ieee80211_scan.9 ieee80211_scan_flush.9 \
 	ieee80211_scan.9 ieee80211_scan_iterate.9 \
 	ieee80211_scan.9 ieee80211_scan_next.9 \
 	ieee80211_scan.9 ieee80211_scan_timeout.9 \
 	ieee80211_scan.9 ieee80211_scanner_get.9 \
 	ieee80211_scan.9 ieee80211_scanner_register.9 \
 	ieee80211_scan.9 ieee80211_scanner_unregister.9 \
 	ieee80211_scan.9 ieee80211_scanner_unregister_all.9 \
 	ieee80211_scan.9 ieee80211_start_scan.9
 MLINKS+=ieee80211_vap.9 ieee80211_vap_attach.9 \
 	ieee80211_vap.9 ieee80211_vap_detach.9 \
 	ieee80211_vap.9 ieee80211_vap_setup.9
 MLINKS+=ifnet.9 if_addmulti.9 \
 	ifnet.9 if_alloc.9 \
 	ifnet.9 if_allmulti.9 \
 	ifnet.9 if_attach.9 \
 	ifnet.9 if_data.9 \
 	ifnet.9 IF_DEQUEUE.9 \
 	ifnet.9 if_delmulti.9 \
 	ifnet.9 if_detach.9 \
 	ifnet.9 if_down.9 \
 	ifnet.9 if_findmulti.9 \
 	ifnet.9 if_free.9 \
 	ifnet.9 if_free_type.9 \
 	ifnet.9 if_up.9 \
 	ifnet.9 ifa_free.9 \
 	ifnet.9 ifa_ifwithaddr.9 \
 	ifnet.9 ifa_ifwithdstaddr.9 \
 	ifnet.9 ifa_ifwithnet.9 \
 	ifnet.9 ifa_ref.9 \
 	ifnet.9 ifaddr.9 \
 	ifnet.9 ifaddr_byindex.9 \
 	ifnet.9 ifaof_ifpforaddr.9 \
 	ifnet.9 ifioctl.9 \
 	ifnet.9 ifpromisc.9 \
 	ifnet.9 ifqueue.9 \
 	ifnet.9 ifunit.9 \
 	ifnet.9 ifunit_ref.9
 MLINKS+=insmntque.9 insmntque1.9
 MLINKS+=ithread.9 ithread_add_handler.9 \
 	ithread.9 ithread_create.9 \
 	ithread.9 ithread_destroy.9 \
 	ithread.9 ithread_priority.9 \
 	ithread.9 ithread_remove_handler.9 \
 	ithread.9 ithread_schedule.9
 MLINKS+=kernacc.9 useracc.9
 MLINKS+=kernel_mount.9 free_mntarg.9 \
 	kernel_mount.9 kernel_vmount.9 \
 	kernel_mount.9 mount_arg.9 \
 	kernel_mount.9 mount_argb.9 \
 	kernel_mount.9 mount_argf.9 \
 	kernel_mount.9 mount_argsu.9
 MLINKS+=khelp.9 khelp_add_hhook.9 \
 	khelp.9 KHELP_DECLARE_MOD.9 \
 	khelp.9 KHELP_DECLARE_MOD_UMA.9 \
 	khelp.9 khelp_destroy_osd.9 \
 	khelp.9 khelp_get_id.9 \
 	khelp.9 khelp_get_osd.9 \
 	khelp.9 khelp_init_osd.9 \
 	khelp.9 khelp_remove_hhook.9
 MLINKS+=kobj.9 DEFINE_CLASS.9 \
 	kobj.9 kobj_class_compile.9 \
 	kobj.9 kobj_class_compile_static.9 \
 	kobj.9 kobj_class_free.9 \
 	kobj.9 kobj_create.9 \
 	kobj.9 kobj_delete.9 \
 	kobj.9 kobj_init.9 \
 	kobj.9 kobj_init_static.9
 MLINKS+=kproc.9 kproc_create.9 \
 	kproc.9 kproc_exit.9 \
 	kproc.9 kproc_kthread_add.9 \
 	kproc.9 kproc_resume.9 \
 	kproc.9 kproc_shutdown.9 \
 	kproc.9 kproc_start.9 \
 	kproc.9 kproc_suspend.9 \
 	kproc.9 kproc_suspend_check.9 \
 	kproc.9 kthread_create.9
 MLINKS+=kqueue.9 knlist_add.9 \
 	kqueue.9 knlist_clear.9 \
 	kqueue.9 knlist_delete.9 \
 	kqueue.9 knlist_destroy.9 \
 	kqueue.9 knlist_empty.9 \
 	kqueue.9 knlist_init.9 \
 	kqueue.9 knlist_init_mtx.9 \
 	kqueue.9 knlist_init_rw_reader.9 \
 	kqueue.9 knlist_remove.9 \
 	kqueue.9 knlist_remove_inevent.9 \
 	kqueue.9 knote_fdclose.9 \
 	kqueue.9 KNOTE_LOCKED.9 \
 	kqueue.9 KNOTE_UNLOCKED.9 \
 	kqueue.9 kqfd_register.9 \
 	kqueue.9 kqueue_add_filteropts.9 \
 	kqueue.9 kqueue_del_filteropts.9
 MLINKS+=kthread.9 kthread_add.9 \
 	kthread.9 kthread_exit.9 \
 	kthread.9 kthread_resume.9 \
 	kthread.9 kthread_shutdown.9 \
 	kthread.9 kthread_start.9 \
 	kthread.9 kthread_suspend.9 \
 	kthread.9 kthread_suspend_check.9
 MLINKS+=ktr.9 CTR0.9 \
 	ktr.9 CTR1.9 \
 	ktr.9 CTR2.9 \
 	ktr.9 CTR3.9 \
 	ktr.9 CTR4.9 \
 	ktr.9 CTR5.9 \
 	ktr.9 CTR6.9
 MLINKS+=lock.9 lockdestroy.9 \
 	lock.9 lockinit.9 \
 	lock.9 lockmgr.9 \
 	lock.9 lockmgr_args.9 \
 	lock.9 lockmgr_args_rw.9 \
 	lock.9 lockmgr_assert.9 \
 	lock.9 lockmgr_disown.9 \
 	lock.9 lockmgr_printinfo.9 \
 	lock.9 lockmgr_recursed.9 \
 	lock.9 lockmgr_rw.9 \
-	lock.9 lockmgr_waiters.9 \
 	lock.9 lockstatus.9
 MLINKS+=LOCK_PROFILING.9 MUTEX_PROFILING.9
 MLINKS+=make_dev.9 destroy_dev.9 \
 	make_dev.9 destroy_dev_drain.9 \
 	make_dev.9 destroy_dev_sched.9 \
 	make_dev.9 destroy_dev_sched_cb.9 \
 	make_dev.9 dev_depends.9 \
 	make_dev.9 make_dev_alias.9 \
 	make_dev.9 make_dev_alias_p.9 \
 	make_dev.9 make_dev_cred.9 \
 	make_dev.9 make_dev_credf.9 \
 	make_dev.9 make_dev_p.9 \
 	make_dev.9 make_dev_s.9
 MLINKS+=malloc.9 free.9 \
 	malloc.9 MALLOC_DECLARE.9 \
 	malloc.9 MALLOC_DEFINE.9 \
 	malloc.9 realloc.9 \
 	malloc.9 reallocf.9
 MLINKS+=mbchain.9 mb_detach.9 \
 	mbchain.9 mb_done.9 \
 	mbchain.9 mb_fixhdr.9 \
 	mbchain.9 mb_init.9 \
 	mbchain.9 mb_initm.9 \
 	mbchain.9 mb_put_int64be.9 \
 	mbchain.9 mb_put_int64le.9 \
 	mbchain.9 mb_put_mbuf.9 \
 	mbchain.9 mb_put_mem.9 \
 	mbchain.9 mb_put_uint16be.9 \
 	mbchain.9 mb_put_uint16le.9 \
 	mbchain.9 mb_put_uint32be.9 \
 	mbchain.9 mb_put_uint32le.9 \
 	mbchain.9 mb_put_uint8.9 \
 	mbchain.9 mb_put_uio.9 \
 	mbchain.9 mb_reserve.9
 MLINKS+=mbpool.9 mbp_alloc.9 \
 	mbpool.9 mbp_card_free.9 \
 	mbpool.9 mbp_count.9 \
 	mbpool.9 mbp_create.9 \
 	mbpool.9 mbp_destroy.9 \
 	mbpool.9 mbp_ext_free.9 \
 	mbpool.9 mbp_free.9 \
 	mbpool.9 mbp_get.9 \
 	mbpool.9 mbp_get_keep.9 \
 	mbpool.9 mbp_sync.9
 MLINKS+=\
 	mbuf.9 m_adj.9 \
 	mbuf.9 m_align.9 \
 	mbuf.9 M_ALIGN.9 \
 	mbuf.9 m_append.9 \
 	mbuf.9 m_apply.9 \
 	mbuf.9 m_cat.9 \
 	mbuf.9 m_catpkt.9 \
 	mbuf.9 MCHTYPE.9 \
 	mbuf.9 MCLGET.9 \
 	mbuf.9 m_collapse.9 \
 	mbuf.9 m_copyback.9 \
 	mbuf.9 m_copydata.9 \
 	mbuf.9 m_copym.9 \
 	mbuf.9 m_copypacket.9 \
 	mbuf.9 m_copyup.9 \
 	mbuf.9 m_defrag.9 \
 	mbuf.9 m_devget.9 \
 	mbuf.9 m_dup.9 \
 	mbuf.9 m_dup_pkthdr.9 \
 	mbuf.9 MEXTADD.9 \
 	mbuf.9 m_fixhdr.9 \
 	mbuf.9 m_free.9 \
 	mbuf.9 m_freem.9 \
 	mbuf.9 MGET.9 \
 	mbuf.9 m_get.9 \
 	mbuf.9 m_get2.9 \
 	mbuf.9 m_getjcl.9 \
 	mbuf.9 m_getcl.9 \
 	mbuf.9 m_getclr.9 \
 	mbuf.9 MGETHDR.9 \
 	mbuf.9 m_gethdr.9 \
 	mbuf.9 m_getm.9 \
 	mbuf.9 m_getptr.9 \
 	mbuf.9 MH_ALIGN.9 \
 	mbuf.9 M_LEADINGSPACE.9 \
 	mbuf.9 m_length.9 \
 	mbuf.9 M_MOVE_PKTHDR.9 \
 	mbuf.9 m_move_pkthdr.9 \
 	mbuf.9 M_PREPEND.9 \
 	mbuf.9 m_prepend.9 \
 	mbuf.9 m_pulldown.9 \
 	mbuf.9 m_pullup.9 \
 	mbuf.9 m_split.9 \
 	mbuf.9 mtod.9 \
 	mbuf.9 M_TRAILINGSPACE.9 \
 	mbuf.9 m_unshare.9 \
 	mbuf.9 M_WRITABLE.9
 MLINKS+=\
 	mbuf_tags.9 m_tag_alloc.9 \
 	mbuf_tags.9 m_tag_copy.9 \
 	mbuf_tags.9 m_tag_copy_chain.9 \
 	mbuf_tags.9 m_tag_delete.9 \
 	mbuf_tags.9 m_tag_delete_chain.9 \
 	mbuf_tags.9 m_tag_delete_nonpersistent.9 \
 	mbuf_tags.9 m_tag_find.9 \
 	mbuf_tags.9 m_tag_first.9 \
 	mbuf_tags.9 m_tag_free.9 \
 	mbuf_tags.9 m_tag_get.9 \
 	mbuf_tags.9 m_tag_init.9 \
 	mbuf_tags.9 m_tag_locate.9 \
 	mbuf_tags.9 m_tag_next.9 \
 	mbuf_tags.9 m_tag_prepend.9 \
 	mbuf_tags.9 m_tag_unlink.9
 MLINKS+=MD5.9 MD5Init.9 \
 	MD5.9 MD5Transform.9
 MLINKS+=mdchain.9 md_append_record.9 \
 	mdchain.9 md_done.9 \
 	mdchain.9 md_get_int64.9 \
 	mdchain.9 md_get_int64be.9 \
 	mdchain.9 md_get_int64le.9 \
 	mdchain.9 md_get_mbuf.9 \
 	mdchain.9 md_get_mem.9 \
 	mdchain.9 md_get_uint16.9 \
 	mdchain.9 md_get_uint16be.9 \
 	mdchain.9 md_get_uint16le.9 \
 	mdchain.9 md_get_uint32.9 \
 	mdchain.9 md_get_uint32be.9 \
 	mdchain.9 md_get_uint32le.9 \
 	mdchain.9 md_get_uint8.9 \
 	mdchain.9 md_get_uio.9 \
 	mdchain.9 md_initm.9 \
 	mdchain.9 md_next_record.9
 MLINKS+=microtime.9 bintime.9 \
 	microtime.9 getbintime.9 \
 	microtime.9 getmicrotime.9 \
 	microtime.9 getnanotime.9 \
 	microtime.9 nanotime.9
 MLINKS+=microuptime.9 binuptime.9 \
 	microuptime.9 getbinuptime.9 \
 	microuptime.9 getmicrouptime.9 \
 	microuptime.9 getnanouptime.9 \
 	microuptime.9 getsbinuptime.9 \
 	microuptime.9 nanouptime.9 \
 	microuptime.9 sbinuptime.9
 MLINKS+=mi_switch.9 cpu_switch.9 \
 	mi_switch.9 cpu_throw.9
 MLINKS+=mod_cc.9 CCV.9 \
 	mod_cc.9 DECLARE_CC_MODULE.9
 MLINKS+=mtx_pool.9 mtx_pool_alloc.9 \
 	mtx_pool.9 mtx_pool_create.9 \
 	mtx_pool.9 mtx_pool_destroy.9 \
 	mtx_pool.9 mtx_pool_find.9 \
 	mtx_pool.9 mtx_pool_lock.9 \
 	mtx_pool.9 mtx_pool_lock_spin.9 \
 	mtx_pool.9 mtx_pool_unlock.9 \
 	mtx_pool.9 mtx_pool_unlock_spin.9
 MLINKS+=mutex.9 mtx_assert.9 \
 	mutex.9 mtx_destroy.9 \
 	mutex.9 mtx_init.9 \
 	mutex.9 mtx_initialized.9 \
 	mutex.9 mtx_lock.9 \
 	mutex.9 mtx_lock_flags.9 \
 	mutex.9 mtx_lock_spin.9 \
 	mutex.9 mtx_lock_spin_flags.9 \
 	mutex.9 mtx_owned.9 \
 	mutex.9 mtx_recursed.9 \
 	mutex.9 mtx_sleep.9 \
 	mutex.9 MTX_SYSINIT.9 \
 	mutex.9 mtx_trylock.9 \
 	mutex.9 mtx_trylock_flags.9 \
 	mutex.9 mtx_trylock_spin.9 \
 	mutex.9 mtx_trylock_spin_flags.9 \
 	mutex.9 mtx_unlock.9 \
 	mutex.9 mtx_unlock_flags.9 \
 	mutex.9 mtx_unlock_spin.9 \
 	mutex.9 mtx_unlock_spin_flags.9
 MLINKS+=namei.9 NDFREE.9 \
 	namei.9 NDINIT.9
 MLINKS+=netisr.9 netisr_clearqdrops.9 \
 	netisr.9 netisr_default_flow2cpu.9 \
 	netisr.9 netisr_dispatch.9 \
 	netisr.9 netisr_dispatch_src.9 \
 	netisr.9 netisr_get_cpucount.9 \
 	netisr.9 netisr_get_cpuid.9 \
 	netisr.9 netisr_getqdrops.9 \
 	netisr.9 netisr_getqlimit.9 \
 	netisr.9 netisr_queue.9 \
 	netisr.9 netisr_queue_src.9 \
 	netisr.9 netisr_register.9 \
 	netisr.9 netisr_setqlimit.9 \
 	netisr.9 netisr_unregister.9
 MLINKS+=nv.9 libnv.9 \
 	nv.9 nvlist.9 \
 	nv.9 nvlist_add_binary.9 \
 	nv.9 nvlist_add_bool.9 \
 	nv.9 nvlist_add_descriptor.9 \
 	nv.9 nvlist_add_null.9 \
 	nv.9 nvlist_add_number.9 \
 	nv.9 nvlist_add_nvlist.9 \
 	nv.9 nvlist_add_string.9 \
 	nv.9 nvlist_add_stringf.9 \
 	nv.9 nvlist_add_stringv.9 \
 	nv.9 nvlist_clone.9 \
 	nv.9 nvlist_create.9 \
 	nv.9 nvlist_destroy.9 \
 	nv.9 nvlist_dump.9 \
 	nv.9 nvlist_empty.9 \
 	nv.9 nvlist_error.9 \
 	nv.9 nvlist_exists.9 \
 	nv.9 nvlist_exists_binary.9 \
 	nv.9 nvlist_exists_bool.9 \
 	nv.9 nvlist_exists_descriptor.9 \
 	nv.9 nvlist_exists_null.9 \
 	nv.9 nvlist_exists_number.9 \
 	nv.9 nvlist_exists_nvlist.9 \
 	nv.9 nvlist_exists_string.9 \
 	nv.9 nvlist_exists_type.9 \
 	nv.9 nvlist_fdump.9 \
 	nv.9 nvlist_flags.9 \
 	nv.9 nvlist_free.9 \
 	nv.9 nvlist_free_binary.9 \
 	nv.9 nvlist_free_bool.9 \
 	nv.9 nvlist_free_descriptor.9 \
 	nv.9 nvlist_free_null.9 \
 	nv.9 nvlist_free_number.9 \
 	nv.9 nvlist_free_nvlist.9 \
 	nv.9 nvlist_free_string.9 \
 	nv.9 nvlist_free_type.9 \
 	nv.9 nvlist_get_binary.9 \
 	nv.9 nvlist_get_bool.9 \
 	nv.9 nvlist_get_descriptor.9 \
 	nv.9 nvlist_get_number.9 \
 	nv.9 nvlist_get_nvlist.9 \
 	nv.9 nvlist_get_parent.9 \
 	nv.9 nvlist_get_string.9 \
 	nv.9 nvlist_move_binary.9 \
 	nv.9 nvlist_move_descriptor.9 \
 	nv.9 nvlist_move_nvlist.9 \
 	nv.9 nvlist_move_string.9 \
 	nv.9 nvlist_next.9 \
 	nv.9 nvlist_pack.9 \
 	nv.9 nvlist_recv.9 \
 	nv.9 nvlist_send.9 \
 	nv.9 nvlist_set_error.9 \
 	nv.9 nvlist_size.9 \
 	nv.9 nvlist_take_binary.9 \
 	nv.9 nvlist_take_bool.9 \
 	nv.9 nvlist_take_descriptor.9 \
 	nv.9 nvlist_take_number.9 \
 	nv.9 nvlist_take_nvlist.9 \
 	nv.9 nvlist_take_string.9 \
 	nv.9 nvlist_unpack.9 \
 	nv.9 nvlist_xfer.9
 MLINKS+=osd.9 osd_call.9 \
 	osd.9 osd_del.9 \
 	osd.9 osd_deregister.9 \
 	osd.9 osd_exit.9 \
 	osd.9 osd_get.9 \
 	osd.9 osd_register.9 \
 	osd.9 osd_set.9
 MLINKS+=panic.9 vpanic.9
 MLINKS+=pbuf.9 getpbuf.9 \
 	pbuf.9 relpbuf.9 \
 	pbuf.9 trypbuf.9
 MLINKS+=PCBGROUP.9 in_pcbgroup_byhash.9 \
 	PCBGROUP.9 in_pcbgroup_byinpcb.9 \
 	PCBGROUP.9 in_pcbgroup_destroy.9 \
 	PCBGROUP.9 in_pcbgroup_enabled.9 \
 	PCBGROUP.9 in_pcbgroup_init.9 \
 	PCBGROUP.9 in_pcbgroup_remove.9 \
 	PCBGROUP.9 in_pcbgroup_update.9 \
 	PCBGROUP.9 in_pcbgroup_update_mbuf.9 \
 	PCBGROUP.9 in6_pcbgroup_byhash.9
 MLINKS+=pci.9 pci_alloc_msi.9 \
 	pci.9 pci_alloc_msix.9 \
 	pci.9 pci_disable_busmaster.9 \
 	pci.9 pci_disable_io.9 \
 	pci.9 pci_enable_busmaster.9 \
 	pci.9 pci_enable_io.9 \
 	pci.9 pci_find_bsf.9 \
 	pci.9 pci_find_cap.9 \
 	pci.9 pci_find_dbsf.9 \
 	pci.9 pci_find_device.9 \
 	pci.9 pci_find_extcap.9 \
 	pci.9 pci_find_htcap.9 \
 	pci.9 pci_find_pcie_root_port.9 \
 	pci.9 pci_get_id.9 \
 	pci.9 pci_get_max_read_req.9 \
 	pci.9 pci_get_powerstate.9 \
 	pci.9 pci_get_vpd_ident.9 \
 	pci.9 pci_get_vpd_readonly.9 \
 	pci.9 pci_iov_attach.9 \
 	pci.9 pci_iov_attach_name.9 \
 	pci.9 pci_iov_detach.9 \
 	pci.9 pci_msi_count.9 \
 	pci.9 pci_msix_count.9 \
 	pci.9 pci_msix_pba_bar.9 \
 	pci.9 pci_msix_table_bar.9 \
 	pci.9 pci_pending_msix.9 \
 	pci.9 pci_read_config.9 \
 	pci.9 pci_release_msi.9 \
 	pci.9 pci_remap_msix.9 \
 	pci.9 pci_restore_state.9 \
 	pci.9 pci_save_state.9 \
 	pci.9 pci_set_powerstate.9 \
 	pci.9 pci_set_max_read_req.9 \
 	pci.9 pci_write_config.9 \
 	pci.9 pcie_adjust_config.9 \
 	pci.9 pcie_read_config.9 \
 	pci.9 pcie_write_config.9
 MLINKS+=pci_iov_schema.9 pci_iov_schema_alloc_node.9 \
 	pci_iov_schema.9 pci_iov_schema_add_bool.9 \
 	pci_iov_schema.9 pci_iov_schema_add_string.9 \
 	pci_iov_schema.9 pci_iov_schema_add_uint8.9 \
 	pci_iov_schema.9 pci_iov_schema_add_uint16.9 \
 	pci_iov_schema.9 pci_iov_schema_add_uint32.9 \
 	pci_iov_schema.9 pci_iov_schema_add_uint64.9 \
 	pci_iov_schema.9 pci_iov_schema_add_unicast_mac.9
 MLINKS+=pfil.9 pfil_add_hook.9 \
 	pfil.9 pfil_head_register.9 \
 	pfil.9 pfil_head_unregister.9 \
 	pfil.9 pfil_hook_get.9 \
 	pfil.9 pfil_remove_hook.9 \
 	pfil.9 pfil_rlock.9 \
 	pfil.9 pfil_run_hooks.9 \
 	pfil.9 pfil_runlock.9 \
 	pfil.9 pfil_wlock.9 \
 	pfil.9 pfil_wunlock.9
 MLINKS+=pfind.9 zpfind.9
 MLINKS+=PHOLD.9 PRELE.9 \
 	PHOLD.9 _PHOLD.9 \
 	PHOLD.9 _PRELE.9 \
 	PHOLD.9 PROC_ASSERT_HELD.9 \
 	PHOLD.9 PROC_ASSERT_NOT_HELD.9
 MLINKS+=pmap_copy.9 pmap_copy_page.9
 MLINKS+=pmap_extract.9 pmap_extract_and_hold.9
 MLINKS+=pmap_init.9 pmap_init2.9
 MLINKS+=pmap_is_modified.9 pmap_ts_referenced.9
 MLINKS+=pmap_pinit.9 pmap_pinit0.9 \
 	pmap_pinit.9 pmap_pinit2.9
 MLINKS+=pmap_qenter.9 pmap_qremove.9
 MLINKS+=pmap_quick_enter_page.9 pmap_quick_remove_page.9
 MLINKS+=pmap_remove.9 pmap_remove_all.9 \
 	pmap_remove.9 pmap_remove_pages.9
 MLINKS+=pmap_resident_count.9 pmap_wired_count.9
 MLINKS+=pmap_zero_page.9 pmap_zero_area.9 \
 	pmap_zero_page.9 pmap_zero_idle.9
 MLINKS+=printf.9 log.9 \
 	printf.9 tprintf.9 \
 	printf.9 uprintf.9
 MLINKS+=priv.9 priv_check.9 \
 	priv.9 priv_check_cred.9
 MLINKS+=proc_rwmem.9 proc_readmem.9 \
 	proc_rwmem.9 proc_writemem.9
 MLINKS+=psignal.9 gsignal.9 \
 	psignal.9 pgsignal.9 \
 	psignal.9 tdsignal.9
 MLINKS+=random.9 arc4rand.9 \
 	random.9 arc4random.9 \
 	random.9 read_random.9 \
 	random.9 read_random_uio.9 \
 	random.9 srandom.9
 MLINKS+=refcount.9 refcount_acquire.9 \
 	refcount.9 refcount_init.9 \
 	refcount.9 refcount_release.9
 MLINKS+=resource_int_value.9 resource_long_value.9 \
 	resource_int_value.9 resource_string_value.9
 MLINKS+=rman.9 rman_activate_resource.9 \
 	rman.9 rman_adjust_resource.9 \
 	rman.9 rman_deactivate_resource.9 \
 	rman.9 rman_fini.9 \
 	rman.9 rman_first_free_region.9 \
 	rman.9 rman_get_bushandle.9 \
 	rman.9 rman_get_bustag.9 \
 	rman.9 rman_get_device.9 \
 	rman.9 rman_get_end.9 \
 	rman.9 rman_get_flags.9 \
 	rman.9 rman_get_mapping.9 \
 	rman.9 rman_get_rid.9 \
 	rman.9 rman_get_size.9 \
 	rman.9 rman_get_start.9 \
 	rman.9 rman_get_virtual.9 \
 	rman.9 rman_init.9 \
 	rman.9 rman_init_from_resource.9 \
 	rman.9 rman_is_region_manager.9 \
 	rman.9 rman_last_free_region.9 \
 	rman.9 rman_make_alignment_flags.9 \
 	rman.9 rman_manage_region.9 \
 	rman.9 rman_release_resource.9 \
 	rman.9 rman_reserve_resource.9 \
 	rman.9 rman_reserve_resource_bound.9 \
 	rman.9 rman_set_bushandle.9 \
 	rman.9 rman_set_bustag.9 \
 	rman.9 rman_set_mapping.9 \
 	rman.9 rman_set_rid.9 \
 	rman.9 rman_set_virtual.9
 MLINKS+=rmlock.9 rm_assert.9 \
 	rmlock.9 rm_destroy.9 \
 	rmlock.9 rm_init.9 \
 	rmlock.9 rm_init_flags.9 \
 	rmlock.9 rm_rlock.9 \
 	rmlock.9 rm_runlock.9 \
 	rmlock.9 rm_sleep.9 \
 	rmlock.9 RM_SYSINIT.9 \
 	rmlock.9 rm_try_rlock.9 \
 	rmlock.9 rm_wlock.9 \
 	rmlock.9 rm_wowned.9 \
 	rmlock.9 rm_wunlock.9
 MLINKS+=rtalloc.9 rtalloc1.9 \
 	rtalloc.9 rtalloc_ign.9 \
 	rtalloc.9 RT_ADDREF.9 \
 	rtalloc.9 RT_LOCK.9 \
 	rtalloc.9 RT_REMREF.9 \
 	rtalloc.9 RT_RTFREE.9 \
 	rtalloc.9 RT_UNLOCK.9 \
 	rtalloc.9 RTFREE_LOCKED.9 \
 	rtalloc.9 RTFREE.9 \
 	rtalloc.9 rtfree.9 \
 	rtalloc.9 rtalloc1_fib.9 \
 	rtalloc.9 rtalloc_ign_fib.9 \
 	rtalloc.9 rtalloc_fib.9
 MLINKS+=runqueue.9 choosethread.9 \
 	runqueue.9 procrunnable.9 \
 	runqueue.9 remrunqueue.9 \
 	runqueue.9 setrunqueue.9
 MLINKS+=rwlock.9 rw_assert.9 \
 	rwlock.9 rw_destroy.9 \
 	rwlock.9 rw_downgrade.9 \
 	rwlock.9 rw_init.9 \
 	rwlock.9 rw_init_flags.9 \
 	rwlock.9 rw_initialized.9 \
 	rwlock.9 rw_rlock.9 \
 	rwlock.9 rw_runlock.9 \
 	rwlock.9 rw_unlock.9 \
 	rwlock.9 rw_sleep.9 \
 	rwlock.9 RW_SYSINIT.9 \
 	rwlock.9 rw_try_rlock.9 \
 	rwlock.9 rw_try_upgrade.9 \
 	rwlock.9 rw_try_wlock.9 \
 	rwlock.9 rw_wlock.9 \
 	rwlock.9 rw_wowned.9 \
 	rwlock.9 rw_wunlock.9
 MLINKS+=sbuf.9 sbuf_bcat.9 \
 	sbuf.9 sbuf_bcopyin.9 \
 	sbuf.9 sbuf_bcpy.9 \
 	sbuf.9 sbuf_cat.9 \
 	sbuf.9 sbuf_clear.9 \
 	sbuf.9 sbuf_copyin.9 \
 	sbuf.9 sbuf_cpy.9 \
 	sbuf.9 sbuf_data.9 \
 	sbuf.9 sbuf_delete.9 \
 	sbuf.9 sbuf_done.9 \
 	sbuf.9 sbuf_error.9 \
 	sbuf.9 sbuf_finish.9 \
 	sbuf.9 sbuf_len.9 \
 	sbuf.9 sbuf_new.9 \
 	sbuf.9 sbuf_new_auto.9 \
 	sbuf.9 sbuf_new_for_sysctl.9 \
 	sbuf.9 sbuf_printf.9 \
 	sbuf.9 sbuf_putc.9 \
 	sbuf.9 sbuf_set_drain.9 \
 	sbuf.9 sbuf_setpos.9 \
 	sbuf.9 sbuf_start_section.9 \
 	sbuf.9 sbuf_end_section.9 \
 	sbuf.9 sbuf_trim.9 \
 	sbuf.9 sbuf_vprintf.9
 MLINKS+=scheduler.9 curpriority_cmp.9 \
 	scheduler.9 maybe_resched.9 \
 	scheduler.9 propagate_priority.9 \
 	scheduler.9 resetpriority.9 \
 	scheduler.9 roundrobin.9 \
 	scheduler.9 roundrobin_interval.9 \
 	scheduler.9 schedclock.9 \
 	scheduler.9 schedcpu.9 \
 	scheduler.9 sched_setup.9 \
 	scheduler.9 setrunnable.9 \
 	scheduler.9 updatepri.9
 MLINKS+=SDT.9 SDT_PROVIDER_DECLARE.9 \
 	SDT.9 SDT_PROVIDER_DEFINE.9 \
 	SDT.9 SDT_PROBE_DECLARE.9 \
 	SDT.9 SDT_PROBE_DEFINE.9 \
 	SDT.9 SDT_PROBE.9
 MLINKS+=securelevel_gt.9 securelevel_ge.9
 MLINKS+=selrecord.9 seldrain.9 \
 	selrecord.9 selwakeup.9
 MLINKS+=sema.9 sema_destroy.9 \
 	sema.9 sema_init.9 \
 	sema.9 sema_post.9 \
 	sema.9 sema_timedwait.9 \
 	sema.9 sema_trywait.9 \
 	sema.9 sema_value.9 \
 	sema.9 sema_wait.9
 MLINKS+=sf_buf.9 sf_buf_alloc.9 \
 	sf_buf.9 sf_buf_free.9 \
 	sf_buf.9 sf_buf_kva.9 \
 	sf_buf.9 sf_buf_page.9
 MLINKS+=sglist.9 sglist_alloc.9 \
 	sglist.9 sglist_append.9 \
 	sglist.9 sglist_append_bio.9 \
 	sglist.9 sglist_append_mbuf.9 \
 	sglist.9 sglist_append_phys.9 \
 	sglist.9 sglist_append_uio.9 \
 	sglist.9 sglist_append_user.9 \
 	sglist.9 sglist_append_vmpages.9 \
 	sglist.9 sglist_build.9 \
 	sglist.9 sglist_clone.9 \
 	sglist.9 sglist_consume_uio.9 \
 	sglist.9 sglist_count.9 \
 	sglist.9 sglist_count_vmpages.9 \
 	sglist.9 sglist_free.9 \
 	sglist.9 sglist_hold.9 \
 	sglist.9 sglist_init.9 \
 	sglist.9 sglist_join.9 \
 	sglist.9 sglist_length.9 \
 	sglist.9 sglist_reset.9 \
 	sglist.9 sglist_slice.9 \
 	sglist.9 sglist_split.9
 MLINKS+=shm_map.9 shm_unmap.9
 MLINKS+=signal.9 cursig.9 \
 	signal.9 execsigs.9 \
 	signal.9 issignal.9 \
 	signal.9 killproc.9 \
 	signal.9 pgsigio.9 \
 	signal.9 postsig.9 \
 	signal.9 SETSETNEQ.9 \
 	signal.9 SETSETOR.9 \
 	signal.9 SIGADDSET.9 \
 	signal.9 SIG_CONTSIGMASK.9 \
 	signal.9 SIGDELSET.9 \
 	signal.9 SIGEMPTYSET.9 \
 	signal.9 sigexit.9 \
 	signal.9 SIGFILLSET.9 \
 	signal.9 siginit.9 \
 	signal.9 SIGISEMPTY.9 \
 	signal.9 SIGISMEMBER.9 \
 	signal.9 SIGNOTEMPTY.9 \
 	signal.9 signotify.9 \
 	signal.9 SIGPENDING.9 \
 	signal.9 SIGSETAND.9 \
 	signal.9 SIGSETCANTMASK.9 \
 	signal.9 SIGSETEQ.9 \
 	signal.9 SIGSETNAND.9 \
 	signal.9 SIG_STOPSIGMASK.9 \
 	signal.9 trapsignal.9
 MLINKS+=sleep.9 msleep.9 \
 	sleep.9 msleep_sbt.9 \
 	sleep.9 msleep_spin.9 \
 	sleep.9 msleep_spin_sbt.9 \
 	sleep.9 pause.9 \
 	sleep.9 pause_sbt.9 \
 	sleep.9 tsleep.9 \
 	sleep.9 tsleep_sbt.9 \
 	sleep.9 wakeup.9 \
 	sleep.9 wakeup_one.9
 MLINKS+=sleepqueue.9 init_sleepqueues.9 \
 	sleepqueue.9 sleepq_abort.9 \
 	sleepqueue.9 sleepq_add.9 \
 	sleepqueue.9 sleepq_alloc.9 \
 	sleepqueue.9 sleepq_broadcast.9 \
 	sleepqueue.9 sleepq_free.9 \
 	sleepqueue.9 sleepq_lookup.9 \
 	sleepqueue.9 sleepq_lock.9 \
 	sleepqueue.9 sleepq_release.9 \
 	sleepqueue.9 sleepq_remove.9 \
 	sleepqueue.9 sleepq_set_timeout.9 \
 	sleepqueue.9 sleepq_set_timeout_sbt.9 \
 	sleepqueue.9 sleepq_signal.9 \
 	sleepqueue.9 sleepq_sleepcnt.9 \
 	sleepqueue.9 sleepq_timedwait.9 \
 	sleepqueue.9 sleepq_timedwait_sig.9 \
 	sleepqueue.9 sleepq_type.9 \
 	sleepqueue.9 sleepq_wait.9 \
 	sleepqueue.9 sleepq_wait_sig.9
 MLINKS+=socket.9 soabort.9 \
 	socket.9 soaccept.9 \
 	socket.9 sobind.9 \
 	socket.9 socheckuid.9 \
 	socket.9 soclose.9 \
 	socket.9 soconnect.9 \
 	socket.9 socreate.9 \
 	socket.9 sodisconnect.9 \
 	socket.9 sodupsockaddr.9 \
 	socket.9 sofree.9 \
 	socket.9 sogetopt.9 \
 	socket.9 sohasoutofband.9 \
 	socket.9 solisten.9 \
 	socket.9 solisten_proto.9 \
 	socket.9 solisten_proto_check.9 \
 	socket.9 sonewconn.9 \
 	socket.9 sooptcopyin.9 \
 	socket.9 sooptcopyout.9 \
 	socket.9 sopoll.9 \
 	socket.9 sopoll_generic.9 \
 	socket.9 soreceive.9 \
 	socket.9 soreceive_dgram.9 \
 	socket.9 soreceive_generic.9 \
 	socket.9 soreceive_stream.9 \
 	socket.9 soreserve.9 \
 	socket.9 sorflush.9 \
 	socket.9 sosend.9 \
 	socket.9 sosend_dgram.9 \
 	socket.9 sosend_generic.9 \
 	socket.9 sosetopt.9 \
 	socket.9 soshutdown.9 \
 	socket.9 sotoxsocket.9 \
 	socket.9 soupcall_clear.9 \
 	socket.9 soupcall_set.9 \
 	socket.9 sowakeup.9
 MLINKS+=stack.9 stack_copy.9 \
 	stack.9 stack_create.9 \
 	stack.9 stack_destroy.9 \
 	stack.9 stack_print.9 \
 	stack.9 stack_print_ddb.9 \
 	stack.9 stack_print_short.9 \
 	stack.9 stack_print_short_ddb.9 \
 	stack.9 stack_put.9 \
 	stack.9 stack_save.9 \
 	stack.9 stack_sbuf_print.9 \
 	stack.9 stack_sbuf_print_ddb.9 \
 	stack.9 stack_zero.9
 MLINKS+=store.9 subyte.9 \
 	store.9 suswintr.9 \
 	store.9 suword.9 \
 	store.9 suword16.9 \
 	store.9 suword32.9 \
 	store.9 suword64.9
 MLINKS+=swi.9 swi_add.9 \
 	swi.9 swi_remove.9 \
 	swi.9 swi_sched.9
 MLINKS+=sx.9 sx_assert.9 \
 	sx.9 sx_destroy.9 \
 	sx.9 sx_downgrade.9 \
 	sx.9 sx_init.9 \
 	sx.9 sx_init_flags.9 \
 	sx.9 sx_sleep.9 \
 	sx.9 sx_slock.9 \
 	sx.9 sx_slock_sig.9 \
 	sx.9 sx_sunlock.9 \
 	sx.9 SX_SYSINIT.9 \
 	sx.9 sx_try_slock.9 \
 	sx.9 sx_try_upgrade.9 \
 	sx.9 sx_try_xlock.9 \
 	sx.9 sx_unlock.9 \
 	sx.9 sx_xholder.9 \
 	sx.9 sx_xlock.9 \
 	sx.9 sx_xlock_sig.9 \
 	sx.9 sx_xlocked.9 \
 	sx.9 sx_xunlock.9
 MLINKS+=sysctl.9 SYSCTL_DECL.9 \
 	sysctl.9 SYSCTL_ADD_INT.9 \
 	sysctl.9 SYSCTL_ADD_LONG.9 \
 	sysctl.9 SYSCTL_ADD_NODE.9 \
 	sysctl.9 SYSCTL_ADD_OPAQUE.9 \
 	sysctl.9 SYSCTL_ADD_PROC.9 \
 	sysctl.9 SYSCTL_ADD_QUAD.9 \
 	sysctl.9 SYSCTL_ADD_ROOT_NODE.9 \
 	sysctl.9 SYSCTL_ADD_S8.9 \
 	sysctl.9 SYSCTL_ADD_S16.9 \
 	sysctl.9 SYSCTL_ADD_S32.9 \
 	sysctl.9 SYSCTL_ADD_S64.9 \
 	sysctl.9 SYSCTL_ADD_STRING.9 \
 	sysctl.9 SYSCTL_ADD_STRUCT.9 \
 	sysctl.9 SYSCTL_ADD_U8.9 \
 	sysctl.9 SYSCTL_ADD_U16.9 \
 	sysctl.9 SYSCTL_ADD_U32.9 \
 	sysctl.9 SYSCTL_ADD_U64.9 \
 	sysctl.9 SYSCTL_ADD_UAUTO.9 \
 	sysctl.9 SYSCTL_ADD_UINT.9 \
 	sysctl.9 SYSCTL_ADD_ULONG.9 \
 	sysctl.9 SYSCTL_ADD_UQUAD.9 \
 	sysctl.9 SYSCTL_CHILDREN.9 \
 	sysctl.9 SYSCTL_STATIC_CHILDREN.9 \
 	sysctl.9 SYSCTL_NODE_CHILDREN.9 \
 	sysctl.9 SYSCTL_PARENT.9 \
 	sysctl.9 SYSCTL_INT.9 \
 	sysctl.9 SYSCTL_LONG.9 \
 	sysctl.9 SYSCTL_NODE.9 \
 	sysctl.9 SYSCTL_OPAQUE.9 \
 	sysctl.9 SYSCTL_PROC.9 \
 	sysctl.9 SYSCTL_QUAD.9 \
 	sysctl.9 SYSCTL_ROOT_NODE.9 \
 	sysctl.9 SYSCTL_S8.9 \
 	sysctl.9 SYSCTL_S16.9 \
 	sysctl.9 SYSCTL_S32.9 \
 	sysctl.9 SYSCTL_S64.9 \
 	sysctl.9 SYSCTL_STRING.9 \
 	sysctl.9 SYSCTL_STRUCT.9 \
 	sysctl.9 SYSCTL_U8.9 \
 	sysctl.9 SYSCTL_U16.9 \
 	sysctl.9 SYSCTL_U32.9 \
 	sysctl.9 SYSCTL_U64.9 \
 	sysctl.9 SYSCTL_UINT.9 \
 	sysctl.9 SYSCTL_ULONG.9 \
 	sysctl.9 SYSCTL_UQUAD.9
 MLINKS+=sysctl_add_oid.9 sysctl_move_oid.9 \
 	sysctl_add_oid.9 sysctl_remove_oid.9 \
 	sysctl_add_oid.9 sysctl_remove_name.9
 MLINKS+=sysctl_ctx_init.9 sysctl_ctx_entry_add.9 \
 	sysctl_ctx_init.9 sysctl_ctx_entry_del.9 \
 	sysctl_ctx_init.9 sysctl_ctx_entry_find.9 \
 	sysctl_ctx_init.9 sysctl_ctx_free.9
 MLINKS+=SYSINIT.9 SYSUNINIT.9
 MLINKS+=taskqueue.9 TASK_INIT.9 \
 	taskqueue.9 TASK_INITIALIZER.9 \
 	taskqueue.9 taskqueue_block.9 \
 	taskqueue.9 taskqueue_cancel.9 \
 	taskqueue.9 taskqueue_cancel_timeout.9 \
 	taskqueue.9 taskqueue_create.9 \
 	taskqueue.9 taskqueue_create_fast.9 \
 	taskqueue.9 TASKQUEUE_DECLARE.9 \
 	taskqueue.9 TASKQUEUE_DEFINE.9 \
 	taskqueue.9 TASKQUEUE_DEFINE_THREAD.9 \
 	taskqueue.9 taskqueue_drain.9 \
 	taskqueue.9 taskqueue_drain_all.9 \
 	taskqueue.9 taskqueue_drain_timeout.9 \
 	taskqueue.9 taskqueue_enqueue.9 \
 	taskqueue.9 taskqueue_enqueue_timeout.9 \
 	taskqueue.9 TASKQUEUE_FAST_DEFINE.9 \
 	taskqueue.9 TASKQUEUE_FAST_DEFINE_THREAD.9 \
 	taskqueue.9 taskqueue_free.9 \
 	taskqueue.9 taskqueue_member.9 \
 	taskqueue.9 taskqueue_run.9 \
 	taskqueue.9 taskqueue_set_callback.9 \
 	taskqueue.9 taskqueue_start_threads.9 \
 	taskqueue.9 taskqueue_start_threads_pinned.9 \
 	taskqueue.9 taskqueue_unblock.9 \
 	taskqueue.9 TIMEOUT_TASK_INIT.9
 MLINKS+=tcp_functions.9 register_tcp_functions.9 \
 	tcp_functions.9 deregister_tcp_functions.9
 MLINKS+=time.9 boottime.9 \
 	time.9 time_second.9 \
 	time.9 time_uptime.9
 MLINKS+=timeout.9 callout.9 \
 	timeout.9 callout_active.9 \
 	timeout.9 callout_async_drain.9 \
 	timeout.9 callout_deactivate.9 \
 	timeout.9 callout_drain.9 \
 	timeout.9 callout_handle_init.9 \
 	timeout.9 callout_init.9 \
 	timeout.9 callout_init_mtx.9 \
 	timeout.9 callout_init_rm.9 \
 	timeout.9 callout_init_rw.9 \
 	timeout.9 callout_pending.9 \
 	timeout.9 callout_reset.9 \
 	timeout.9 callout_reset_curcpu.9 \
 	timeout.9 callout_reset_on.9 \
 	timeout.9 callout_reset_sbt.9 \
 	timeout.9 callout_reset_sbt_curcpu.9 \
 	timeout.9 callout_reset_sbt_on.9 \
 	timeout.9 callout_schedule.9 \
 	timeout.9 callout_schedule_curcpu.9 \
 	timeout.9 callout_schedule_on.9 \
 	timeout.9 callout_schedule_sbt.9 \
 	timeout.9 callout_schedule_sbt_curcpu.9 \
 	timeout.9 callout_schedule_sbt_on.9 \
 	timeout.9 callout_stop.9 \
 	timeout.9 callout_when.9 \
 	timeout.9 untimeout.9
 MLINKS+=ucred.9 cred_update_thread.9 \
 	ucred.9 crcopy.9 \
 	ucred.9 crcopysafe.9 \
 	ucred.9 crdup.9 \
 	ucred.9 crfree.9 \
 	ucred.9 crget.9 \
 	ucred.9 crhold.9 \
 	ucred.9 crsetgroups.9 \
 	ucred.9 crshared.9 \
 	ucred.9 cru2x.9
 MLINKS+=uidinfo.9 uifind.9 \
 	uidinfo.9 uifree.9 \
 	uidinfo.9 uihashinit.9 \
 	uidinfo.9 uihold.9
 MLINKS+=uio.9 uiomove.9 \
 	uio.9 uiomove_nofault.9
 
 .if ${MK_USB} != "no"
 MAN+=	usbdi.9
 MLINKS+=usbdi.9 usbd_do_request.9 \
 	usbdi.9 usbd_do_request_flags.9 \
 	usbdi.9 usbd_errstr.9 \
 	usbdi.9 usbd_lookup_id_by_info.9 \
 	usbdi.9 usbd_lookup_id_by_uaa.9 \
 	usbdi.9 usbd_transfer_clear_stall.9 \
 	usbdi.9 usbd_transfer_drain.9 \
 	usbdi.9 usbd_transfer_pending.9 \
 	usbdi.9 usbd_transfer_poll.9 \
 	usbdi.9 usbd_transfer_setup.9 \
 	usbdi.9 usbd_transfer_start.9 \
 	usbdi.9 usbd_transfer_stop.9 \
 	usbdi.9 usbd_transfer_submit.9 \
 	usbdi.9 usbd_transfer_unsetup.9 \
 	usbdi.9 usbd_xfer_clr_flag.9 \
 	usbdi.9 usbd_xfer_frame_data.9 \
 	usbdi.9 usbd_xfer_frame_len.9 \
 	usbdi.9 usbd_xfer_get_frame.9 \
 	usbdi.9 usbd_xfer_get_priv.9 \
 	usbdi.9 usbd_xfer_is_stalled.9 \
 	usbdi.9 usbd_xfer_max_framelen.9 \
 	usbdi.9 usbd_xfer_max_frames.9 \
 	usbdi.9 usbd_xfer_max_len.9 \
 	usbdi.9 usbd_xfer_set_flag.9 \
 	usbdi.9 usbd_xfer_set_frame_data.9 \
 	usbdi.9 usbd_xfer_set_frame_len.9 \
 	usbdi.9 usbd_xfer_set_frame_offset.9 \
 	usbdi.9 usbd_xfer_set_frames.9 \
 	usbdi.9 usbd_xfer_set_interval.9 \
 	usbdi.9 usbd_xfer_set_priv.9 \
 	usbdi.9 usbd_xfer_set_stall.9 \
 	usbdi.9 usbd_xfer_set_timeout.9 \
 	usbdi.9 usbd_xfer_softc.9 \
 	usbdi.9 usbd_xfer_state.9 \
 	usbdi.9 usbd_xfer_status.9 \
 	usbdi.9 usb_fifo_alloc_buffer.9 \
 	usbdi.9 usb_fifo_attach.9 \
 	usbdi.9 usb_fifo_detach.9 \
 	usbdi.9 usb_fifo_free_buffer.9 \
 	usbdi.9 usb_fifo_get_data.9 \
 	usbdi.9 usb_fifo_get_data_buffer.9 \
 	usbdi.9 usb_fifo_get_data_error.9 \
 	usbdi.9 usb_fifo_get_data_linear.9 \
 	usbdi.9 usb_fifo_put_bytes_max.9 \
 	usbdi.9 usb_fifo_put_data.9 \
 	usbdi.9 usb_fifo_put_data_buffer.9 \
 	usbdi.9 usb_fifo_put_data_error.9 \
 	usbdi.9 usb_fifo_put_data_linear.9 \
 	usbdi.9 usb_fifo_reset.9 \
 	usbdi.9 usb_fifo_softc.9 \
 	usbdi.9 usb_fifo_wakeup.9
 .endif
 MLINKS+=vcount.9 count_dev.9
 MLINKS+=vfsconf.9 vfs_modevent.9 \
 	vfsconf.9 vfs_register.9 \
 	vfsconf.9 vfs_unregister.9
 MLINKS+=vfs_getopt.9 vfs_copyopt.9 \
 	vfs_getopt.9 vfs_filteropt.9 \
 	vfs_getopt.9 vfs_flagopt.9 \
 	vfs_getopt.9 vfs_getopts.9 \
 	vfs_getopt.9 vfs_scanopt.9 \
 	vfs_getopt.9 vfs_setopt.9 \
 	vfs_getopt.9 vfs_setopt_part.9 \
 	vfs_getopt.9 vfs_setopts.9
 MLINKS+=vhold.9 vdrop.9 \
 	vhold.9 vdropl.9 \
 	vhold.9 vholdl.9
 MLINKS+=vmem.9 vmem_add.9 \
 	vmem.9 vmem_alloc.9 \
 	vmem.9 vmem_create.9 \
 	vmem.9 vmem_destroy.9 \
 	vmem.9 vmem_free.9 \
 	vmem.9 vmem_xalloc.9 \
 	vmem.9 vmem_xfree.9
 MLINKS+=vm_map_lock.9 vm_map_lock_downgrade.9 \
 	vm_map_lock.9 vm_map_lock_read.9 \
 	vm_map_lock.9 vm_map_lock_upgrade.9 \
 	vm_map_lock.9 vm_map_trylock.9 \
 	vm_map_lock.9 vm_map_trylock_read.9 \
 	vm_map_lock.9 vm_map_unlock.9 \
 	vm_map_lock.9 vm_map_unlock_read.9
 MLINKS+=vm_map_lookup.9 vm_map_lookup_done.9
 MLINKS+=vm_map_max.9 vm_map_min.9 \
 	vm_map_max.9 vm_map_pmap.9
 MLINKS+=vm_map_stack.9 vm_map_growstack.9
 MLINKS+=vm_map_wire.9 vm_map_unwire.9
 MLINKS+=vm_page_bits.9 vm_page_clear_dirty.9 \
 	vm_page_bits.9 vm_page_dirty.9 \
 	vm_page_bits.9 vm_page_is_valid.9 \
 	vm_page_bits.9 vm_page_set_invalid.9 \
 	vm_page_bits.9 vm_page_set_validclean.9 \
 	vm_page_bits.9 vm_page_test_dirty.9 \
 	vm_page_bits.9 vm_page_undirty.9 \
 	vm_page_bits.9 vm_page_zero_invalid.9
 MLINKS+=vm_page_busy.9 vm_page_busied.9 \
 	vm_page_busy.9 vm_page_busy_downgrade.9 \
 	vm_page_busy.9 vm_page_busy_sleep.9 \
 	vm_page_busy.9 vm_page_sbusied.9 \
 	vm_page_busy.9 vm_page_sbusy.9 \
 	vm_page_busy.9 vm_page_sleep_if_busy.9 \
 	vm_page_busy.9 vm_page_sunbusy.9 \
 	vm_page_busy.9 vm_page_trysbusy.9 \
 	vm_page_busy.9 vm_page_tryxbusy.9 \
 	vm_page_busy.9 vm_page_xbusied.9 \
 	vm_page_busy.9 vm_page_xbusy.9 \
 	vm_page_busy.9 vm_page_xunbusy.9 \
 	vm_page_busy.9 vm_page_assert_sbusied.9 \
 	vm_page_busy.9 vm_page_assert_unbusied.9 \
 	vm_page_busy.9 vm_page_assert_xbusied.9
 MLINKS+=vm_page_aflag.9 vm_page_aflag_clear.9 \
 	vm_page_aflag.9 vm_page_aflag_set.9 \
 	vm_page_aflag.9 vm_page_reference.9
 MLINKS+=vm_page_free.9 vm_page_free_toq.9 \
 	vm_page_free.9 vm_page_free_zero.9 \
 	vm_page_free.9 vm_page_try_to_free.9
 MLINKS+=vm_page_hold.9 vm_page_unhold.9
 MLINKS+=vm_page_insert.9 vm_page_remove.9
 MLINKS+=vm_page_wire.9 vm_page_unwire.9
 MLINKS+=VOP_ACCESS.9 VOP_ACCESSX.9
 MLINKS+=VOP_ATTRIB.9 VOP_GETATTR.9 \
 	VOP_ATTRIB.9 VOP_SETATTR.9
 MLINKS+=VOP_CREATE.9 VOP_MKDIR.9 \
 	VOP_CREATE.9 VOP_MKNOD.9 \
 	VOP_CREATE.9 VOP_SYMLINK.9
 MLINKS+=VOP_GETPAGES.9 VOP_PUTPAGES.9
 MLINKS+=VOP_INACTIVE.9 VOP_RECLAIM.9
 MLINKS+=VOP_LOCK.9 vn_lock.9 \
 	VOP_LOCK.9 VOP_ISLOCKED.9 \
 	VOP_LOCK.9 VOP_UNLOCK.9
 MLINKS+=VOP_OPENCLOSE.9 VOP_CLOSE.9 \
 	VOP_OPENCLOSE.9 VOP_OPEN.9
 MLINKS+=VOP_RDWR.9 VOP_READ.9 \
 	VOP_RDWR.9 VOP_WRITE.9
 MLINKS+=VOP_REMOVE.9 VOP_RMDIR.9
 MLINKS+=vnet.9 vimage.9
 MLINKS+=vref.9 VREF.9
 MLINKS+=vrele.9 vput.9 \
 	vrele.9 vunref.9
 MLINKS+=vslock.9 vsunlock.9
 MLINKS+=zone.9 uma.9 \
 	zone.9 uma_find_refcnt.9 \
 	zone.9 uma_zalloc.9 \
 	zone.9 uma_zalloc_arg.9 \
 	zone.9 uma_zcreate.9 \
 	zone.9 uma_zdestroy.9 \
 	zone.9 uma_zfree.9 \
 	zone.9 uma_zfree_arg.9 \
 	zone.9 uma_zone_get_cur.9 \
 	zone.9 uma_zone_get_max.9 \
 	zone.9 uma_zone_set_max.9 \
 	zone.9 uma_zone_set_warning.9 \
 	zone.9 uma_zone_set_maxaction.9
 
 .include <bsd.prog.mk>
Index: user/alc/PQ_LAUNDRY/share/man/man9/lock.9
===================================================================
--- user/alc/PQ_LAUNDRY/share/man/man9/lock.9	(revision 303774)
+++ user/alc/PQ_LAUNDRY/share/man/man9/lock.9	(revision 303775)
@@ -1,424 +1,417 @@
 .\"
 .\" Copyright (C) 2002 Chad David <davidc@acns.ab.ca>. All rights reserved.
 .\"
 .\" Redistribution and use in source and binary forms, with or without
 .\" modification, are permitted provided that the following conditions
 .\" are met:
 .\" 1. Redistributions of source code must retain the above copyright
 .\"    notice(s), this list of conditions and the following disclaimer as
 .\"    the first lines of this file unmodified other than the possible
 .\"    addition of one or more copyright notices.
 .\" 2. Redistributions in binary form must reproduce the above copyright
 .\"    notice(s), this list of conditions and the following disclaimer in the
 .\"    documentation and/or other materials provided with the distribution.
 .\"
 .\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
 .\" EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 .\" WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 .\" DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY
 .\" DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 .\" (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 .\" SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 .\" CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
 .\" DAMAGE.
 .\"
 .\" $FreeBSD$
 .\"
 .Dd November 2, 2014
 .Dt LOCK 9
 .Os
 .Sh NAME
 .Nm lockinit ,
 .Nm lockdestroy ,
 .Nm lockmgr ,
 .Nm lockmgr_args ,
 .Nm lockmgr_args_rw ,
 .Nm lockmgr_disown ,
 .Nm lockmgr_printinfo ,
 .Nm lockmgr_recursed ,
 .Nm lockmgr_rw ,
-.Nm lockmgr_waiters ,
 .Nm lockstatus ,
 .Nm lockmgr_assert
 .Nd "lockmgr family of functions"
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/lock.h
 .In sys/lockmgr.h
 .Ft void
 .Fn lockinit "struct lock *lkp" "int prio" "const char *wmesg" "int timo" "int flags"
 .Ft void
 .Fn lockdestroy "struct lock *lkp"
 .Ft int
 .Fn lockmgr "struct lock *lkp" "u_int flags" "struct mtx *ilk"
 .Ft int
 .Fn lockmgr_args "struct lock *lkp" "u_int flags" "struct mtx *ilk" "const char *wmesg" "int prio" "int timo"
 .Ft int
 .Fn lockmgr_args_rw "struct lock *lkp" "u_int flags" "struct rwlock *ilk" "const char *wmesg" "int prio" "int timo"
 .Ft void
 .Fn lockmgr_disown "struct lock *lkp"
 .Ft void
 .Fn lockmgr_printinfo "const struct lock *lkp"
 .Ft int
 .Fn lockmgr_recursed "const struct lock *lkp"
 .Ft int
 .Fn lockmgr_rw "struct lock *lkp" "u_int flags" "struct rwlock *ilk"
 .Ft int
-.Fn lockmgr_waiters "const struct lock *lkp"
-.Ft int
 .Fn lockstatus "const struct lock *lkp"
 .Pp
 .Cd "options INVARIANTS"
 .Cd "options INVARIANT_SUPPORT"
 .Ft void
 .Fn lockmgr_assert "const struct lock *lkp" "int what"
 .Sh DESCRIPTION
 The
 .Fn lockinit
 function is used to initialize a lock.
 It must be called before any operation can be performed on a lock.
 Its arguments are:
 .Bl -tag -width ".Fa wmesg"
 .It Fa lkp
 A pointer to the lock to initialize.
 .It Fa prio
 The priority passed to
 .Xr sleep 9 .
 .It Fa wmesg
 The lock message.
 This is used for both debugging output and
 .Xr sleep 9 .
 .It Fa timo
 The timeout value passed to
 .Xr sleep 9 .
 .It Fa flags
 The flags the lock is to be initialized with:
 .Bl -tag -width ".Dv LK_CANRECURSE"
 .It Dv LK_ADAPTIVE
 Enable adaptive spinning for this lock if the kernel is compiled with the
 ADAPTIVE_LOCKMGRS option.
 .It Dv LK_CANRECURSE
 Allow recursive exclusive locks.
 .It Dv LK_NOPROFILE
 Disable lock profiling for this lock.
 .It Dv LK_NOSHARE
 Allow exclusive locks only.
 .It Dv LK_NOWITNESS
 Instruct
 .Xr witness 4
 to ignore this lock.
 .It Dv LK_NODUP
 .Xr witness 4
 should log messages about duplicate locks being acquired.
 .It Dv LK_QUIET
 Disable
 .Xr ktr 4
 logging for this lock.
 .It Dv LK_TIMELOCK
 Use
 .Fa timo
 during a sleep; otherwise, 0 is used.
 .El
 .El
 .Pp
 The
 .Fn lockdestroy
 function is used to destroy a lock, and while it is called in a number of
 places in the kernel, it currently does nothing.
 .Pp
 The
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 functions handle general locking functionality within the kernel, including
 support for shared and exclusive locks, and recursion.
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 are also able to upgrade and downgrade locks.
 .Pp
 Their arguments are:
 .Bl -tag -width ".Fa flags"
 .It Fa lkp
 A pointer to the lock to manipulate.
 .It Fa flags
 Flags indicating what action is to be taken.
 .Bl -tag -width ".Dv LK_NODDLKTREAT"
 .It Dv LK_SHARED
 Acquire a shared lock.
 If an exclusive lock is currently held,
 .Dv EDEADLK
 will be returned.
 .It Dv LK_EXCLUSIVE
 Acquire an exclusive lock.
 If an exclusive lock is already held, and
 .Dv LK_CANRECURSE
 is not set, the system will
 .Xr panic 9 .
 .It Dv LK_DOWNGRADE
 Downgrade an exclusive lock to a shared lock.
 Downgrading a shared lock is not permitted.
 If an exclusive lock has been recursed, the system will
 .Xr panic 9 .
 .It Dv LK_UPGRADE
 Upgrade a shared lock to an exclusive lock.
 If this call fails, the shared lock is lost, even if the
 .Dv LK_NOWAIT
 flag is specified.
 During the upgrade, the shared lock could
 be temporarily dropped.
 Attempts to upgrade an exclusive lock will cause a
 .Xr panic 9 .
 .It Dv LK_TRYUPGRADE
 Try to upgrade a shared lock to an exclusive lock.
 Failure to upgrade does not drop ownership of the shared lock.
 .It Dv LK_RELEASE
 Release the lock.
 Releasing a lock that is not held can cause a
 .Xr panic 9 .
 .It Dv LK_DRAIN
 Wait for all activity on the lock to end, then mark it decommissioned.
 This is used before freeing a lock that is part of a piece of memory that is
 about to be freed.
 (As documented in
 .In sys/lockmgr.h . )
 .It Dv LK_SLEEPFAIL
 Fail if the operation has slept.
 .It Dv LK_NOWAIT
 Do not allow the call to sleep.
 This can be used to test the lock.
 .It Dv LK_NOWITNESS
 Skip the
 .Xr witness 4
 checks for this instance.
 .It Dv LK_CANRECURSE
 Allow recursion on an exclusive lock.
 For every lock there must be a release.
 .It Dv LK_INTERLOCK
 Unlock the interlock (which should be locked already).
 .It Dv LK_NODDLKTREAT
 Normally,
 .Fn lockmgr
 postpones serving further shared requests for a shared-locked lock if there is
 an exclusive waiter, to avoid exclusive lock starvation.
 However, if the thread requesting the shared lock already owns a shared lockmgr
 lock, the request is granted even in the presence of a concurrent exclusive
 lock request, to avoid deadlocks with recursive shared acquisition.
 .Pp
 The
 .Dv LK_NODDLKTREAT
 flag can only be used by code which requests a shared, non-recursive lock.
 The flag allows exclusive requests to preempt the current shared request
 even if the current thread owns shared locks.
 This is safe because the shared lock is guaranteed not to recurse; the flag
 is used when the thread is known to hold unrelated shared locks, to avoid
 unnecessary starvation.
 An example is
 .Dv vp
 locking in VFS
 .Xr lookup 9 ,
 when
 .Dv dvp
 is already locked.
 .El
 .It Fa ilk
 An interlock mutex for controlling group access to the lock.
 If
 .Dv LK_INTERLOCK
 is specified,
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 assume
 .Fa ilk
 is currently owned and not recursed, and will return it unlocked.
 See
 .Xr mtx_assert 9 .
 .El
 .Pp
 The
 .Fn lockmgr_args
 and
 .Fn lockmgr_args_rw
 functions work like
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 but accept a
 .Fa wmesg ,
 .Fa prio
 and
 .Fa timo
 on a per-call basis.
 The specified values override the defaults set by
 .Fn lockinit ;
 to keep a default, pass
 .Dv LK_WMESG_DEFAULT ,
 .Dv LK_PRIO_DEFAULT
 or
 .Dv LK_TIMO_DEFAULT ,
 respectively.
 .Pp
 The
 .Fn lockmgr_disown
 function switches the owner from the current thread to be
 .Dv LK_KERNPROC ,
 if the lock is already held.
 .Pp
 The
 .Fn lockmgr_printinfo
 function prints debugging information about the lock.
 It is used primarily by
 .Xr VOP_PRINT 9
 functions.
 .Pp
 The
 .Fn lockmgr_recursed
 function returns a non-zero value if the lock is recursed and 0
 otherwise.
-.Pp
-The
-.Fn lockmgr_waiters
-function returns true if the lock has waiters, 0 otherwise.
 .Pp
 The
 .Fn lockstatus
 function returns the status of the lock in relation to the current thread.
 .Pp
 When compiled with
 .Cd "options INVARIANTS"
 and
 .Cd "options INVARIANT_SUPPORT" ,
 the
 .Fn lockmgr_assert
 function tests
 .Fa lkp
 for the assertions specified in
 .Fa what ,
 and panics if they are not met.
 One of the following assertions must be specified:
 .Bl -tag -width ".Dv KA_UNLOCKED"
 .It Dv KA_LOCKED
 Assert that the current thread has either a shared or an exclusive lock on the
 .Vt lkp
 lock pointed to by the first argument.
 .It Dv KA_SLOCKED
 Assert that the current thread has a shared lock on the
 .Vt lkp
 lock pointed to by the first argument.
 .It Dv KA_XLOCKED
 Assert that the current thread has an exclusive lock on the
 .Vt lkp
 lock pointed to by the first argument.
 .It Dv KA_UNLOCKED
 Assert that the current thread has no lock on the
 .Vt lkp
 lock pointed to by the first argument.
 .El
 .Pp
 In addition, one of the following optional assertions can be used with
 a
 .Dv KA_LOCKED ,
 .Dv KA_SLOCKED ,
 or
 .Dv KA_XLOCKED
 assertion:
 .Bl -tag -width ".Dv KA_NOTRECURSED"
 .It Dv KA_RECURSED
 Assert that the current thread has a recursed lock on
 .Fa lkp .
 .It Dv KA_NOTRECURSED
 Assert that the current thread does not have a recursed lock on
 .Fa lkp .
 .El
 .Sh RETURN VALUES
 The
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 functions return 0 on success and non-zero on failure.
 .Pp
 The
 .Fn lockstatus
 function returns:
 .Bl -tag -width ".Dv LK_EXCLUSIVE"
 .It Dv LK_EXCLUSIVE
 An exclusive lock is held by the current thread.
 .It Dv LK_EXCLOTHER
 An exclusive lock is held by someone other than the current thread.
 .It Dv LK_SHARED
 A shared lock is held.
 .It Li 0
 The lock is not held by anyone.
 .El
 .Sh ERRORS
 .Fn lockmgr
 and
 .Fn lockmgr_rw
 fail if:
 .Bl -tag -width Er
 .It Bq Er EBUSY
 .Dv LK_FORCEUPGRADE
 was requested and another thread had already requested a lock upgrade.
 .It Bq Er EBUSY
 .Dv LK_NOWAIT
 was set, and a sleep would have been required, or
 .Dv LK_TRYUPGRADE
 operation was not able to upgrade the lock.
 .It Bq Er ENOLCK
 .Dv LK_SLEEPFAIL
 was set and
 .Fn lockmgr
 or
 .Fn lockmgr_rw
 did sleep.
 .It Bq Er EINTR
 .Dv PCATCH
 was set in the lock priority, and a signal was delivered during a sleep.
 Note the
 .Er ERESTART
 error below.
 .It Bq Er ERESTART
 .Dv PCATCH
 was set in the lock priority, a signal was delivered during a sleep,
 and the system call is to be restarted.
 .It Bq Er EWOULDBLOCK
 A non-zero timeout was given, and the timeout expired.
 .El
 .Sh LOCKS
 If
 .Dv LK_INTERLOCK
 is passed in the
 .Fa flags
 argument to
 .Fn lockmgr
 or
 .Fn lockmgr_rw ,
 the
 .Fa ilk
 must be held prior to calling
 .Fn lockmgr
 or
 .Fn lockmgr_rw ,
 and will be returned unlocked.
 .Pp
 A failed
 .Dv LK_UPGRADE
 attempt results in the loss of the shared lock that is currently held;
 a failed
 .Dv LK_TRYUPGRADE
 attempt keeps it.
 Also, it is invalid to upgrade an exclusive lock, and trying to do so
 results in a
 .Xr panic 9 .
 .Sh SEE ALSO
 .Xr condvar 9 ,
 .Xr locking 9 ,
 .Xr mtx_assert 9 ,
 .Xr mutex 9 ,
 .Xr panic 9 ,
 .Xr rwlock 9 ,
 .Xr sleep 9 ,
 .Xr sx 9 ,
 .Xr VOP_PRINT 9
 .Sh AUTHORS
 This manual page was written by
 .An Chad David Aq Mt davidc@acns.ab.ca .
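The rules the lock.9 page gives for the `what` argument of `lockmgr_assert()` are as follows. Exactly one of KA_LOCKED, KA_SLOCKED, KA_XLOCKED, or KA_UNLOCKED must be given. Optionally, KA_RECURSED or KA_NOTRECURSED may be added, but only alongside a locked assertion. These rules can be modeled as a small userland checker. This is a sketch under stated assumptions, not kernel code. The `TOY_KA_*` constants and `toy_assert_what_valid()` are hypothetical names. Their bit values are illustrative and are not the definitions from `<sys/lockmgr.h>`.

```c
#include <assert.h>	/* assert(), used by example checks of the model */
#include <stdbool.h>

/*
 * Hypothetical, illustrative bit values.  These are NOT the KA_*
 * definitions from <sys/lockmgr.h>; each assertion simply gets its
 * own bit so the rules are easy to check.
 */
#define	TOY_KA_LOCKED		0x01u
#define	TOY_KA_SLOCKED		0x02u
#define	TOY_KA_XLOCKED		0x04u
#define	TOY_KA_UNLOCKED		0x08u
#define	TOY_KA_RECURSED		0x10u
#define	TOY_KA_NOTRECURSED	0x20u

#define	TOY_KA_BASE_MASK						\
	(TOY_KA_LOCKED | TOY_KA_SLOCKED | TOY_KA_XLOCKED | TOY_KA_UNLOCKED)
#define	TOY_KA_RECURSE_MASK	(TOY_KA_RECURSED | TOY_KA_NOTRECURSED)

/* True if exactly one bit is set in x. */
static bool
toy_one_bit(unsigned int x)
{

	return (x != 0 && (x & (x - 1)) == 0);
}

/*
 * Return true if "what" follows the rules in the manual page:
 * exactly one base assertion, plus at most one recursion qualifier,
 * which is only allowed together with a locked assertion.
 */
bool
toy_assert_what_valid(unsigned int what)
{
	unsigned int base, rec;

	/* Reject bits that are not part of the model. */
	if ((what & ~(TOY_KA_BASE_MASK | TOY_KA_RECURSE_MASK)) != 0)
		return (false);

	base = what & TOY_KA_BASE_MASK;
	rec = what & TOY_KA_RECURSE_MASK;

	/* Exactly one of LOCKED, SLOCKED, XLOCKED, or UNLOCKED. */
	if (!toy_one_bit(base))
		return (false);

	/* Omitting the recursion qualifier is always acceptable. */
	if (rec == 0)
		return (true);

	/* RECURSED and NOTRECURSED are mutually exclusive... */
	if (!toy_one_bit(rec))
		return (false);

	/* ...and are not allowed with an unlocked assertion. */
	return (base != TOY_KA_UNLOCKED);
}
```

With this model, `TOY_KA_XLOCKED | TOY_KA_NOTRECURSED` is accepted. `TOY_KA_UNLOCKED | TOY_KA_RECURSED` and `TOY_KA_LOCKED | TOY_KA_SLOCKED` are rejected. Giving the unlocked assertion its own bit is a simplification for the model only. Check the real header before relying on any particular encoding.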
Index: user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/amd64/amd64/mem.c	(revision 303775)
@@ -1,237 +1,239 @@
 /*-
  * Copyright (c) 1988 University of Utah.
  * Copyright (c) 1982, 1986, 1990 The Regents of the University of California.
  * All rights reserved.
  *
  * This code is derived from software contributed to Berkeley by
  * the Systems Programming Group of the University of Utah Computer
  * Science Department, and code derived from software contributed to
  * Berkeley by William Jolitz.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 4. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  *	from: Utah $Hdr: mem.c 1.13 89/10/08$
  *	from: @(#)mem.c	7.2 (Berkeley) 5/9/91
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 /*
  * Memory special file
  */
 
 #include <sys/param.h>
 #include <sys/conf.h>
 #include <sys/fcntl.h>
 #include <sys/ioccom.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/memrange.h>
 #include <sys/module.h>
 #include <sys/mutex.h>
 #include <sys/proc.h>
 #include <sys/signalvar.h>
 #include <sys/systm.h>
 #include <sys/uio.h>
 
 #include <machine/md_var.h>
 #include <machine/specialreg.h>
 #include <machine/vmparam.h>
 
 #include <vm/vm.h>
 #include <vm/pmap.h>
 #include <vm/vm_extern.h>
 
 #include <machine/memdev.h>
 
 /*
  * Used in /dev/mem drivers and elsewhere
  */
 MALLOC_DEFINE(M_MEMDESC, "memdesc", "memory range descriptors");
 
 /* ARGSUSED */
 int
 memrw(struct cdev *dev, struct uio *uio, int flags)
 {
 	struct iovec *iov;
 	void *p;
 	ssize_t orig_resid;
 	u_long v, vd;
 	u_int c;
 	int error;
 
 	error = 0;
 	orig_resid = uio->uio_resid;
 	while (uio->uio_resid > 0 && error == 0) {
 		iov = uio->uio_iov;
 		if (iov->iov_len == 0) {
 			uio->uio_iov++;
 			uio->uio_iovcnt--;
 			if (uio->uio_iovcnt < 0)
 				panic("memrw");
 			continue;
 		}
 		v = uio->uio_offset;
 		c = ulmin(iov->iov_len, PAGE_SIZE - (u_int)(v & PAGE_MASK));
 
 		switch (dev2unit(dev)) {
 		case CDEV_MINOR_KMEM:
 			/*
 			 * Since c is clamped to be less or equal than
 			 * PAGE_SIZE, the uiomove() call does not
 			 * access past the end of the direct map.
 			 */
 			if (v >= DMAP_MIN_ADDRESS &&
 			    v < DMAP_MIN_ADDRESS + dmaplimit) {
 				error = uiomove((void *)v, c, uio);
 				break;
 			}
 
 			if (!kernacc((void *)v, c, uio->uio_rw == UIO_READ ?
 			    VM_PROT_READ : VM_PROT_WRITE)) {
 				error = EFAULT;
 				break;
 			}
 
 			/*
 			 * If the extracted address is not accessible
 			 * through the direct map, then we make a
 			 * private (uncached) mapping because we can't
 			 * depend on the existing kernel mapping
 			 * remaining valid until the completion of
 			 * uiomove().
 			 *
 			 * XXX We cannot provide access to the
 			 * physical page 0 mapped into KVA.
 			 */
 			v = pmap_extract(kernel_pmap, v);
 			if (v == 0) {
 				error = EFAULT;
 				break;
 			}
 			/* FALLTHROUGH */
 		case CDEV_MINOR_MEM:
 			if (v < dmaplimit) {
 				vd = PHYS_TO_DMAP(v);
 				error = uiomove((void *)vd, c, uio);
 				break;
 			}
 			if (v >= (1ULL << cpu_maxphyaddr)) {
 				error = EFAULT;
 				break;
 			}
 			p = pmap_mapdev(v, PAGE_SIZE);
 			error = uiomove(p, c, uio);
 			pmap_unmapdev((vm_offset_t)p, PAGE_SIZE);
 			break;
 		}
 	}
 	/*
 	 * Don't return error if any byte was written.  Read and write
 	 * can return error only if no i/o was performed.
 	 */
 	if (uio->uio_resid != orig_resid)
 		error = 0;
 	return (error);
 }
 
 /*
  * allow user processes to MMAP some memory sections
  * instead of going through read/write
  */
 /* ARGSUSED */
 int
 memmmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr,
     int prot __unused, vm_memattr_t *memattr __unused)
 {
-	if (dev2unit(dev) == CDEV_MINOR_MEM)
+	if (dev2unit(dev) == CDEV_MINOR_MEM) {
+		if (offset >= (1ULL << cpu_maxphyaddr))
+			return (-1);
 		*paddr = offset;
-	else if (dev2unit(dev) == CDEV_MINOR_KMEM)
+	} else if (dev2unit(dev) == CDEV_MINOR_KMEM)
         	*paddr = vtophys(offset);
 	/* else panic! */
 	return (0);
 }
 
 /*
  * Operations for changing memory attributes.
  *
  * This is basically just an ioctl shim for mem_range_attr_get
  * and mem_range_attr_set.
  */
 /* ARGSUSED */
 int 
 memioctl(struct cdev *dev __unused, u_long cmd, caddr_t data, int flags,
     struct thread *td)
 {
 	int nd, error = 0;
 	struct mem_range_op *mo = (struct mem_range_op *)data;
 	struct mem_range_desc *md;
 	
 	/* is this for us? */
 	if ((cmd != MEMRANGE_GET) &&
 	    (cmd != MEMRANGE_SET))
 		return (ENOTTY);
 
 	/* any chance we can handle this? */
 	if (mem_range_softc.mr_op == NULL)
 		return (EOPNOTSUPP);
 
 	/* do we have any descriptors? */
 	if (mem_range_softc.mr_ndesc == 0)
 		return (ENXIO);
 
 	switch (cmd) {
 	case MEMRANGE_GET:
 		nd = imin(mo->mo_arg[0], mem_range_softc.mr_ndesc);
 		if (nd > 0) {
 			md = (struct mem_range_desc *)
 				malloc(nd * sizeof(struct mem_range_desc),
 				       M_MEMDESC, M_WAITOK);
 			error = mem_range_attr_get(md, &nd);
 			if (!error)
 				error = copyout(md, mo->mo_desc, 
 					nd * sizeof(struct mem_range_desc));
 			free(md, M_MEMDESC);
 		}
 		else
 			nd = mem_range_softc.mr_ndesc;
 		mo->mo_arg[0] = nd;
 		break;
 		
 	case MEMRANGE_SET:
 		md = (struct mem_range_desc *)malloc(sizeof(struct mem_range_desc),
 						    M_MEMDESC, M_WAITOK);
 		error = copyin(mo->mo_desc, md, sizeof(struct mem_range_desc));
 		/* clamp description string */
 		md->mr_owner[sizeof(md->mr_owner) - 1] = 0;
 		if (error == 0)
 			error = mem_range_attr_set(md, &mo->mo_arg[0]);
 		free(md, M_MEMDESC);
 		break;
 	}
 	return (error);
 }
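The memrw() loop above never lets a single uiomove() cross a page boundary. memrw() already rejects physical offsets at or above 1ULL << cpu_maxphyaddr, and this revision adds the same bound to memmmap(). The following is a minimal user-space sketch of both calculations, assuming 4 KiB pages. chunk_len() and phys_offset_ok() are hypothetical helpers for illustration, not kernel functions.

```c
#include <stdint.h>

/* Assumed 4 KiB pages for illustration; the kernel gets these from machine headers. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_PAGE_MASK	(SKETCH_PAGE_SIZE - 1)

/*
 * Bytes one memrw()-style iteration may move: the smaller of what the
 * current iovec still wants and what is left before the next page boundary.
 */
static unsigned long
chunk_len(unsigned long iov_len, unsigned long offset)
{
	unsigned long room = SKETCH_PAGE_SIZE - (offset & SKETCH_PAGE_MASK);

	return (iov_len < room ? iov_len : room);
}

/*
 * Mirrors the bound used for CDEV_MINOR_MEM: offsets at or above
 * 2^maxphyaddr cannot be valid physical addresses and are rejected.
 */
static int
phys_offset_ok(uint64_t offset, unsigned int maxphyaddr)
{
	return (offset < (1ULL << maxphyaddr));
}
```

For example, chunk_len(100, 0x1ff0) yields 16 because only 16 bytes remain before the page boundary at 0x2000. The remaining 84 bytes are handled on the next loop iteration.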
Index: user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/compat/opensolaris/sys/vnode.h	(revision 303775)
@@ -1,289 +1,287 @@
 /*-
  * Copyright (c) 2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef _OPENSOLARIS_SYS_VNODE_H_
 #define	_OPENSOLARIS_SYS_VNODE_H_
 
 #ifdef _KERNEL
 
 struct vnode;
 struct vattr;
 
 typedef	struct vnode	vnode_t;
 typedef	struct vattr	vattr_t;
 typedef enum vtype vtype_t;
 
 #include <sys/namei.h>
 enum symfollow { NO_FOLLOW = NOFOLLOW };
 
 #include <sys/proc.h>
 #include_next <sys/vnode.h>
 #include <sys/mount.h>
 #include <sys/cred.h>
 #include <sys/fcntl.h>
 #include <sys/file.h>
 #include <sys/filedesc.h>
 #include <sys/syscallsubr.h>
 
 typedef	struct vop_vector	vnodeops_t;
 #define	VOP_FID		VOP_VPTOFH
 #define	vop_fid		vop_vptofh
 #define	vop_fid_args	vop_vptofh_args
 #define	a_fid		a_fhp
 
 #define	IS_XATTRDIR(dvp)	(0)
 
 #define	v_count	v_usecount
 
 #define	V_APPEND	VAPPEND
 
 #define	rootvfs		(rootvnode == NULL ? NULL : rootvnode->v_mount)
 
 static __inline int
 vn_is_readonly(vnode_t *vp)
 {
 	return (vp->v_mount->mnt_flag & MNT_RDONLY);
 }
 #define	vn_vfswlock(vp)		(0)
 #define	vn_vfsunlock(vp)	do { } while (0)
 #define	vn_ismntpt(vp)		((vp)->v_type == VDIR && (vp)->v_mountedhere != NULL)
 #define	vn_mountedvfs(vp)	((vp)->v_mountedhere)
 #define	vn_has_cached_data(vp)	\
 	((vp)->v_object != NULL && \
 	 ((vp)->v_object->resident_page_count > 0 || \
 	  !vm_object_cache_is_empty((vp)->v_object)))
 #define	vn_exists(vp)		do { } while (0)
 #define	vn_invalid(vp)		do { } while (0)
 #define	vn_renamepath(tdvp, svp, tnm, lentnm)	do { } while (0)
 #define	vn_free(vp)		do { } while (0)
 #define	vn_matchops(vp, vops)	((vp)->v_op == &(vops))
 
 #define	VN_HOLD(v)	vref(v)
 #define	VN_RELE(v)	vrele(v)
 #define	VN_URELE(v)	vput(v)
 
-#define	VOP_REALVP(vp, vpp, ct)	(*(vpp) = (vp), 0)
-
 #define	vnevent_create(vp, ct)			do { } while (0)
 #define	vnevent_link(vp, ct)			do { } while (0)
 #define	vnevent_remove(vp, dvp, name, ct)	do { } while (0)
 #define	vnevent_rmdir(vp, dvp, name, ct)	do { } while (0)
 #define	vnevent_rename_src(vp, dvp, name, ct)	do { } while (0)
 #define	vnevent_rename_dest(vp, dvp, name, ct)	do { } while (0)
 #define	vnevent_rename_dest_dir(vp, ct)		do { } while (0)
 
 #define	specvp(vp, rdev, type, cr)	(VN_HOLD(vp), (vp))
 #define	MANDMODE(mode)		(0)
 #define	MANDLOCK(vp, mode)	(0)
 #define	chklock(vp, op, offset, size, mode, ct)	(0)
 #define	cleanlocks(vp, pid, foo)	do { } while (0)
 #define	cleanshares(vp, pid)		do { } while (0)
 
 /*
  * We will use va_spare is place of Solaris' va_mask.
  * This field is initialized in zfs_setattr().
  */
 #define	va_mask		va_spare
 /* TODO: va_fileid is shorter than va_nodeid !!! */
 #define	va_nodeid	va_fileid
 /* TODO: This field needs conversion! */
 #define	va_nblocks	va_bytes
 #define	va_blksize	va_blocksize
 #define	va_seq		va_gen
 
 #define	MAXOFFSET_T	OFF_MAX
 #define	EXCL		0
 
 #define	ACCESSED		(AT_ATIME)
 #define	STATE_CHANGED		(AT_CTIME)
 #define	CONTENT_MODIFIED	(AT_MTIME | AT_CTIME)
 
 static __inline void
 vattr_init_mask(vattr_t *vap)
 {
 
 	vap->va_mask = 0;
 
 	if (vap->va_type != VNON)
 		vap->va_mask |= AT_TYPE;
 	if (vap->va_uid != (uid_t)VNOVAL)
 		vap->va_mask |= AT_UID;
 	if (vap->va_gid != (gid_t)VNOVAL)
 		vap->va_mask |= AT_GID;
 	if (vap->va_size != (u_quad_t)VNOVAL)
 		vap->va_mask |= AT_SIZE;
 	if (vap->va_atime.tv_sec != VNOVAL)
 		vap->va_mask |= AT_ATIME;
 	if (vap->va_mtime.tv_sec != VNOVAL)
 		vap->va_mask |= AT_MTIME;
 	if (vap->va_mode != (u_short)VNOVAL)
 		vap->va_mask |= AT_MODE;
 	if (vap->va_flags != VNOVAL)
 		vap->va_mask |= AT_XVATTR;
 }
 
 #define	FCREAT		O_CREAT
 #define	FTRUNC		O_TRUNC
 #define	FEXCL		O_EXCL
 #define	FDSYNC		FFSYNC
 #define	FRSYNC		FFSYNC
 #define	FSYNC		FFSYNC
 #define	FOFFMAX		0x00
 #define	FIGNORECASE	0x00
 
 static __inline int
 vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode,
     vnode_t **vpp, enum create crwhy, mode_t umask, struct vnode *startvp,
     int fd)
 {
 	struct thread *td = curthread;
 	struct nameidata nd;
 	int error, operation;
 
 	ASSERT(seg == UIO_SYSSPACE);
 	if ((filemode & FCREAT) != 0) {
 		ASSERT(filemode == (FWRITE | FCREAT | FTRUNC | FOFFMAX));
 		ASSERT(crwhy == CRCREAT);
 		operation = CREATE;
 	} else {
 		ASSERT(filemode == (FREAD | FOFFMAX) ||
 		    filemode == (FREAD | FWRITE | FOFFMAX));
 		ASSERT(crwhy == 0);
 		operation = LOOKUP;
 	}
 	ASSERT(umask == 0);
 
 	pwd_ensure_dirs();
 
 	if (startvp != NULL)
 		vref(startvp);
 	NDINIT_ATVP(&nd, operation, 0, UIO_SYSSPACE, pnamep, startvp, td);
 	filemode |= O_NOFOLLOW;
 	error = vn_open_cred(&nd, &filemode, createmode, 0, td->td_ucred, NULL);
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error == 0) {
 		/* We just unlock so we hold a reference. */
 		VOP_UNLOCK(nd.ni_vp, 0);
 		*vpp = nd.ni_vp;
 	}
 	return (error);
 }
 
 static __inline int
 zfs_vn_open(char *pnamep, enum uio_seg seg, int filemode, int createmode,
     vnode_t **vpp, enum create crwhy, mode_t umask)
 {
 
 	return (vn_openat(pnamep, seg, filemode, createmode, vpp, crwhy,
 	    umask, NULL, -1));
 }
 #define	vn_open(pnamep, seg, filemode, createmode, vpp, crwhy, umask)	\
 	zfs_vn_open((pnamep), (seg), (filemode), (createmode), (vpp), (crwhy), (umask))
 
 #define	RLIM64_INFINITY	0
 static __inline int
 zfs_vn_rdwr(enum uio_rw rw, vnode_t *vp, caddr_t base, ssize_t len,
     offset_t offset, enum uio_seg seg, int ioflag, int ulimit, cred_t *cr,
     ssize_t *residp)
 {
 	struct thread *td = curthread;
 	int error;
 	ssize_t resid;
 
 	ASSERT(ioflag == 0);
 	ASSERT(ulimit == RLIM64_INFINITY);
 
 	if (rw == UIO_WRITE) {
 		ioflag = IO_SYNC;
 	} else {
 		ioflag = IO_DIRECT;
 	}
 	error = vn_rdwr(rw, vp, base, len, offset, seg, ioflag, cr, NOCRED,
 	    &resid, td);
 	if (residp != NULL)
 		*residp = (ssize_t)resid;
 	return (error);
 }
 #define	vn_rdwr(rw, vp, base, len, offset, seg, ioflag, ulimit, cr, residp) \
 	zfs_vn_rdwr((rw), (vp), (base), (len), (offset), (seg), (ioflag), (ulimit), (cr), (residp))
 
 static __inline int
 zfs_vop_fsync(vnode_t *vp, int flag, cred_t *cr)
 {
 	struct mount *mp;
 	int error;
 
 	ASSERT(flag == FSYNC);
 
 	if ((error = vn_start_write(vp, &mp, V_WAIT | PCATCH)) != 0)
 		goto drop;
 	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
 	error = VOP_FSYNC(vp, MNT_WAIT, curthread);
 	VOP_UNLOCK(vp, 0);
 	vn_finished_write(mp);
 drop:
 	return (error);
 }
 #define	VOP_FSYNC(vp, flag, cr, ct)	zfs_vop_fsync((vp), (flag), (cr))
 
 static __inline int
 zfs_vop_close(vnode_t *vp, int flag, int count, offset_t offset, cred_t *cr)
 {
 	int error;
 
 	ASSERT(count == 1);
 	ASSERT(offset == 0);
 
 	error = vn_close(vp, flag, cr, curthread);
 	return (error);
 }
 #define	VOP_CLOSE(vp, oflags, count, offset, cr, ct)			\
 	zfs_vop_close((vp), (oflags), (count), (offset), (cr))
 
 static __inline int
 vn_rename(char *from, char *to, enum uio_seg seg)
 {
 
 	ASSERT(seg == UIO_SYSSPACE);
 
 	return (kern_renameat(curthread, AT_FDCWD, from, AT_FDCWD, to, seg));
 }
 
 static __inline int
 vn_remove(char *fnamep, enum uio_seg seg, enum rm dirflag)
 {
 
 	ASSERT(seg == UIO_SYSSPACE);
 	ASSERT(dirflag == RMFILE);
 
 	return (kern_unlinkat(curthread, AT_FDCWD, fnamep, seg, 0));
 }
 
 #endif	/* _KERNEL */
 
 #endif	/* _OPENSOLARIS_SYS_VNODE_H_ */
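vattr_init_mask() builds the Solaris-style va_mask by setting one AT_* bit for each vattr field that does not hold the VNOVAL "unset" sentinel. The sketch below reproduces that pattern with a cut-down, hypothetical struct and made-up bit values. The real AT_* constants and VNOVAL come from the kernel headers, and the real code casts VNOVAL to each field's own type before comparing.

```c
#include <stdint.h>

/*
 * Hypothetical sentinel and mask bits for this sketch only; the real
 * VNOVAL and AT_* values are defined by the FreeBSD/OpenSolaris headers.
 */
#define	SKETCH_VNOVAL	(-1)
#define	SKETCH_AT_UID	0x01u
#define	SKETCH_AT_GID	0x02u
#define	SKETCH_AT_SIZE	0x04u

/* A cut-down stand-in for struct vattr: unset fields hold the sentinel. */
struct sketch_vattr {
	int64_t	va_uid;
	int64_t	va_gid;
	int64_t	va_size;
};

/* Set one mask bit for every field the caller actually filled in. */
static unsigned int
sketch_vattr_mask(const struct sketch_vattr *vap)
{
	unsigned int mask = 0;

	if (vap->va_uid != SKETCH_VNOVAL)
		mask |= SKETCH_AT_UID;
	if (vap->va_gid != SKETCH_VNOVAL)
		mask |= SKETCH_AT_GID;
	if (vap->va_size != SKETCH_VNOVAL)
		mask |= SKETCH_AT_SIZE;
	return (mask);
}
```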
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_dir.h	(revision 303775)
@@ -1,74 +1,74 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright 2010 Sun Microsystems, Inc.  All rights reserved.
  * Use is subject to license terms.
  */
 
 #ifndef	_SYS_FS_ZFS_DIR_H
 #define	_SYS_FS_ZFS_DIR_H
 
 #include <sys/pathname.h>
 #include <sys/dmu.h>
 #include <sys/zfs_znode.h>
 
 #ifdef	__cplusplus
 extern "C" {
 #endif
 
 /* zfs_dirent_lock() flags */
 #define	ZNEW		0x0001		/* entry should not exist */
 #define	ZEXISTS		0x0002		/* entry should exist */
 #define	ZSHARED		0x0004		/* shared access (zfs_dirlook()) */
 #define	ZXATTR		0x0008		/* we want the xattr dir */
 #define	ZRENAMING	0x0010		/* znode is being renamed */
 #define	ZCILOOK		0x0020		/* case-insensitive lookup requested */
 #define	ZCIEXACT	0x0040		/* c-i requires c-s match (rename) */
 #define	ZHAVELOCK	0x0080		/* z_name_lock is already held */
 
 /* mknode flags */
 #define	IS_ROOT_NODE	0x01		/* create a root node */
 #define	IS_XATTR	0x02		/* create an extended attribute node */
 
-extern int zfs_dirent_lock(zfs_dirlock_t **, znode_t *, char *, znode_t **,
-    int, int *, pathname_t *);
-extern void zfs_dirent_unlock(zfs_dirlock_t *);
-extern int zfs_link_create(zfs_dirlock_t *, znode_t *, dmu_tx_t *, int);
-extern int zfs_link_destroy(zfs_dirlock_t *, znode_t *, dmu_tx_t *, int,
+extern int zfs_dirent_lookup(znode_t *, const char *, znode_t **, int);
+extern int zfs_link_create(znode_t *, const char *, znode_t *, dmu_tx_t *, int);
+extern int zfs_link_destroy(znode_t *, const char *, znode_t *, dmu_tx_t *, int,
     boolean_t *);
-extern int zfs_dirlook(znode_t *, char *, vnode_t **, int, int *,
-    pathname_t *);
+#if 0
+extern int zfs_dirlook(vnode_t *, const char *, vnode_t **, int);
+#else
+extern int zfs_dirlook(znode_t *, const char *name, znode_t **);
+#endif
 extern void zfs_mknode(znode_t *, vattr_t *, dmu_tx_t *, cred_t *,
     uint_t, znode_t **, zfs_acl_ids_t *);
 extern void zfs_rmnode(znode_t *);
-extern void zfs_dl_name_switch(zfs_dirlock_t *dl, char *new, char **old);
 extern boolean_t zfs_dirempty(znode_t *);
 extern void zfs_unlinked_add(znode_t *, dmu_tx_t *);
 extern void zfs_unlinked_drain(zfsvfs_t *zfsvfs);
 extern int zfs_sticky_remove_access(znode_t *, znode_t *, cred_t *cr);
 extern int zfs_get_xattrdir(znode_t *, vnode_t **, cred_t *, int);
 extern int zfs_make_xattrdir(znode_t *, vattr_t *, vnode_t **, cred_t *);
 
 #ifdef	__cplusplus
 }
 #endif
 
 #endif	/* _SYS_FS_ZFS_DIR_H */
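The new declarations drop zfs_dirlock_t. zfs_dirent_lookup() takes the directory znode, a name, an out pointer and the zfs_dirent_lock() flags directly. The sketch below illustrates only how the ZNEW ("entry should not exist") and ZEXISTS ("entry should exist") flags constrain a lookup over a hypothetical flat directory array. The EEXIST/ENOENT return values are an assumption based on the usual ZFS convention, not something this header states, and none of this is the ZFS implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Flag values copied from zfs_dir.h above. */
#define	ZNEW		0x0001		/* entry should not exist */
#define	ZEXISTS		0x0002		/* entry should exist */

/* Hypothetical flat directory used only for this sketch. */
struct sketch_dirent {
	const char	*name;
	unsigned long	 obj;
};

/*
 * Look up name in dir[0..n-1].  *objp receives the entry's object
 * number, or 0 when the entry is absent.  ZNEW turns "found" into
 * EEXIST; ZEXISTS turns "not found" into ENOENT.
 */
static int
sketch_dirent_lookup(const struct sketch_dirent *dir, size_t n,
    const char *name, unsigned long *objp, int flag)
{
	size_t i;

	*objp = 0;
	for (i = 0; i < n; i++) {
		if (strcmp(dir[i].name, name) == 0) {
			if (flag & ZNEW)
				return (EEXIST);
			*objp = dir[i].obj;
			return (0);
		}
	}
	if (flag & ZEXISTS)
		return (ENOENT);
	return (0);
}
```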
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_vfsops.h	(revision 303775)
@@ -1,168 +1,169 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2011 Pawel Jakub Dawidek <pawel@dawidek.net>.
  * All rights reserved.
  */
 
 #ifndef	_SYS_FS_ZFS_VFSOPS_H
 #define	_SYS_FS_ZFS_VFSOPS_H
 
 #include <sys/list.h>
 #include <sys/vfs.h>
 #include <sys/zil.h>
 #include <sys/sa.h>
 #include <sys/rrwlock.h>
 #include <sys/zfs_ioctl.h>
 
 #ifdef	__cplusplus
 extern "C" {
 #endif
 
 typedef struct zfsvfs zfsvfs_t;
 struct znode;
 
 struct zfsvfs {
 	vfs_t		*z_vfs;		/* generic fs struct */
 	zfsvfs_t	*z_parent;	/* parent fs */
 	objset_t	*z_os;		/* objset reference */
 	uint64_t	z_root;		/* id of root znode */
 	uint64_t	z_unlinkedobj;	/* id of unlinked zapobj */
 	uint64_t	z_max_blksz;	/* maximum block size for files */
 	uint64_t	z_fuid_obj;	/* fuid table object number */
 	uint64_t	z_fuid_size;	/* fuid table size */
 	avl_tree_t	z_fuid_idx;	/* fuid tree keyed by index */
 	avl_tree_t	z_fuid_domain;	/* fuid tree keyed by domain */
 	krwlock_t	z_fuid_lock;	/* fuid lock */
 	boolean_t	z_fuid_loaded;	/* fuid tables are loaded */
 	boolean_t	z_fuid_dirty;   /* need to sync fuid table ? */
 	struct zfs_fuid_info	*z_fuid_replay; /* fuid info for replay */
 	zilog_t		*z_log;		/* intent log pointer */
 	uint_t		z_acl_mode;	/* acl chmod/mode behavior */
 	uint_t		z_acl_inherit;	/* acl inheritance behavior */
 	zfs_case_t	z_case;		/* case-sense */
 	boolean_t	z_utf8;		/* utf8-only */
 	int		z_norm;		/* normalization flags */
 	boolean_t	z_atime;	/* enable atimes mount option */
 	boolean_t	z_unmounted;	/* unmounted */
 	rrmlock_t	z_teardown_lock;
 	krwlock_t	z_teardown_inactive_lock;
 	list_t		z_all_znodes;	/* all vnodes in the fs */
 	kmutex_t	z_znodes_lock;	/* lock for z_all_znodes */
 	vnode_t		*z_ctldir;	/* .zfs directory pointer */
 	boolean_t	z_show_ctldir;	/* expose .zfs in the root dir */
 	boolean_t	z_issnap;	/* true if this is a snapshot */
 	boolean_t	z_vscan;	/* virus scan on/off */
 	boolean_t	z_use_fuids;	/* version allows fuids */
 	boolean_t	z_replay;	/* set during ZIL replay */
 	boolean_t	z_use_sa;	/* version allow system attributes */
+	boolean_t	z_use_namecache;/* make use of FreeBSD name cache */
 	uint64_t	z_version;	/* ZPL version */
 	uint64_t	z_shares_dir;	/* hidden shares dir */
 	kmutex_t	z_lock;
 	uint64_t	z_userquota_obj;
 	uint64_t	z_groupquota_obj;
 	uint64_t	z_replay_eof;	/* New end of file - replay only */
 	sa_attr_type_t	*z_attr_table;	/* SA attr mapping->id */
 #define	ZFS_OBJ_MTX_SZ	64
 	kmutex_t	z_hold_mtx[ZFS_OBJ_MTX_SZ];	/* znode hold locks */
 };
 
 /*
  * Normal filesystems (those not under .zfs/snapshot) have a total
  * file ID size limited to 12 bytes (including the length field) due to
  * NFSv2 protocol's limitation of 32 bytes for a filehandle.  For historical
  * reasons, this same limit is being imposed by the Solaris NFSv3 implementation
  * (although the NFSv3 protocol actually permits a maximum of 64 bytes).  It
  * is not possible to expand beyond 12 bytes without abandoning support
  * of NFSv2.
  *
  * For normal filesystems, we partition up the available space as follows:
  *	2 bytes		fid length (required)
  *	6 bytes		object number (48 bits)
  *	4 bytes		generation number (32 bits)
  *
  * We reserve only 48 bits for the object number, as this is the limit
  * currently defined and imposed by the DMU.
  */
 typedef struct zfid_short {
 	uint16_t	zf_len;
 	uint8_t		zf_object[6];		/* obj[i] = obj >> (8 * i) */
 	uint8_t		zf_gen[4];		/* gen[i] = gen >> (8 * i) */
 } zfid_short_t;
 
 /*
  * Filesystems under .zfs/snapshot have a total file ID size of 22[*] bytes
  * (including the length field).  This makes files under .zfs/snapshot
  * accessible by NFSv3 and NFSv4, but not NFSv2.
  *
  * For files under .zfs/snapshot, we partition up the available space
  * as follows:
  *	2 bytes		fid length (required)
  *	6 bytes		object number (48 bits)
  *	4 bytes		generation number (32 bits)
  *	6 bytes		objset id (48 bits)
  *	4 bytes[**]	currently just zero (32 bits)
  *
  * We reserve only 48 bits for the object number and objset id, as these are
  * the limits currently defined and imposed by the DMU.
  *
  * [*] 20 bytes on FreeBSD to fit into the size of struct fid.
  * [**] 2 bytes on FreeBSD for the above reason.
  */
 typedef struct zfid_long {
 	zfid_short_t	z_fid;
 	uint8_t		zf_setid[6];		/* obj[i] = obj >> (8 * i) */
 	uint8_t		zf_setgen[2];		/* gen[i] = gen >> (8 * i) */
 } zfid_long_t;
 
 #define	SHORT_FID_LEN	(sizeof (zfid_short_t) - sizeof (uint16_t))
 #define	LONG_FID_LEN	(sizeof (zfid_long_t) - sizeof (uint16_t))
 
 extern uint_t zfs_fsyncer_key;
 extern int zfs_super_owner;
 
 extern int zfs_suspend_fs(zfsvfs_t *zfsvfs);
 extern int zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname);
 extern int zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     const char *domain, uint64_t rid, uint64_t *valuep);
 extern int zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     uint64_t *cookiep, void *vbuf, uint64_t *bufsizep);
 extern int zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     const char *domain, uint64_t rid, uint64_t quota);
 extern boolean_t zfs_owner_overquota(zfsvfs_t *zfsvfs, struct znode *,
     boolean_t isgroup);
 extern boolean_t zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup,
     uint64_t fuid);
 extern int zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers);
 extern int zfsvfs_create(const char *name, zfsvfs_t **zfvp);
 extern void zfsvfs_free(zfsvfs_t *zfsvfs);
 extern int zfs_check_global_label(const char *dsname, const char *hexsl);
 
 #ifdef _KERNEL
 extern void zfsvfs_update_fromname(const char *oldname, const char *newname);
 #endif
 
 #ifdef	__cplusplus
 }
 #endif
 
 #endif	/* _SYS_FS_ZFS_VFSOPS_H */
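The zf_object[] and zf_gen[] comments describe a least-significant-byte-first split: byte i holds bits 8*i through 8*i+7. The following is a minimal sketch of packing and unpacking a zfid_short_t under that rule. The helper names are hypothetical. Any bits of the object number above 48 are dropped, which matches the 48-bit DMU limit described above.

```c
#include <stdint.h>

/* Same layout as zfid_short_t above. */
typedef struct zfid_short {
	uint16_t	zf_len;
	uint8_t		zf_object[6];		/* obj[i] = obj >> (8 * i) */
	uint8_t		zf_gen[4];		/* gen[i] = gen >> (8 * i) */
} zfid_short_t;

#define	SHORT_FID_LEN	(sizeof (zfid_short_t) - sizeof (uint16_t))

/* Pack a 48-bit object number and a 32-bit generation, low byte first. */
static void
sketch_fid_encode(zfid_short_t *zfid, uint64_t obj, uint32_t gen)
{
	int i;

	zfid->zf_len = SHORT_FID_LEN;
	for (i = 0; i < 6; i++)
		zfid->zf_object[i] = (uint8_t)(obj >> (8 * i));
	for (i = 0; i < 4; i++)
		zfid->zf_gen[i] = (uint8_t)(gen >> (8 * i));
}

/* Reassemble the object number from zf_object[]. */
static uint64_t
sketch_fid_object(const zfid_short_t *zfid)
{
	uint64_t obj = 0;
	int i;

	for (i = 0; i < 6; i++)
		obj |= (uint64_t)zfid->zf_object[i] << (8 * i);
	return (obj);
}

/* Reassemble the generation number from zf_gen[]. */
static uint32_t
sketch_fid_gen(const zfid_short_t *zfid)
{
	uint32_t gen = 0;
	int i;

	for (i = 0; i < 4; i++)
		gen |= (uint32_t)zfid->zf_gen[i] << (8 * i);
	return (gen);
}
```

The structure packs to 12 bytes with no padding, so SHORT_FID_LEN evaluates to 10. This value excludes the 2-byte length field, as the comment on the layout describes.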
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zfs_znode.h	(revision 303775)
@@ -1,375 +1,377 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2012 by Delphix. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
 #ifndef	_SYS_FS_ZFS_ZNODE_H
 #define	_SYS_FS_ZFS_ZNODE_H
 
 #ifdef _KERNEL
 #include <sys/list.h>
 #include <sys/dmu.h>
 #include <sys/sa.h>
 #include <sys/zfs_vfsops.h>
 #include <sys/rrwlock.h>
 #include <sys/zfs_sa.h>
 #include <sys/zfs_stat.h>
 #endif
 #include <sys/zfs_acl.h>
 #include <sys/zil.h>
 
 #ifdef	__cplusplus
 extern "C" {
 #endif
 
 /*
  * Additional file level attributes, that are stored
  * in the upper half of zp_flags
  */
 #define	ZFS_READONLY		0x0000000100000000
 #define	ZFS_HIDDEN		0x0000000200000000
 #define	ZFS_SYSTEM		0x0000000400000000
 #define	ZFS_ARCHIVE		0x0000000800000000
 #define	ZFS_IMMUTABLE		0x0000001000000000
 #define	ZFS_NOUNLINK		0x0000002000000000
 #define	ZFS_APPENDONLY		0x0000004000000000
 #define	ZFS_NODUMP		0x0000008000000000
 #define	ZFS_OPAQUE		0x0000010000000000
 #define	ZFS_AV_QUARANTINED 	0x0000020000000000
 #define	ZFS_AV_MODIFIED 	0x0000040000000000
 #define	ZFS_REPARSE		0x0000080000000000
 #define	ZFS_OFFLINE		0x0000100000000000
 #define	ZFS_SPARSE		0x0000200000000000
 
 #define	ZFS_ATTR_SET(zp, attr, value, pflags, tx) \
 { \
 	if (value) \
 		pflags |= attr; \
 	else \
 		pflags &= ~attr; \
 	VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_FLAGS(zp->z_zfsvfs), \
 	    &pflags, sizeof (pflags), tx)); \
 }
 
 /*
  * Define special zfs pflags
  */
 #define	ZFS_XATTR		0x1		/* is an extended attribute */
 #define	ZFS_INHERIT_ACE		0x2		/* ace has inheritable ACEs */
 #define	ZFS_ACL_TRIVIAL 	0x4		/* files ACL is trivial */
 #define	ZFS_ACL_OBJ_ACE 	0x8		/* ACL has CMPLX Object ACE */
 #define	ZFS_ACL_PROTECTED	0x10		/* ACL protected */
 #define	ZFS_ACL_DEFAULTED	0x20		/* ACL should be defaulted */
 #define	ZFS_ACL_AUTO_INHERIT	0x40		/* ACL should be inherited */
 #define	ZFS_BONUS_SCANSTAMP	0x80		/* Scanstamp in bonus area */
 #define	ZFS_NO_EXECS_DENIED	0x100		/* exec was given to everyone */
 
 #define	SA_ZPL_ATIME(z)		z->z_attr_table[ZPL_ATIME]
 #define	SA_ZPL_MTIME(z)		z->z_attr_table[ZPL_MTIME]
 #define	SA_ZPL_CTIME(z)		z->z_attr_table[ZPL_CTIME]
 #define	SA_ZPL_CRTIME(z)	z->z_attr_table[ZPL_CRTIME]
 #define	SA_ZPL_GEN(z)		z->z_attr_table[ZPL_GEN]
 #define	SA_ZPL_DACL_ACES(z)	z->z_attr_table[ZPL_DACL_ACES]
 #define	SA_ZPL_XATTR(z)		z->z_attr_table[ZPL_XATTR]
 #define	SA_ZPL_SYMLINK(z)	z->z_attr_table[ZPL_SYMLINK]
 #define	SA_ZPL_RDEV(z)		z->z_attr_table[ZPL_RDEV]
 #define	SA_ZPL_SCANSTAMP(z)	z->z_attr_table[ZPL_SCANSTAMP]
 #define	SA_ZPL_UID(z)		z->z_attr_table[ZPL_UID]
 #define	SA_ZPL_GID(z)		z->z_attr_table[ZPL_GID]
 #define	SA_ZPL_PARENT(z)	z->z_attr_table[ZPL_PARENT]
 #define	SA_ZPL_LINKS(z)		z->z_attr_table[ZPL_LINKS]
 #define	SA_ZPL_MODE(z)		z->z_attr_table[ZPL_MODE]
 #define	SA_ZPL_DACL_COUNT(z)	z->z_attr_table[ZPL_DACL_COUNT]
 #define	SA_ZPL_FLAGS(z)		z->z_attr_table[ZPL_FLAGS]
 #define	SA_ZPL_SIZE(z)		z->z_attr_table[ZPL_SIZE]
 #define	SA_ZPL_ZNODE_ACL(z)	z->z_attr_table[ZPL_ZNODE_ACL]
 #define	SA_ZPL_PAD(z)		z->z_attr_table[ZPL_PAD]
 
 /*
  * Is ID ephemeral?
  */
 #define	IS_EPHEMERAL(x)		(x > MAXUID)
 
 /*
  * Should we use FUIDs?
  */
 #define	USE_FUIDS(version, os)	(version >= ZPL_VERSION_FUID && \
     spa_version(dmu_objset_spa(os)) >= SPA_VERSION_FUID)
 #define	USE_SA(version, os) (version >= ZPL_VERSION_SA && \
     spa_version(dmu_objset_spa(os)) >= SPA_VERSION_SA)
 
 #define	MASTER_NODE_OBJ	1
 
 /*
  * Special attributes for master node.
  * "userquota@" and "groupquota@" are also valid (from
  * zfs_userquota_prop_prefixes[]).
  */
 #define	ZFS_FSID		"FSID"
 #define	ZFS_UNLINKED_SET	"DELETE_QUEUE"
 #define	ZFS_ROOT_OBJ		"ROOT"
 #define	ZPL_VERSION_STR		"VERSION"
 #define	ZFS_FUID_TABLES		"FUID"
 #define	ZFS_SHARES_DIR		"SHARES"
 #define	ZFS_SA_ATTRS		"SA_ATTRS"
 
 /*
  * Path component length
  *
  * The generic fs code uses MAXNAMELEN to represent
  * what the largest component length is.  Unfortunately,
  * this length includes the terminating NULL.  ZFS needs
  * to tell the users via pathconf() and statvfs() what the
  * true maximum length of a component is, excluding the NULL.
  */
 #define	ZFS_MAXNAMELEN	(MAXNAMELEN - 1)
 
 /*
  * Convert mode bits (zp_mode) to BSD-style DT_* values for storing in
  * the directory entries.
  */
 #ifndef IFTODT
 #define	IFTODT(mode) (((mode) & S_IFMT) >> 12)
 #endif
 
 /*
  * The directory entry has the type (currently unused on Solaris) in the
  * top 4 bits, and the object number in the low 48 bits.  The "middle"
  * 12 bits are unused.
  */
 #define	ZFS_DIRENT_TYPE(de) BF64_GET(de, 60, 4)
 #define	ZFS_DIRENT_OBJ(de) BF64_GET(de, 0, 48)
 
 /*
  * Directory entry locks control access to directory entries.
  * They are used to protect creates, deletes, and renames.
  * Each directory znode has a mutex and a list of locked names.
  */
 #ifdef _KERNEL
 typedef struct zfs_dirlock {
 	char		*dl_name;	/* directory entry being locked */
 	uint32_t	dl_sharecnt;	/* 0 if exclusive, > 0 if shared */
 	uint8_t		dl_namelock;	/* 1 if z_name_lock is NOT held */
 	uint16_t	dl_namesize;	/* set if dl_name was allocated */
 	kcondvar_t	dl_cv;		/* wait for entry to be unlocked */
 	struct znode	*dl_dzp;	/* directory znode */
 	struct zfs_dirlock *dl_next;	/* next in z_dirlocks list */
 } zfs_dirlock_t;
 
 typedef struct znode {
 	struct zfsvfs	*z_zfsvfs;
 	vnode_t		*z_vnode;
 	uint64_t	z_id;		/* object ID for this znode */
+#ifdef illumos
 	kmutex_t	z_lock;		/* znode modification lock */
 	krwlock_t	z_parent_lock;	/* parent lock for directories */
 	krwlock_t	z_name_lock;	/* "master" lock for dirent locks */
 	zfs_dirlock_t	*z_dirlocks;	/* directory entry lock list */
+#endif
 	kmutex_t	z_range_lock;	/* protects changes to z_range_avl */
 	avl_tree_t	z_range_avl;	/* avl tree of file range locks */
 	uint8_t		z_unlinked;	/* file has been unlinked */
 	uint8_t		z_atime_dirty;	/* atime needs to be synced */
 	uint8_t		z_zn_prefetch;	/* Prefetch znodes? */
 	uint8_t		z_moved;	/* Has this znode been moved? */
 	uint_t		z_blksz;	/* block size in bytes */
 	uint_t		z_seq;		/* modification sequence number */
 	uint64_t	z_mapcnt;	/* number of pages mapped to file */
 	uint64_t	z_gen;		/* generation (cached) */
 	uint64_t	z_size;		/* file size (cached) */
 	uint64_t	z_atime[2];	/* atime (cached) */
 	uint64_t	z_links;	/* file links (cached) */
 	uint64_t	z_pflags;	/* pflags (cached) */
 	uint64_t	z_uid;		/* uid fuid (cached) */
 	uint64_t	z_gid;		/* gid fuid (cached) */
 	mode_t		z_mode;		/* mode (cached) */
 	uint32_t	z_sync_cnt;	/* synchronous open count */
 	kmutex_t	z_acl_lock;	/* acl data lock */
 	zfs_acl_t	*z_acl_cached;	/* cached acl */
 	list_node_t	z_link_node;	/* all znodes in fs link */
 	sa_handle_t	*z_sa_hdl;	/* handle to sa data */
 	boolean_t	z_is_sa;	/* are we native sa? */
 } znode_t;
 
 
 /*
  * Range locking rules
  * --------------------
  * 1. When truncating a file (zfs_create, zfs_setattr, zfs_space) the whole
  *    file range needs to be locked as RL_WRITER. Only then can the pages be
  *    freed etc and zp_size reset. zp_size must be set within range lock.
  * 2. For writes and punching holes (zfs_write & zfs_space) just the range
  *    being written or freed needs to be locked as RL_WRITER.
  *    Multiple writes at the end of the file must coordinate zp_size updates
  *    to ensure data isn't lost. A compare and swap loop is currently used
  *    to ensure the file size is at least the offset last written.
  * 3. For reads (zfs_read, zfs_get_data & zfs_putapage) just the range being
  *    read needs to be locked as RL_READER. A check against zp_size can then
  *    be made for reading beyond end of file.
  */
 
 /*
  * Convert between znode pointers and vnode pointers
  */
 #ifdef DEBUG
 static __inline vnode_t *
 ZTOV(znode_t *zp)
 {
 	vnode_t *vp = zp->z_vnode;
 
 	ASSERT(vp == NULL || vp->v_data == NULL || vp->v_data == zp);
 	return (vp);
 }
 static __inline znode_t *
 VTOZ(vnode_t *vp)
 {
 	znode_t *zp = (znode_t *)vp->v_data;
 
 	ASSERT(zp == NULL || zp->z_vnode == NULL || zp->z_vnode == vp);
 	return (zp);
 }
 #else
 #define	ZTOV(ZP)	((ZP)->z_vnode)
 #define	VTOZ(VP)	((znode_t *)(VP)->v_data)
 #endif
 
 /* Called on entry to each ZFS vnode and vfs operation */
 #define	ZFS_ENTER(zfsvfs) \
 	{ \
 		rrm_enter_read(&(zfsvfs)->z_teardown_lock, FTAG); \
 		if ((zfsvfs)->z_unmounted) { \
 			ZFS_EXIT(zfsvfs); \
 			return (EIO); \
 		} \
 	}
 
 /* Must be called before exiting the vop */
 #define	ZFS_EXIT(zfsvfs) rrm_exit(&(zfsvfs)->z_teardown_lock, FTAG)
 
 /* Verifies the znode is valid */
 #define	ZFS_VERIFY_ZP(zp) \
 	if ((zp)->z_sa_hdl == NULL) { \
 		ZFS_EXIT((zp)->z_zfsvfs); \
 		return (EIO); \
 	} \
 
 /*
  * Macros for dealing with dmu_buf_hold
  */
 #define	ZFS_OBJ_HASH(obj_num)	((obj_num) & (ZFS_OBJ_MTX_SZ - 1))
 #define	ZFS_OBJ_MUTEX(zfsvfs, obj_num)	\
 	(&(zfsvfs)->z_hold_mtx[ZFS_OBJ_HASH(obj_num)])
 #define	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num) \
 	mutex_enter(ZFS_OBJ_MUTEX((zfsvfs), (obj_num)))
 #define	ZFS_OBJ_HOLD_TRYENTER(zfsvfs, obj_num) \
 	mutex_tryenter(ZFS_OBJ_MUTEX((zfsvfs), (obj_num)))
 #define	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num) \
 	mutex_exit(ZFS_OBJ_MUTEX((zfsvfs), (obj_num)))
 
 /* Encode ZFS stored time values from a struct timespec */
 #define	ZFS_TIME_ENCODE(tp, stmp)		\
 {						\
 	(stmp)[0] = (uint64_t)(tp)->tv_sec;	\
 	(stmp)[1] = (uint64_t)(tp)->tv_nsec;	\
 }
 
 /* Decode ZFS stored time values to a struct timespec */
 #define	ZFS_TIME_DECODE(tp, stmp)		\
 {						\
 	(tp)->tv_sec = (time_t)(stmp)[0];		\
 	(tp)->tv_nsec = (long)(stmp)[1];		\
 }
 
 /*
  * Timestamp defines
  */
 #define	ACCESSED		(AT_ATIME)
 #define	STATE_CHANGED		(AT_CTIME)
 #define	CONTENT_MODIFIED	(AT_MTIME | AT_CTIME)
 
 #define	ZFS_ACCESSTIME_STAMP(zfsvfs, zp) \
 	if ((zfsvfs)->z_atime && !((zfsvfs)->z_vfs->vfs_flag & VFS_RDONLY)) \
 		zfs_tstamp_update_setup(zp, ACCESSED, NULL, NULL, B_FALSE);
 
 extern int	zfs_init_fs(zfsvfs_t *, znode_t **);
 extern void	zfs_set_dataprop(objset_t *);
 extern void	zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *,
     dmu_tx_t *tx);
 extern void	zfs_tstamp_update_setup(znode_t *, uint_t, uint64_t [2],
     uint64_t [2], boolean_t);
 extern void	zfs_grow_blocksize(znode_t *, uint64_t, dmu_tx_t *);
 extern int	zfs_freesp(znode_t *, uint64_t, uint64_t, int, boolean_t);
 extern void	zfs_znode_init(void);
 extern void	zfs_znode_fini(void);
 extern int	zfs_zget(zfsvfs_t *, uint64_t, znode_t **);
 extern int	zfs_rezget(znode_t *);
 extern void	zfs_zinactive(znode_t *);
 extern void	zfs_znode_delete(znode_t *, dmu_tx_t *);
 extern void	zfs_znode_free(znode_t *);
 extern void	zfs_remove_op_tables();
 extern int	zfs_create_op_tables();
 extern dev_t	zfs_cmpldev(uint64_t);
 extern int	zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value);
 extern int	zfs_get_stats(objset_t *os, nvlist_t *nv);
 extern void	zfs_znode_dmu_fini(znode_t *);
 
 extern void zfs_log_create(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
     znode_t *dzp, znode_t *zp, char *name, vsecattr_t *, zfs_fuid_info_t *,
     vattr_t *vap);
 extern int zfs_log_create_txtype(zil_create_t, vsecattr_t *vsecp,
     vattr_t *vap);
 extern void zfs_log_remove(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
     znode_t *dzp, char *name, uint64_t foid);
 #define	ZFS_NO_OBJECT	0	/* no object id */
 extern void zfs_log_link(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
     znode_t *dzp, znode_t *zp, char *name);
 extern void zfs_log_symlink(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
     znode_t *dzp, znode_t *zp, char *name, char *link);
 extern void zfs_log_rename(zilog_t *zilog, dmu_tx_t *tx, uint64_t txtype,
     znode_t *sdzp, char *sname, znode_t *tdzp, char *dname, znode_t *szp);
 extern void zfs_log_write(zilog_t *zilog, dmu_tx_t *tx, int txtype,
     znode_t *zp, offset_t off, ssize_t len, int ioflag);
 extern void zfs_log_truncate(zilog_t *zilog, dmu_tx_t *tx, int txtype,
     znode_t *zp, uint64_t off, uint64_t len);
 extern void zfs_log_setattr(zilog_t *zilog, dmu_tx_t *tx, int txtype,
     znode_t *zp, vattr_t *vap, uint_t mask_applied, zfs_fuid_info_t *fuidp);
 #ifndef ZFS_NO_ACL
 extern void zfs_log_acl(zilog_t *zilog, dmu_tx_t *tx, znode_t *zp,
     vsecattr_t *vsecp, zfs_fuid_info_t *fuidp);
 #endif
 extern void zfs_xvattr_set(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx);
 extern void zfs_upgrade(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
 extern int zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx);
 
 extern zil_get_data_t zfs_get_data;
 extern zil_replay_func_t *zfs_replay_vector[TX_MAX_TYPE];
 extern int zfsfstype;
 
 #endif /* _KERNEL */
 
 extern int zfs_obj_to_path(objset_t *osp, uint64_t obj, char *buf, int len);
 
 #ifdef	__cplusplus
 }
 #endif
 
 #endif	/* _SYS_FS_ZFS_ZNODE_H */
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_acl.c	(revision 303775)
@@ -1,2720 +1,2703 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright 2011 Nexenta Systems, Inc.  All rights reserved.
  * Copyright (c) 2013 by Delphix. All rights reserved.
  */
 
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
 #include <sys/systm.h>
 #include <sys/sysmacros.h>
 #include <sys/resource.h>
 #include <sys/vfs.h>
 #include <sys/vnode.h>
 #include <sys/file.h>
 #include <sys/stat.h>
 #include <sys/kmem.h>
 #include <sys/cmn_err.h>
 #include <sys/errno.h>
 #include <sys/unistd.h>
 #include <sys/sdt.h>
 #include <sys/fs/zfs.h>
 #include <sys/policy.h>
 #include <sys/zfs_znode.h>
 #include <sys/zfs_fuid.h>
 #include <sys/zfs_acl.h>
 #include <sys/zfs_dir.h>
 #include <sys/zfs_vfsops.h>
 #include <sys/dmu.h>
 #include <sys/dnode.h>
 #include <sys/zap.h>
 #include <sys/sa.h>
 #include <acl/acl_common.h>
 
 #define	ALLOW	ACE_ACCESS_ALLOWED_ACE_TYPE
 #define	DENY	ACE_ACCESS_DENIED_ACE_TYPE
 #define	MAX_ACE_TYPE	ACE_SYSTEM_ALARM_CALLBACK_OBJECT_ACE_TYPE
 #define	MIN_ACE_TYPE	ALLOW
 
 #define	OWNING_GROUP		(ACE_GROUP|ACE_IDENTIFIER_GROUP)
 #define	EVERYONE_ALLOW_MASK (ACE_READ_ACL|ACE_READ_ATTRIBUTES | \
     ACE_READ_NAMED_ATTRS|ACE_SYNCHRONIZE)
 #define	EVERYONE_DENY_MASK (ACE_WRITE_ACL|ACE_WRITE_OWNER | \
     ACE_WRITE_ATTRIBUTES|ACE_WRITE_NAMED_ATTRS)
 #define	OWNER_ALLOW_MASK (ACE_WRITE_ACL | ACE_WRITE_OWNER | \
     ACE_WRITE_ATTRIBUTES|ACE_WRITE_NAMED_ATTRS)
 
 #define	ZFS_CHECKED_MASKS (ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_READ_DATA| \
     ACE_READ_NAMED_ATTRS|ACE_WRITE_DATA|ACE_WRITE_ATTRIBUTES| \
     ACE_WRITE_NAMED_ATTRS|ACE_APPEND_DATA|ACE_EXECUTE|ACE_WRITE_OWNER| \
     ACE_WRITE_ACL|ACE_DELETE|ACE_DELETE_CHILD|ACE_SYNCHRONIZE)
 
 #define	WRITE_MASK_DATA (ACE_WRITE_DATA|ACE_APPEND_DATA|ACE_WRITE_NAMED_ATTRS)
 #define	WRITE_MASK_ATTRS (ACE_WRITE_ACL|ACE_WRITE_OWNER|ACE_WRITE_ATTRIBUTES| \
     ACE_DELETE|ACE_DELETE_CHILD)
 #define	WRITE_MASK (WRITE_MASK_DATA|WRITE_MASK_ATTRS)
 
 #define	OGE_CLEAR	(ACE_READ_DATA|ACE_LIST_DIRECTORY|ACE_WRITE_DATA| \
     ACE_ADD_FILE|ACE_APPEND_DATA|ACE_ADD_SUBDIRECTORY|ACE_EXECUTE)
 
 #define	OKAY_MASK_BITS (ACE_READ_DATA|ACE_LIST_DIRECTORY|ACE_WRITE_DATA| \
     ACE_ADD_FILE|ACE_APPEND_DATA|ACE_ADD_SUBDIRECTORY|ACE_EXECUTE)
 
 #define	ALL_INHERIT	(ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE | \
     ACE_NO_PROPAGATE_INHERIT_ACE|ACE_INHERIT_ONLY_ACE|ACE_INHERITED_ACE)
 
 #define	RESTRICTED_CLEAR	(ACE_WRITE_ACL|ACE_WRITE_OWNER)
 
 #define	V4_ACL_WIDE_FLAGS (ZFS_ACL_AUTO_INHERIT|ZFS_ACL_DEFAULTED|\
     ZFS_ACL_PROTECTED)
 
 #define	ZFS_ACL_WIDE_FLAGS (V4_ACL_WIDE_FLAGS|ZFS_ACL_TRIVIAL|ZFS_INHERIT_ACE|\
     ZFS_ACL_OBJ_ACE)
 
 #define	ALL_MODE_EXECS (S_IXUSR | S_IXGRP | S_IXOTH)
 
 static uint16_t
 zfs_ace_v0_get_type(void *acep)
 {
 	return (((zfs_oldace_t *)acep)->z_type);
 }
 
 static uint16_t
 zfs_ace_v0_get_flags(void *acep)
 {
 	return (((zfs_oldace_t *)acep)->z_flags);
 }
 
 static uint32_t
 zfs_ace_v0_get_mask(void *acep)
 {
 	return (((zfs_oldace_t *)acep)->z_access_mask);
 }
 
 static uint64_t
 zfs_ace_v0_get_who(void *acep)
 {
 	return (((zfs_oldace_t *)acep)->z_fuid);
 }
 
 static void
 zfs_ace_v0_set_type(void *acep, uint16_t type)
 {
 	((zfs_oldace_t *)acep)->z_type = type;
 }
 
 static void
 zfs_ace_v0_set_flags(void *acep, uint16_t flags)
 {
 	((zfs_oldace_t *)acep)->z_flags = flags;
 }
 
 static void
 zfs_ace_v0_set_mask(void *acep, uint32_t mask)
 {
 	((zfs_oldace_t *)acep)->z_access_mask = mask;
 }
 
 static void
 zfs_ace_v0_set_who(void *acep, uint64_t who)
 {
 	((zfs_oldace_t *)acep)->z_fuid = who;
 }
 
 /*ARGSUSED*/
 static size_t
 zfs_ace_v0_size(void *acep)
 {
 	return (sizeof (zfs_oldace_t));
 }
 
 static size_t
 zfs_ace_v0_abstract_size(void)
 {
 	return (sizeof (zfs_oldace_t));
 }
 
 static int
 zfs_ace_v0_mask_off(void)
 {
 	return (offsetof(zfs_oldace_t, z_access_mask));
 }
 
 /*ARGSUSED*/
 static int
 zfs_ace_v0_data(void *acep, void **datap)
 {
 	*datap = NULL;
 	return (0);
 }
 
 static acl_ops_t zfs_acl_v0_ops = {
 	zfs_ace_v0_get_mask,
 	zfs_ace_v0_set_mask,
 	zfs_ace_v0_get_flags,
 	zfs_ace_v0_set_flags,
 	zfs_ace_v0_get_type,
 	zfs_ace_v0_set_type,
 	zfs_ace_v0_get_who,
 	zfs_ace_v0_set_who,
 	zfs_ace_v0_size,
 	zfs_ace_v0_abstract_size,
 	zfs_ace_v0_mask_off,
 	zfs_ace_v0_data
 };
 
 static uint16_t
 zfs_ace_fuid_get_type(void *acep)
 {
 	return (((zfs_ace_hdr_t *)acep)->z_type);
 }
 
 static uint16_t
 zfs_ace_fuid_get_flags(void *acep)
 {
 	return (((zfs_ace_hdr_t *)acep)->z_flags);
 }
 
 static uint32_t
 zfs_ace_fuid_get_mask(void *acep)
 {
 	return (((zfs_ace_hdr_t *)acep)->z_access_mask);
 }
 
 static uint64_t
 zfs_ace_fuid_get_who(void *args)
 {
 	uint16_t entry_type;
 	zfs_ace_t *acep = args;
 
 	entry_type = acep->z_hdr.z_flags & ACE_TYPE_FLAGS;
 
 	if (entry_type == ACE_OWNER || entry_type == OWNING_GROUP ||
 	    entry_type == ACE_EVERYONE)
 		return (-1);
 	return (((zfs_ace_t *)acep)->z_fuid);
 }
 
 static void
 zfs_ace_fuid_set_type(void *acep, uint16_t type)
 {
 	((zfs_ace_hdr_t *)acep)->z_type = type;
 }
 
 static void
 zfs_ace_fuid_set_flags(void *acep, uint16_t flags)
 {
 	((zfs_ace_hdr_t *)acep)->z_flags = flags;
 }
 
 static void
 zfs_ace_fuid_set_mask(void *acep, uint32_t mask)
 {
 	((zfs_ace_hdr_t *)acep)->z_access_mask = mask;
 }
 
 static void
 zfs_ace_fuid_set_who(void *arg, uint64_t who)
 {
 	zfs_ace_t *acep = arg;
 
 	uint16_t entry_type = acep->z_hdr.z_flags & ACE_TYPE_FLAGS;
 
 	if (entry_type == ACE_OWNER || entry_type == OWNING_GROUP ||
 	    entry_type == ACE_EVERYONE)
 		return;
 	acep->z_fuid = who;
 }
 
 static size_t
 zfs_ace_fuid_size(void *acep)
 {
 	zfs_ace_hdr_t *zacep = acep;
 	uint16_t entry_type;
 
 	switch (zacep->z_type) {
 	case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 	case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 		return (sizeof (zfs_object_ace_t));
 	case ALLOW:
 	case DENY:
 		entry_type =
 		    (((zfs_ace_hdr_t *)acep)->z_flags & ACE_TYPE_FLAGS);
 		if (entry_type == ACE_OWNER ||
 		    entry_type == OWNING_GROUP ||
 		    entry_type == ACE_EVERYONE)
 			return (sizeof (zfs_ace_hdr_t));
 		/*FALLTHROUGH*/
 	default:
 		return (sizeof (zfs_ace_t));
 	}
 }
 
 static size_t
 zfs_ace_fuid_abstract_size(void)
 {
 	return (sizeof (zfs_ace_hdr_t));
 }
 
 static int
 zfs_ace_fuid_mask_off(void)
 {
 	return (offsetof(zfs_ace_hdr_t, z_access_mask));
 }
 
 static int
 zfs_ace_fuid_data(void *acep, void **datap)
 {
 	zfs_ace_t *zacep = acep;
 	zfs_object_ace_t *zobjp;
 
 	switch (zacep->z_hdr.z_type) {
 	case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 	case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 		zobjp = acep;
 		*datap = (caddr_t)zobjp + sizeof (zfs_ace_t);
 		return (sizeof (zfs_object_ace_t) - sizeof (zfs_ace_t));
 	default:
 		*datap = NULL;
 		return (0);
 	}
 }
 
 static acl_ops_t zfs_acl_fuid_ops = {
 	zfs_ace_fuid_get_mask,
 	zfs_ace_fuid_set_mask,
 	zfs_ace_fuid_get_flags,
 	zfs_ace_fuid_set_flags,
 	zfs_ace_fuid_get_type,
 	zfs_ace_fuid_set_type,
 	zfs_ace_fuid_get_who,
 	zfs_ace_fuid_set_who,
 	zfs_ace_fuid_size,
 	zfs_ace_fuid_abstract_size,
 	zfs_ace_fuid_mask_off,
 	zfs_ace_fuid_data
 };
 
 /*
  * The following three functions are provided for compatibility with
  * older ZPL versions in order to determine whether the file used to have
  * an external ACL and what version of ACL previously existed on the
  * file.  Would really be nice to not need this, sigh.
  */
 uint64_t
 zfs_external_acl(znode_t *zp)
 {
 	zfs_acl_phys_t acl_phys;
 	int error;
 
 	if (zp->z_is_sa)
 		return (0);
 
 	/*
 	 * Need to deal with a potential
 	 * race where zfs_sa_upgrade could cause
 	 * z_is_sa to change.
 	 *
 	 * If the lookup fails then the state of z_is_sa should have
 	 * changed.
 	 */
 
 	if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zp->z_zfsvfs),
 	    &acl_phys, sizeof (acl_phys))) == 0)
 		return (acl_phys.z_acl_extern_obj);
 	else {
 		/*
 		 * After upgrade, SA_ZPL_ZNODE_ACL should have been
 		 * removed.
 		 */
 		VERIFY(zp->z_is_sa && error == ENOENT);
 		return (0);
 	}
 }
 
 /*
  * Determine size of ACL in bytes
  *
  * This is more complicated than it should be since we have to deal
  * with old external ACLs.
  */
 static int
 zfs_acl_znode_info(znode_t *zp, int *aclsize, int *aclcount,
     zfs_acl_phys_t *aclphys)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	uint64_t acl_count;
 	int size;
 	int error;
 
 	ASSERT(MUTEX_HELD(&zp->z_acl_lock));
 	if (zp->z_is_sa) {
 		if ((error = sa_size(zp->z_sa_hdl, SA_ZPL_DACL_ACES(zfsvfs),
 		    &size)) != 0)
 			return (error);
 		*aclsize = size;
 		if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_DACL_COUNT(zfsvfs),
 		    &acl_count, sizeof (acl_count))) != 0)
 			return (error);
 		*aclcount = acl_count;
 	} else {
 		if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs),
 		    aclphys, sizeof (*aclphys))) != 0)
 			return (error);
 
 		if (aclphys->z_acl_version == ZFS_ACL_VERSION_INITIAL) {
 			*aclsize = ZFS_ACL_SIZE(aclphys->z_acl_size);
 			*aclcount = aclphys->z_acl_size;
 		} else {
 			*aclsize = aclphys->z_acl_size;
 			*aclcount = aclphys->z_acl_count;
 		}
 	}
 	return (0);
 }
 
 int
 zfs_znode_acl_version(znode_t *zp)
 {
 	zfs_acl_phys_t acl_phys;
 
 	if (zp->z_is_sa)
 		return (ZFS_ACL_VERSION_FUID);
 	else {
 		int error;
 
 		/*
 		 * Need to deal with a potential
 		 * race where zfs_sa_upgrade could cause
 		 * z_is_sa to change.
 		 *
 		 * If the lookup fails then the state of z_is_sa should have
 		 * changed.
 		 */
 		if ((error = sa_lookup(zp->z_sa_hdl,
 		    SA_ZPL_ZNODE_ACL(zp->z_zfsvfs),
 		    &acl_phys, sizeof (acl_phys))) == 0)
 			return (acl_phys.z_acl_version);
 		else {
 			/*
 			 * After upgrade SA_ZPL_ZNODE_ACL should have
 			 * been removed.
 			 */
 			VERIFY(zp->z_is_sa && error == ENOENT);
 			return (ZFS_ACL_VERSION_FUID);
 		}
 	}
 }
 
 static int
 zfs_acl_version(int version)
 {
 	if (version < ZPL_VERSION_FUID)
 		return (ZFS_ACL_VERSION_INITIAL);
 	else
 		return (ZFS_ACL_VERSION_FUID);
 }
 
 static int
 zfs_acl_version_zp(znode_t *zp)
 {
 	return (zfs_acl_version(zp->z_zfsvfs->z_version));
 }
 
 zfs_acl_t *
 zfs_acl_alloc(int vers)
 {
 	zfs_acl_t *aclp;
 
 	aclp = kmem_zalloc(sizeof (zfs_acl_t), KM_SLEEP);
 	list_create(&aclp->z_acl, sizeof (zfs_acl_node_t),
 	    offsetof(zfs_acl_node_t, z_next));
 	aclp->z_version = vers;
 	if (vers == ZFS_ACL_VERSION_FUID)
 		aclp->z_ops = zfs_acl_fuid_ops;
 	else
 		aclp->z_ops = zfs_acl_v0_ops;
 	return (aclp);
 }
 
 zfs_acl_node_t *
 zfs_acl_node_alloc(size_t bytes)
 {
 	zfs_acl_node_t *aclnode;
 
 	aclnode = kmem_zalloc(sizeof (zfs_acl_node_t), KM_SLEEP);
 	if (bytes) {
 		aclnode->z_acldata = kmem_alloc(bytes, KM_SLEEP);
 		aclnode->z_allocdata = aclnode->z_acldata;
 		aclnode->z_allocsize = bytes;
 		aclnode->z_size = bytes;
 	}
 
 	return (aclnode);
 }
 
 static void
 zfs_acl_node_free(zfs_acl_node_t *aclnode)
 {
 	if (aclnode->z_allocsize)
 		kmem_free(aclnode->z_allocdata, aclnode->z_allocsize);
 	kmem_free(aclnode, sizeof (zfs_acl_node_t));
 }
 
 static void
 zfs_acl_release_nodes(zfs_acl_t *aclp)
 {
 	zfs_acl_node_t *aclnode;
 
 	while (aclnode = list_head(&aclp->z_acl)) {
 		list_remove(&aclp->z_acl, aclnode);
 		zfs_acl_node_free(aclnode);
 	}
 	aclp->z_acl_count = 0;
 	aclp->z_acl_bytes = 0;
 }
 
 void
 zfs_acl_free(zfs_acl_t *aclp)
 {
 	zfs_acl_release_nodes(aclp);
 	list_destroy(&aclp->z_acl);
 	kmem_free(aclp, sizeof (zfs_acl_t));
 }
 
 static boolean_t
 zfs_acl_valid_ace_type(uint_t type, uint_t flags)
 {
 	uint16_t entry_type;
 
 	switch (type) {
 	case ALLOW:
 	case DENY:
 	case ACE_SYSTEM_AUDIT_ACE_TYPE:
 	case ACE_SYSTEM_ALARM_ACE_TYPE:
 		entry_type = flags & ACE_TYPE_FLAGS;
 		return (entry_type == ACE_OWNER ||
 		    entry_type == OWNING_GROUP ||
 		    entry_type == ACE_EVERYONE || entry_type == 0 ||
 		    entry_type == ACE_IDENTIFIER_GROUP);
 	default:
 		if (type >= MIN_ACE_TYPE && type <= MAX_ACE_TYPE)
 			return (B_TRUE);
 	}
 	return (B_FALSE);
 }
 
 static boolean_t
 zfs_ace_valid(vtype_t obj_type, zfs_acl_t *aclp, uint16_t type, uint16_t iflags)
 {
 	/*
 	 * first check type of entry
 	 */
 
 	if (!zfs_acl_valid_ace_type(type, iflags))
 		return (B_FALSE);
 
 	switch (type) {
 	case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 	case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 	case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 		if (aclp->z_version < ZFS_ACL_VERSION_FUID)
 			return (B_FALSE);
 		aclp->z_hints |= ZFS_ACL_OBJ_ACE;
 	}
 
 	/*
 	 * next check inheritance level flags
 	 */
 
 	if (obj_type == VDIR &&
 	    (iflags & (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE)))
 		aclp->z_hints |= ZFS_INHERIT_ACE;
 
 	if (iflags & (ACE_INHERIT_ONLY_ACE|ACE_NO_PROPAGATE_INHERIT_ACE)) {
 		if ((iflags & (ACE_FILE_INHERIT_ACE|
 		    ACE_DIRECTORY_INHERIT_ACE)) == 0) {
 			return (B_FALSE);
 		}
 	}
 
 	return (B_TRUE);
 }
 
 static void *
 zfs_acl_next_ace(zfs_acl_t *aclp, void *start, uint64_t *who,
     uint32_t *access_mask, uint16_t *iflags, uint16_t *type)
 {
 	zfs_acl_node_t *aclnode;
 
 	ASSERT(aclp);
 
 	if (start == NULL) {
 		aclnode = list_head(&aclp->z_acl);
 		if (aclnode == NULL)
 			return (NULL);
 
 		aclp->z_next_ace = aclnode->z_acldata;
 		aclp->z_curr_node = aclnode;
 		aclnode->z_ace_idx = 0;
 	}
 
 	aclnode = aclp->z_curr_node;
 
 	if (aclnode == NULL)
 		return (NULL);
 
 	if (aclnode->z_ace_idx >= aclnode->z_ace_count) {
 		aclnode = list_next(&aclp->z_acl, aclnode);
 		if (aclnode == NULL)
 			return (NULL);
 		else {
 			aclp->z_curr_node = aclnode;
 			aclnode->z_ace_idx = 0;
 			aclp->z_next_ace = aclnode->z_acldata;
 		}
 	}
 
 	if (aclnode->z_ace_idx < aclnode->z_ace_count) {
 		void *acep = aclp->z_next_ace;
 		size_t ace_size;
 
 		/*
 		 * Make sure we don't overstep our bounds
 		 */
 		ace_size = aclp->z_ops.ace_size(acep);
 
 		if (((caddr_t)acep + ace_size) >
 		    ((caddr_t)aclnode->z_acldata + aclnode->z_size)) {
 			return (NULL);
 		}
 
 		*iflags = aclp->z_ops.ace_flags_get(acep);
 		*type = aclp->z_ops.ace_type_get(acep);
 		*access_mask = aclp->z_ops.ace_mask_get(acep);
 		*who = aclp->z_ops.ace_who_get(acep);
 		aclp->z_next_ace = (caddr_t)aclp->z_next_ace + ace_size;
 		aclnode->z_ace_idx++;
 
 		return ((void *)acep);
 	}
 	return (NULL);
 }
 
 /*ARGSUSED*/
 static uint64_t
 zfs_ace_walk(void *datap, uint64_t cookie, int aclcnt,
     uint16_t *flags, uint16_t *type, uint32_t *mask)
 {
 	zfs_acl_t *aclp = datap;
 	zfs_ace_hdr_t *acep = (zfs_ace_hdr_t *)(uintptr_t)cookie;
 	uint64_t who;
 
 	acep = zfs_acl_next_ace(aclp, acep, &who, mask,
 	    flags, type);
 	return ((uint64_t)(uintptr_t)acep);
 }
 
 static zfs_acl_node_t *
 zfs_acl_curr_node(zfs_acl_t *aclp)
 {
 	ASSERT(aclp->z_curr_node);
 	return (aclp->z_curr_node);
 }
 
 /*
  * Copy ACE to internal ZFS format.
  * While processing the ACL each ACE will be validated for correctness.
  * ACE FUIDs will be created later.
  */
 int
 zfs_copy_ace_2_fuid(zfsvfs_t *zfsvfs, vtype_t obj_type, zfs_acl_t *aclp,
     void *datap, zfs_ace_t *z_acl, uint64_t aclcnt, size_t *size,
     zfs_fuid_info_t **fuidp, cred_t *cr)
 {
 	int i;
 	uint16_t entry_type;
 	zfs_ace_t *aceptr = z_acl;
 	ace_t *acep = datap;
 	zfs_object_ace_t *zobjacep;
 	ace_object_t *aceobjp;
 
 	for (i = 0; i != aclcnt; i++) {
 		aceptr->z_hdr.z_access_mask = acep->a_access_mask;
 		aceptr->z_hdr.z_flags = acep->a_flags;
 		aceptr->z_hdr.z_type = acep->a_type;
 		entry_type = aceptr->z_hdr.z_flags & ACE_TYPE_FLAGS;
 		if (entry_type != ACE_OWNER && entry_type != OWNING_GROUP &&
 		    entry_type != ACE_EVERYONE) {
 			aceptr->z_fuid = zfs_fuid_create(zfsvfs, acep->a_who,
 			    cr, (entry_type == 0) ?
 			    ZFS_ACE_USER : ZFS_ACE_GROUP, fuidp);
 		}
 
 		/*
 		 * Make sure ACE is valid
 		 */
 		if (zfs_ace_valid(obj_type, aclp, aceptr->z_hdr.z_type,
 		    aceptr->z_hdr.z_flags) != B_TRUE)
 			return (SET_ERROR(EINVAL));
 
 		switch (acep->a_type) {
 		case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 		case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 		case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 		case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 			zobjacep = (zfs_object_ace_t *)aceptr;
 			aceobjp = (ace_object_t *)acep;
 
 			bcopy(aceobjp->a_obj_type, zobjacep->z_object_type,
 			    sizeof (aceobjp->a_obj_type));
 			bcopy(aceobjp->a_inherit_obj_type,
 			    zobjacep->z_inherit_type,
 			    sizeof (aceobjp->a_inherit_obj_type));
 			acep = (ace_t *)((caddr_t)acep + sizeof (ace_object_t));
 			break;
 		default:
 			acep = (ace_t *)((caddr_t)acep + sizeof (ace_t));
 		}
 
 		aceptr = (zfs_ace_t *)((caddr_t)aceptr +
 		    aclp->z_ops.ace_size(aceptr));
 	}
 
 	*size = (caddr_t)aceptr - (caddr_t)z_acl;
 
 	return (0);
 }
 
 /*
  * Copy ZFS ACEs to fixed size ace_t layout
  */
 static void
 zfs_copy_fuid_2_ace(zfsvfs_t *zfsvfs, zfs_acl_t *aclp, cred_t *cr,
     void *datap, int filter)
 {
 	uint64_t who;
 	uint32_t access_mask;
 	uint16_t iflags, type;
 	zfs_ace_hdr_t *zacep = NULL;
 	ace_t *acep = datap;
 	ace_object_t *objacep;
 	zfs_object_ace_t *zobjacep;
 	size_t ace_size;
 	uint16_t entry_type;
 
 	while (zacep = zfs_acl_next_ace(aclp, zacep,
 	    &who, &access_mask, &iflags, &type)) {
 
 		switch (type) {
 		case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 		case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 		case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 		case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 			if (filter) {
 				continue;
 			}
 			zobjacep = (zfs_object_ace_t *)zacep;
 			objacep = (ace_object_t *)acep;
 			bcopy(zobjacep->z_object_type,
 			    objacep->a_obj_type,
 			    sizeof (zobjacep->z_object_type));
 			bcopy(zobjacep->z_inherit_type,
 			    objacep->a_inherit_obj_type,
 			    sizeof (zobjacep->z_inherit_type));
 			ace_size = sizeof (ace_object_t);
 			break;
 		default:
 			ace_size = sizeof (ace_t);
 			break;
 		}
 
 		entry_type = (iflags & ACE_TYPE_FLAGS);
 		if ((entry_type != ACE_OWNER &&
 		    entry_type != OWNING_GROUP &&
 		    entry_type != ACE_EVERYONE)) {
 			acep->a_who = zfs_fuid_map_id(zfsvfs, who,
 			    cr, (entry_type & ACE_IDENTIFIER_GROUP) ?
 			    ZFS_ACE_GROUP : ZFS_ACE_USER);
 		} else {
 			acep->a_who = (uid_t)(int64_t)who;
 		}
 		acep->a_access_mask = access_mask;
 		acep->a_flags = iflags;
 		acep->a_type = type;
 		acep = (ace_t *)((caddr_t)acep + ace_size);
 	}
 }
 
 static int
 zfs_copy_ace_2_oldace(vtype_t obj_type, zfs_acl_t *aclp, ace_t *acep,
     zfs_oldace_t *z_acl, int aclcnt, size_t *size)
 {
 	int i;
 	zfs_oldace_t *aceptr = z_acl;
 
 	for (i = 0; i != aclcnt; i++, aceptr++) {
 		aceptr->z_access_mask = acep[i].a_access_mask;
 		aceptr->z_type = acep[i].a_type;
 		aceptr->z_flags = acep[i].a_flags;
 		aceptr->z_fuid = acep[i].a_who;
 		/*
 		 * Make sure ACE is valid
 		 */
 		if (zfs_ace_valid(obj_type, aclp, aceptr->z_type,
 		    aceptr->z_flags) != B_TRUE)
 			return (SET_ERROR(EINVAL));
 	}
 	*size = (caddr_t)aceptr - (caddr_t)z_acl;
 	return (0);
 }
 
 /*
  * convert old ACL format to new
  */
 void
 zfs_acl_xform(znode_t *zp, zfs_acl_t *aclp, cred_t *cr)
 {
 	zfs_oldace_t *oldaclp;
 	int i;
 	uint16_t type, iflags;
 	uint32_t access_mask;
 	uint64_t who;
 	void *cookie = NULL;
 	zfs_acl_node_t *newaclnode;
 
 	ASSERT(aclp->z_version == ZFS_ACL_VERSION_INITIAL);
 	/*
 	 * First create the ACE in a contiguous piece of memory
 	 * for zfs_copy_ace_2_fuid().
 	 *
 	 * We only convert an ACL once, so this won't happen
 	 * every time.
 	 */
 	oldaclp = kmem_alloc(sizeof (zfs_oldace_t) * aclp->z_acl_count,
 	    KM_SLEEP);
 	i = 0;
 	while (cookie = zfs_acl_next_ace(aclp, cookie, &who,
 	    &access_mask, &iflags, &type)) {
 		oldaclp[i].z_flags = iflags;
 		oldaclp[i].z_type = type;
 		oldaclp[i].z_fuid = who;
 		oldaclp[i++].z_access_mask = access_mask;
 	}
 
 	newaclnode = zfs_acl_node_alloc(aclp->z_acl_count *
 	    sizeof (zfs_object_ace_t));
 	aclp->z_ops = zfs_acl_fuid_ops;
 	VERIFY(zfs_copy_ace_2_fuid(zp->z_zfsvfs, ZTOV(zp)->v_type, aclp,
 	    oldaclp, newaclnode->z_acldata, aclp->z_acl_count,
 	    &newaclnode->z_size, NULL, cr) == 0);
 	newaclnode->z_ace_count = aclp->z_acl_count;
 	aclp->z_version = ZFS_ACL_VERSION;
 	kmem_free(oldaclp, aclp->z_acl_count * sizeof (zfs_oldace_t));
 
 	/*
 	 * Release all previous ACL nodes
 	 */
 
 	zfs_acl_release_nodes(aclp);
 
 	list_insert_head(&aclp->z_acl, newaclnode);
 
 	aclp->z_acl_bytes = newaclnode->z_size;
 	aclp->z_acl_count = newaclnode->z_ace_count;
 
 }
 
 /*
  * Convert unix access mask to v4 access mask
  */
 static uint32_t
 zfs_unix_to_v4(uint32_t access_mask)
 {
 	uint32_t new_mask = 0;
 
 	if (access_mask & S_IXOTH)
 		new_mask |= ACE_EXECUTE;
 	if (access_mask & S_IWOTH)
 		new_mask |= ACE_WRITE_DATA;
 	if (access_mask & S_IROTH)
 		new_mask |= ACE_READ_DATA;
 	return (new_mask);
 }
 
 static void
 zfs_set_ace(zfs_acl_t *aclp, void *acep, uint32_t access_mask,
     uint16_t access_type, uint64_t fuid, uint16_t entry_type)
 {
 	uint16_t type = entry_type & ACE_TYPE_FLAGS;
 
 	aclp->z_ops.ace_mask_set(acep, access_mask);
 	aclp->z_ops.ace_type_set(acep, access_type);
 	aclp->z_ops.ace_flags_set(acep, entry_type);
 	if ((type != ACE_OWNER && type != OWNING_GROUP &&
 	    type != ACE_EVERYONE))
 		aclp->z_ops.ace_who_set(acep, fuid);
 }
 
 /*
  * Determine mode of file based on ACL.
  */
 uint64_t
 zfs_mode_compute(uint64_t fmode, zfs_acl_t *aclp,
     uint64_t *pflags, uint64_t fuid, uint64_t fgid)
 {
 	int		entry_type;
 	mode_t		mode;
 	mode_t		seen = 0;
 	zfs_ace_hdr_t 	*acep = NULL;
 	uint64_t	who;
 	uint16_t	iflags, type;
 	uint32_t	access_mask;
 	boolean_t	an_exec_denied = B_FALSE;
 
 	mode = (fmode & (S_IFMT | S_ISUID | S_ISGID | S_ISVTX));
 
 	while (acep = zfs_acl_next_ace(aclp, acep, &who,
 	    &access_mask, &iflags, &type)) {
 
 		if (!zfs_acl_valid_ace_type(type, iflags))
 			continue;
 
 		entry_type = (iflags & ACE_TYPE_FLAGS);
 
 		/*
 		 * Skip over any inherit_only ACEs
 		 */
 		if (iflags & ACE_INHERIT_ONLY_ACE)
 			continue;
 
 		if (entry_type == ACE_OWNER || (entry_type == 0 &&
 		    who == fuid)) {
 			if ((access_mask & ACE_READ_DATA) &&
 			    (!(seen & S_IRUSR))) {
 				seen |= S_IRUSR;
 				if (type == ALLOW) {
 					mode |= S_IRUSR;
 				}
 			}
 			if ((access_mask & ACE_WRITE_DATA) &&
 			    (!(seen & S_IWUSR))) {
 				seen |= S_IWUSR;
 				if (type == ALLOW) {
 					mode |= S_IWUSR;
 				}
 			}
 			if ((access_mask & ACE_EXECUTE) &&
 			    (!(seen & S_IXUSR))) {
 				seen |= S_IXUSR;
 				if (type == ALLOW) {
 					mode |= S_IXUSR;
 				}
 			}
 		} else if (entry_type == OWNING_GROUP ||
 		    (entry_type == ACE_IDENTIFIER_GROUP && who == fgid)) {
 			if ((access_mask & ACE_READ_DATA) &&
 			    (!(seen & S_IRGRP))) {
 				seen |= S_IRGRP;
 				if (type == ALLOW) {
 					mode |= S_IRGRP;
 				}
 			}
 			if ((access_mask & ACE_WRITE_DATA) &&
 			    (!(seen & S_IWGRP))) {
 				seen |= S_IWGRP;
 				if (type == ALLOW) {
 					mode |= S_IWGRP;
 				}
 			}
 			if ((access_mask & ACE_EXECUTE) &&
 			    (!(seen & S_IXGRP))) {
 				seen |= S_IXGRP;
 				if (type == ALLOW) {
 					mode |= S_IXGRP;
 				}
 			}
 		} else if (entry_type == ACE_EVERYONE) {
 			if ((access_mask & ACE_READ_DATA)) {
 				if (!(seen & S_IRUSR)) {
 					seen |= S_IRUSR;
 					if (type == ALLOW) {
 						mode |= S_IRUSR;
 					}
 				}
 				if (!(seen & S_IRGRP)) {
 					seen |= S_IRGRP;
 					if (type == ALLOW) {
 						mode |= S_IRGRP;
 					}
 				}
 				if (!(seen & S_IROTH)) {
 					seen |= S_IROTH;
 					if (type == ALLOW) {
 						mode |= S_IROTH;
 					}
 				}
 			}
 			if ((access_mask & ACE_WRITE_DATA)) {
 				if (!(seen & S_IWUSR)) {
 					seen |= S_IWUSR;
 					if (type == ALLOW) {
 						mode |= S_IWUSR;
 					}
 				}
 				if (!(seen & S_IWGRP)) {
 					seen |= S_IWGRP;
 					if (type == ALLOW) {
 						mode |= S_IWGRP;
 					}
 				}
 				if (!(seen & S_IWOTH)) {
 					seen |= S_IWOTH;
 					if (type == ALLOW) {
 						mode |= S_IWOTH;
 					}
 				}
 			}
 			if ((access_mask & ACE_EXECUTE)) {
 				if (!(seen & S_IXUSR)) {
 					seen |= S_IXUSR;
 					if (type == ALLOW) {
 						mode |= S_IXUSR;
 					}
 				}
 				if (!(seen & S_IXGRP)) {
 					seen |= S_IXGRP;
 					if (type == ALLOW) {
 						mode |= S_IXGRP;
 					}
 				}
 				if (!(seen & S_IXOTH)) {
 					seen |= S_IXOTH;
 					if (type == ALLOW) {
 						mode |= S_IXOTH;
 					}
 				}
 			}
 		} else {
 			/*
 			 * Only care whether this IDENTIFIER_GROUP or
 			 * USER ACE denies execute access to someone;
 			 * the mode itself is not affected.
 			 */
 			if ((access_mask & ACE_EXECUTE) && type == DENY)
 				an_exec_denied = B_TRUE;
 		}
 	}
 
 	/*
 	 * Failure to allow is effectively a deny, so execute permission
 	 * is denied if it was never mentioned or if we explicitly
 	 * weren't allowed it.
 	 */
 	if (!an_exec_denied &&
 	    ((seen & ALL_MODE_EXECS) != ALL_MODE_EXECS ||
 	    (mode & ALL_MODE_EXECS) != ALL_MODE_EXECS))
 		an_exec_denied = B_TRUE;
 
 	if (an_exec_denied)
 		*pflags &= ~ZFS_NO_EXECS_DENIED;
 	else
 		*pflags |= ZFS_NO_EXECS_DENIED;
 
 	return (mode);
 }
 
 /*
  * Read an external acl object.  If the intent is to modify, always
  * create a new acl and leave any cached acl in place.
  */
 static int
-zfs_acl_node_read(znode_t *zp, boolean_t have_lock, zfs_acl_t **aclpp,
-    boolean_t will_modify)
+zfs_acl_node_read(znode_t *zp, zfs_acl_t **aclpp, boolean_t will_modify)
 {
 	zfs_acl_t	*aclp;
 	int		aclsize;
 	int		acl_count;
 	zfs_acl_node_t	*aclnode;
 	zfs_acl_phys_t	znode_acl;
 	int		version;
 	int		error;
-	boolean_t	drop_lock = B_FALSE;
 
 	ASSERT(MUTEX_HELD(&zp->z_acl_lock));
+	ASSERT_VOP_LOCKED(ZTOV(zp), __func__);
 
 	if (zp->z_acl_cached && !will_modify) {
 		*aclpp = zp->z_acl_cached;
 		return (0);
 	}
 
-	/*
-	 * close race where znode could be upgrade while trying to
-	 * read the znode attributes.
-	 *
-	 * But this could only happen if the file isn't already an SA
-	 * znode
-	 */
-	if (!zp->z_is_sa && !have_lock) {
-		mutex_enter(&zp->z_lock);
-		drop_lock = B_TRUE;
-	}
 	version = zfs_znode_acl_version(zp);
 
 	if ((error = zfs_acl_znode_info(zp, &aclsize,
 	    &acl_count, &znode_acl)) != 0) {
 		goto done;
 	}
 
 	aclp = zfs_acl_alloc(version);
 
 	aclp->z_acl_count = acl_count;
 	aclp->z_acl_bytes = aclsize;
 
 	aclnode = zfs_acl_node_alloc(aclsize);
 	aclnode->z_ace_count = aclp->z_acl_count;
 	aclnode->z_size = aclsize;
 
 	if (!zp->z_is_sa) {
 		if (znode_acl.z_acl_extern_obj) {
 			error = dmu_read(zp->z_zfsvfs->z_os,
 			    znode_acl.z_acl_extern_obj, 0, aclnode->z_size,
 			    aclnode->z_acldata, DMU_READ_PREFETCH);
 		} else {
 			bcopy(znode_acl.z_ace_data, aclnode->z_acldata,
 			    aclnode->z_size);
 		}
 	} else {
 		error = sa_lookup(zp->z_sa_hdl, SA_ZPL_DACL_ACES(zp->z_zfsvfs),
 		    aclnode->z_acldata, aclnode->z_size);
 	}
 
 	if (error != 0) {
 		zfs_acl_free(aclp);
 		zfs_acl_node_free(aclnode);
 		/* convert checksum errors into IO errors */
 		if (error == ECKSUM)
 			error = SET_ERROR(EIO);
 		goto done;
 	}
 
 	list_insert_head(&aclp->z_acl, aclnode);
 
 	*aclpp = aclp;
 	if (!will_modify)
 		zp->z_acl_cached = aclp;
 done:
-	if (drop_lock)
-		mutex_exit(&zp->z_lock);
 	return (error);
 }
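The caching rule in zfs_acl_node_read() is easy to miss: readers share `z_acl_cached`, while a caller that passes `will_modify` always receives a fresh, private ACL and the cache is left alone. The sketch below models only that rule in userspace, assuming the caller already holds whatever locks the real function asserts (`z_acl_lock` and the vnode lock). `struct znode`, `struct acl` and `acl_node_read()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for zfs_acl_t and znode_t. */
struct acl {
	int	value;
};

struct znode {
	struct acl	*acl_cached;	/* shared read-only copy, may be NULL */
	int		disk_value;	/* what a fresh "disk" read returns */
	int		reads;		/* number of "disk" reads performed */
};

/*
 * Readers share the cached copy; a caller that intends to modify
 * always gets a private copy, and the cache is not replaced.
 * Locking is assumed to be the caller's responsibility.
 */
static int
acl_node_read(struct znode *zp, struct acl **aclpp, bool will_modify)
{
	struct acl *aclp;

	if (zp->acl_cached != NULL && !will_modify) {
		*aclpp = zp->acl_cached;
		return (0);
	}
	if ((aclp = malloc(sizeof (*aclp))) == NULL)
		return (ENOMEM);
	aclp->value = zp->disk_value;
	zp->reads++;
	*aclpp = aclp;
	if (!will_modify)
		zp->acl_cached = aclp;
	return (0);
}
```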
 
 /*ARGSUSED*/
 void
 zfs_acl_data_locator(void **dataptr, uint32_t *length, uint32_t buflen,
     boolean_t start, void *userdata)
 {
 	zfs_acl_locator_cb_t *cb = (zfs_acl_locator_cb_t *)userdata;
 
 	if (start) {
 		cb->cb_acl_node = list_head(&cb->cb_aclp->z_acl);
 	} else {
 		cb->cb_acl_node = list_next(&cb->cb_aclp->z_acl,
 		    cb->cb_acl_node);
 	}
 	*dataptr = cb->cb_acl_node->z_acldata;
 	*length = cb->cb_acl_node->z_size;
 }
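zfs_acl_data_locator() is a pull-style iterator: the first call (`start` true) rewinds to the list head and each later call advances one node. A hypothetical userspace version, plus a consumer that drains it the way a bulk attribute writer might, could look like this. The `buflen` argument of the real callback is dropped because the sketch does not need it, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for zfs_acl_node_t / zfs_acl_locator_cb_t. */
struct node {
	struct node	*next;
	void		*data;
	uint32_t	size;
};

struct locator_cb {
	struct node	*head;	/* first node of the list */
	struct node	*cur;	/* node handed out by the last call */
};

/* Same shape as zfs_acl_data_locator(): start == true rewinds. */
static void
data_locator(void **dataptr, uint32_t *length, bool start, void *userdata)
{
	struct locator_cb *cb = userdata;

	cb->cur = start ? cb->head : cb->cur->next;
	*dataptr = cb->cur->data;
	*length = cb->cur->size;
}

/* Hypothetical consumer: copy exactly `total` bytes out of the list. */
static size_t
gather(char *out, size_t total, struct locator_cb *cb)
{
	size_t off = 0;
	bool start = true;

	while (off < total) {
		void *p;
		uint32_t len;

		data_locator(&p, &len, start, cb);
		start = false;
		assert(off + len <= total);	/* nodes must tile `total` */
		memcpy(out + off, p, len);
		off += len;
	}
	return (off);
}
```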
 
 int
 zfs_acl_chown_setattr(znode_t *zp)
 {
 	int error;
 	zfs_acl_t *aclp;
 
-	ASSERT(MUTEX_HELD(&zp->z_lock));
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 	ASSERT(MUTEX_HELD(&zp->z_acl_lock));
 
-	if ((error = zfs_acl_node_read(zp, B_TRUE, &aclp, B_FALSE)) == 0)
+	if ((error = zfs_acl_node_read(zp, &aclp, B_FALSE)) == 0)
 		zp->z_mode = zfs_mode_compute(zp->z_mode, aclp,
 		    &zp->z_pflags, zp->z_uid, zp->z_gid);
 	return (error);
 }
 
 /*
  * common code for setting ACLs.
  *
  * This function is called from zfs_mode_update, zfs_perm_init, and zfs_setacl.
  * zfs_setacl passes a non-NULL inherit pointer (ihp) to indicate that it's
  * already checked the acl and knows whether to inherit.
  */
 int
 zfs_aclset_common(znode_t *zp, zfs_acl_t *aclp, cred_t *cr, dmu_tx_t *tx)
 {
 	int			error;
 	zfsvfs_t		*zfsvfs = zp->z_zfsvfs;
 	dmu_object_type_t	otype;
 	zfs_acl_locator_cb_t	locate = { 0 };
 	uint64_t		mode;
 	sa_bulk_attr_t		bulk[5];
 	uint64_t		ctime[2];
 	int			count = 0;
 
 	mode = zp->z_mode;
 
 	mode = zfs_mode_compute(mode, aclp, &zp->z_pflags,
 	    zp->z_uid, zp->z_gid);
 
 	zp->z_mode = mode;
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL,
 	    &mode, sizeof (mode));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, sizeof (zp->z_pflags));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 	    &ctime, sizeof (ctime));
 
 	if (zp->z_acl_cached) {
 		zfs_acl_free(zp->z_acl_cached);
 		zp->z_acl_cached = NULL;
 	}
 
 	/*
 	 * Upgrade needed?
 	 */
 	if (!zfsvfs->z_use_fuids) {
 		otype = DMU_OT_OLDACL;
 	} else {
 		if ((aclp->z_version == ZFS_ACL_VERSION_INITIAL) &&
 		    (zfsvfs->z_version >= ZPL_VERSION_FUID))
 			zfs_acl_xform(zp, aclp, cr);
 		ASSERT(aclp->z_version >= ZFS_ACL_VERSION_FUID);
 		otype = DMU_OT_ACL;
 	}
 
 	/*
 	 * Arrgh, we have to handle old on disk format
 	 * as well as newer (preferred) SA format.
 	 */
 
 	if (zp->z_is_sa) { /* the easy case, just update the ACL attribute */
 		locate.cb_aclp = aclp;
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_DACL_ACES(zfsvfs),
 		    zfs_acl_data_locator, &locate, aclp->z_acl_bytes);
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_DACL_COUNT(zfsvfs),
 		    NULL, &aclp->z_acl_count, sizeof (uint64_t));
 	} else { /* Painful legacy way */
 		zfs_acl_node_t *aclnode;
 		uint64_t off = 0;
 		zfs_acl_phys_t acl_phys;
 		uint64_t aoid;
 
 		if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_ZNODE_ACL(zfsvfs),
 		    &acl_phys, sizeof (acl_phys))) != 0)
 			return (error);
 
 		aoid = acl_phys.z_acl_extern_obj;
 
 		if (aclp->z_acl_bytes > ZFS_ACE_SPACE) {
 			/*
 			 * If ACL was previously external and we are now
 			 * converting to new ACL format then release old
 			 * ACL object and create a new one.
 			 */
 			if (aoid &&
 			    aclp->z_version != acl_phys.z_acl_version) {
 				error = dmu_object_free(zfsvfs->z_os, aoid, tx);
 				if (error)
 					return (error);
 				aoid = 0;
 			}
 			if (aoid == 0) {
 				aoid = dmu_object_alloc(zfsvfs->z_os,
 				    otype, aclp->z_acl_bytes,
 				    otype == DMU_OT_ACL ?
 				    DMU_OT_SYSACL : DMU_OT_NONE,
 				    otype == DMU_OT_ACL ?
 				    DN_MAX_BONUSLEN : 0, tx);
 			} else {
 				(void) dmu_object_set_blocksize(zfsvfs->z_os,
 				    aoid, aclp->z_acl_bytes, 0, tx);
 			}
 			acl_phys.z_acl_extern_obj = aoid;
 			for (aclnode = list_head(&aclp->z_acl); aclnode;
 			    aclnode = list_next(&aclp->z_acl, aclnode)) {
 				if (aclnode->z_ace_count == 0)
 					continue;
 				dmu_write(zfsvfs->z_os, aoid, off,
 				    aclnode->z_size, aclnode->z_acldata, tx);
 				off += aclnode->z_size;
 			}
 		} else {
 			void *start = acl_phys.z_ace_data;
 			/*
 			 * Migrating back embedded?
 			 */
 			if (acl_phys.z_acl_extern_obj) {
 				error = dmu_object_free(zfsvfs->z_os,
 				    acl_phys.z_acl_extern_obj, tx);
 				if (error)
 					return (error);
 				acl_phys.z_acl_extern_obj = 0;
 			}
 
 			for (aclnode = list_head(&aclp->z_acl); aclnode;
 			    aclnode = list_next(&aclp->z_acl, aclnode)) {
 				if (aclnode->z_ace_count == 0)
 					continue;
 				bcopy(aclnode->z_acldata, start,
 				    aclnode->z_size);
 				start = (caddr_t)start + aclnode->z_size;
 			}
 		}
 		/*
 		 * If Old version then swap count/bytes to match old
 		 * layout of znode_acl_phys_t.
 		 */
 		if (aclp->z_version == ZFS_ACL_VERSION_INITIAL) {
 			acl_phys.z_acl_size = aclp->z_acl_count;
 			acl_phys.z_acl_count = aclp->z_acl_bytes;
 		} else {
 			acl_phys.z_acl_size = aclp->z_acl_bytes;
 			acl_phys.z_acl_count = aclp->z_acl_count;
 		}
 		acl_phys.z_acl_version = aclp->z_version;
 
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ZNODE_ACL(zfsvfs), NULL,
 		    &acl_phys, sizeof (acl_phys));
 	}
 
 	/*
 	 * Replace ACL wide bits, but first clear them.
 	 */
 	zp->z_pflags &= ~ZFS_ACL_WIDE_FLAGS;
 
 	zp->z_pflags |= aclp->z_hints;
 
 	if (ace_trivial_common(aclp, 0, zfs_ace_walk) == 0)
 		zp->z_pflags |= ZFS_ACL_TRIVIAL;
 
 	zfs_tstamp_update_setup(zp, STATE_CHANGED, NULL, ctime, B_TRUE);
 	return (sa_bulk_update(zp->z_sa_hdl, bulk, count, tx));
 }
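On legacy (non-SA) znodes, zfs_aclset_common() chooses between embedding the ACL in `znode_acl_phys_t` and writing it to a separate DMU object, and it stores the count and size fields swapped for the initial ACL version. The decision logic can be sketched as below. `embed_limit` stands in for `ZFS_ACE_SPACE` (its real value is not assumed; the 72 in the test is arbitrary), and every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where the ACL bytes end up on a legacy znode. */
enum acl_placement {
	ACL_EMBEDDED,		/* copied into znode_acl_phys_t itself */
	ACL_EXTERNAL_REUSE,	/* existing external object resized, rewritten */
	ACL_EXTERNAL_NEW	/* a new external object is allocated */
};

/*
 * `old_obj` is the current external object id (0 if none) and
 * `version_changed` is true when the on-disk ACL version differs.
 */
static enum acl_placement
choose_placement(uint64_t acl_bytes, uint64_t embed_limit, uint64_t old_obj,
    bool version_changed)
{
	if (acl_bytes <= embed_limit)
		return (ACL_EMBEDDED);	/* any old external object is freed */
	if (old_obj != 0 && version_changed)
		old_obj = 0;		/* old object freed: format changed */
	return (old_obj == 0 ? ACL_EXTERNAL_NEW : ACL_EXTERNAL_REUSE);
}

/* The initial on-disk version stores count and byte size swapped. */
static void
fill_counts(bool initial_version, uint64_t acl_count, uint64_t acl_bytes,
    uint64_t *phys_size, uint64_t *phys_count)
{
	if (initial_version) {
		*phys_size = acl_count;
		*phys_count = acl_bytes;
	} else {
		*phys_size = acl_bytes;
		*phys_count = acl_count;
	}
}
```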
 
 static void
 zfs_acl_chmod(vtype_t vtype, uint64_t mode, boolean_t split, boolean_t trim,
     zfs_acl_t *aclp)
 {
 	void		*acep = NULL;
 	uint64_t	who;
 	int		new_count, new_bytes;
 	int		ace_size;
 	int 		entry_type;
 	uint16_t	iflags, type;
 	uint32_t	access_mask;
 	zfs_acl_node_t	*newnode;
 	size_t 		abstract_size = aclp->z_ops.ace_abstract_size();
 	void 		*zacep;
 	boolean_t	isdir;
 	trivial_acl_t	masks;
 
 	new_count = new_bytes = 0;
 
 	isdir = (vtype == VDIR);
 
 	acl_trivial_access_masks((mode_t)mode, isdir, &masks);
 
 	newnode = zfs_acl_node_alloc((abstract_size * 6) + aclp->z_acl_bytes);
 
 	zacep = newnode->z_acldata;
 	if (masks.allow0) {
 		zfs_set_ace(aclp, zacep, masks.allow0, ALLOW, -1, ACE_OWNER);
 		zacep = (void *)((uintptr_t)zacep + abstract_size);
 		new_count++;
 		new_bytes += abstract_size;
 	}
 	if (masks.deny1) {
 		zfs_set_ace(aclp, zacep, masks.deny1, DENY, -1, ACE_OWNER);
 		zacep = (void *)((uintptr_t)zacep + abstract_size);
 		new_count++;
 		new_bytes += abstract_size;
 	}
 	if (masks.deny2) {
 		zfs_set_ace(aclp, zacep, masks.deny2, DENY, -1, OWNING_GROUP);
 		zacep = (void *)((uintptr_t)zacep + abstract_size);
 		new_count++;
 		new_bytes += abstract_size;
 	}
 
 	while (acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask,
 	    &iflags, &type)) {
 		entry_type = (iflags & ACE_TYPE_FLAGS);
 		/*
 		 * ACEs used to represent the file mode may be divided
 		 * into an equivalent pair of inherit-only and regular
 		 * ACEs, if they are inheritable.
 		 * Skip regular ACEs, which are replaced by the new mode.
 		 */
 		if (split && (entry_type == ACE_OWNER ||
 		    entry_type == OWNING_GROUP ||
 		    entry_type == ACE_EVERYONE)) {
 			if (!isdir || !(iflags &
 			    (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE)))
 				continue;
 			/*
 			 * We preserve owner@, group@, or everyone@
 			 * permissions, if they are inheritable, by
 			 * copying them to inherit_only ACEs. This
 			 * prevents inheritable permissions from being
 			 * altered along with the file mode.
 			 */
 			iflags |= ACE_INHERIT_ONLY_ACE;
 		}
 
 		/*
 		 * If this ACL has any inheritable ACEs, mark that in
 		 * the hints (which are later masked into the pflags)
 		 * so create knows to do inheritance.
 		 */
 		if (isdir && (iflags &
 		    (ACE_FILE_INHERIT_ACE|ACE_DIRECTORY_INHERIT_ACE)))
 			aclp->z_hints |= ZFS_INHERIT_ACE;
 
 		if ((type != ALLOW && type != DENY) ||
 		    (iflags & ACE_INHERIT_ONLY_ACE)) {
 			switch (type) {
 			case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 			case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 			case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 			case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 				aclp->z_hints |= ZFS_ACL_OBJ_ACE;
 				break;
 			}
 		} else {
 			/*
 			 * Limit permissions granted by ACEs to be no greater
 			 * than permissions of the requested group mode.
 			 * Applies when the "aclmode" property is set to
 			 * "groupmask".
 			 */
 			if ((type == ALLOW) && trim)
 				access_mask &= masks.group;
 		}
 		zfs_set_ace(aclp, zacep, access_mask, type, who, iflags);
 		ace_size = aclp->z_ops.ace_size(acep);
 		zacep = (void *)((uintptr_t)zacep + ace_size);
 		new_count++;
 		new_bytes += ace_size;
 	}
 	zfs_set_ace(aclp, zacep, masks.owner, ALLOW, -1, ACE_OWNER);
 	zacep = (void *)((uintptr_t)zacep + abstract_size);
 	zfs_set_ace(aclp, zacep, masks.group, ALLOW, -1, OWNING_GROUP);
 	zacep = (void *)((uintptr_t)zacep + abstract_size);
 	zfs_set_ace(aclp, zacep, masks.everyone, ALLOW, -1, ACE_EVERYONE);
 
 	new_count += 3;
 	new_bytes += abstract_size * 3;
 	zfs_acl_release_nodes(aclp);
 	aclp->z_acl_count = new_count;
 	aclp->z_acl_bytes = new_bytes;
 	newnode->z_ace_count = new_count;
 	newnode->z_size = new_bytes;
 	list_insert_tail(&aclp->z_acl, newnode);
 }
 
 int
 zfs_acl_chmod_setattr(znode_t *zp, zfs_acl_t **aclp, uint64_t mode)
 {
 	int error = 0;
 
 	mutex_enter(&zp->z_acl_lock);
-	mutex_enter(&zp->z_lock);
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 	if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_DISCARD)
 		*aclp = zfs_acl_alloc(zfs_acl_version_zp(zp));
 	else
-		error = zfs_acl_node_read(zp, B_TRUE, aclp, B_TRUE);
+		error = zfs_acl_node_read(zp, aclp, B_TRUE);
 
 	if (error == 0) {
 		(*aclp)->z_hints = zp->z_pflags & V4_ACL_WIDE_FLAGS;
 		zfs_acl_chmod(ZTOV(zp)->v_type, mode, B_TRUE,
 		    (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_GROUPMASK), *aclp);
 	}
-	mutex_exit(&zp->z_lock);
 	mutex_exit(&zp->z_acl_lock);
 
 	return (error);
 }
 
 /*
  * Should ACE be inherited?
  */
 static int
 zfs_ace_can_use(vtype_t vtype, uint16_t acep_flags)
 {
 	int	iflags = (acep_flags & 0xf);
 
 	if ((vtype == VDIR) && (iflags & ACE_DIRECTORY_INHERIT_ACE))
 		return (1);
 	else if (iflags & ACE_FILE_INHERIT_ACE)
 		return (!((vtype == VDIR) &&
 		    (iflags & ACE_NO_PROPAGATE_INHERIT_ACE)));
 	return (0);
 }
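zfs_ace_can_use() reduces to a small predicate over the low four inheritance flag bits. The sketch assumes the standard NFSv4 ACE flag values (file-inherit 0x1, directory-inherit 0x2, no-propagate 0x4, inherit-only 0x8) and replaces the `vtype_t` argument with a hypothetical `is_dir` boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed NFSv4 ACE inheritance flag values. */
#define ACE_FILE_INHERIT_ACE		0x0001
#define ACE_DIRECTORY_INHERIT_ACE	0x0002
#define ACE_NO_PROPAGATE_INHERIT_ACE	0x0004

/* Returns 1 if a parent ACE with these flags applies to the new object. */
static int
ace_can_use(bool is_dir, uint16_t acep_flags)
{
	int iflags = (acep_flags & 0xf);

	/* Directories inherit directory-inherit ACEs. */
	if (is_dir && (iflags & ACE_DIRECTORY_INHERIT_ACE))
		return (1);
	/*
	 * File-inherit ACEs apply to files, and to directories only
	 * when they are allowed to propagate further.
	 */
	else if (iflags & ACE_FILE_INHERIT_ACE)
		return (!(is_dir && (iflags & ACE_NO_PROPAGATE_INHERIT_ACE)));
	return (0);
}
```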
 
 /*
  * inherit inheritable ACEs from parent
  */
 static zfs_acl_t *
 zfs_acl_inherit(zfsvfs_t *zfsvfs, vtype_t vtype, zfs_acl_t *paclp,
     uint64_t mode)
 {
 	void		*pacep = NULL;
 	void		*acep;
 	zfs_acl_node_t  *aclnode;
 	zfs_acl_t	*aclp = NULL;
 	uint64_t	who;
 	uint32_t	access_mask;
 	uint16_t	iflags, newflags, type;
 	size_t		ace_size;
 	void		*data1, *data2;
 	size_t		data1sz, data2sz;
 	uint_t		aclinherit;
 	boolean_t	isdir = (vtype == VDIR);
 
 	aclp = zfs_acl_alloc(paclp->z_version);
 	aclinherit = zfsvfs->z_acl_inherit;
 	if (aclinherit == ZFS_ACL_DISCARD || vtype == VLNK)
 		return (aclp);
 
 	while (pacep = zfs_acl_next_ace(paclp, pacep, &who,
 	    &access_mask, &iflags, &type)) {
 
 		/*
 		 * don't inherit bogus ACEs
 		 */
 		if (!zfs_acl_valid_ace_type(type, iflags))
 			continue;
 
 		/*
 		 * Check if ACE is inheritable by this vnode
 		 */
 		if ((aclinherit == ZFS_ACL_NOALLOW && type == ALLOW) ||
 		    !zfs_ace_can_use(vtype, iflags))
 			continue;
 
 		/*
 		 * Strip inherited execute permission from file if
 		 * not in mode
 		 */
 		if (aclinherit == ZFS_ACL_PASSTHROUGH_X && type == ALLOW &&
 		    !isdir && ((mode & (S_IXUSR|S_IXGRP|S_IXOTH)) == 0)) {
 			access_mask &= ~ACE_EXECUTE;
 		}
 
 		/*
 		 * Strip write_acl and write_owner from permissions
 		 * when inheriting an ACE
 		 */
 		if (aclinherit == ZFS_ACL_RESTRICTED && type == ALLOW) {
 			access_mask &= ~RESTRICTED_CLEAR;
 		}
 
 		ace_size = aclp->z_ops.ace_size(pacep);
 		aclnode = zfs_acl_node_alloc(ace_size);
 		list_insert_tail(&aclp->z_acl, aclnode);
 		acep = aclnode->z_acldata;
 
 		zfs_set_ace(aclp, acep, access_mask, type,
 		    who, iflags|ACE_INHERITED_ACE);
 
 		/*
 		 * Copy special opaque data if any
 		 */
 		if ((data1sz = paclp->z_ops.ace_data(pacep, &data1)) != 0) {
 			VERIFY((data2sz = aclp->z_ops.ace_data(acep,
 			    &data2)) == data1sz);
 			bcopy(data1, data2, data2sz);
 		}
 
 		aclp->z_acl_count++;
 		aclnode->z_ace_count++;
 		aclp->z_acl_bytes += aclnode->z_size;
 		newflags = aclp->z_ops.ace_flags_get(acep);
 
 		/*
 		 * If ACE is not to be inherited further, or if the vnode is
 		 * not a directory, remove all inheritance flags
 		 */
 		if (!isdir || (iflags & ACE_NO_PROPAGATE_INHERIT_ACE)) {
 			newflags &= ~ALL_INHERIT;
 			aclp->z_ops.ace_flags_set(acep,
 			    newflags|ACE_INHERITED_ACE);
 			continue;
 		}
 
 		/*
 		 * This directory has an inheritable ACE
 		 */
 		aclp->z_hints |= ZFS_INHERIT_ACE;
 
 		/*
 		 * If only FILE_INHERIT is set then turn on
 		 * inherit_only
 		 */
 		if ((iflags & (ACE_FILE_INHERIT_ACE |
 		    ACE_DIRECTORY_INHERIT_ACE)) == ACE_FILE_INHERIT_ACE) {
 			newflags |= ACE_INHERIT_ONLY_ACE;
 			aclp->z_ops.ace_flags_set(acep,
 			    newflags|ACE_INHERITED_ACE);
 		} else {
 			newflags &= ~ACE_INHERIT_ONLY_ACE;
 			aclp->z_ops.ace_flags_set(acep,
 			    newflags|ACE_INHERITED_ACE);
 		}
 	}
 
 	return (aclp);
 }
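The flag rewriting at the end of each zfs_acl_inherit() iteration decides how an inherited ACE propagates further. A minimal sketch of just that step follows, assuming the standard NFSv4 flag values (including 0x80 for the inherited flag) and that `ALL_INHERIT` is the union of the four inheritance bits. `inherited_flags()` is hypothetical and passes any other bits (such as ACE type bits) through unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed NFSv4 ACE flag values. */
#define ACE_FILE_INHERIT_ACE		0x0001
#define ACE_DIRECTORY_INHERIT_ACE	0x0002
#define ACE_NO_PROPAGATE_INHERIT_ACE	0x0004
#define ACE_INHERIT_ONLY_ACE		0x0008
#define ACE_INHERITED_ACE		0x0080
#define ALL_INHERIT	(ACE_FILE_INHERIT_ACE | ACE_DIRECTORY_INHERIT_ACE | \
    ACE_NO_PROPAGATE_INHERIT_ACE | ACE_INHERIT_ONLY_ACE)

/* Flags of the child's copy of a parent ACE carrying `iflags`. */
static uint16_t
inherited_flags(bool is_dir, uint16_t iflags)
{
	uint16_t newflags = iflags | ACE_INHERITED_ACE;

	/* Files, and ACEs marked no-propagate, stop inheriting here. */
	if (!is_dir || (iflags & ACE_NO_PROPAGATE_INHERIT_ACE))
		return ((uint16_t)(newflags & ~ALL_INHERIT));
	/* File-inherit only: kept for future children, not for this dir. */
	if ((iflags & (ACE_FILE_INHERIT_ACE | ACE_DIRECTORY_INHERIT_ACE)) ==
	    ACE_FILE_INHERIT_ACE)
		newflags |= ACE_INHERIT_ONLY_ACE;
	else
		newflags &= ~ACE_INHERIT_ONLY_ACE;
	return (newflags);
}
```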
 
 /*
  * Create file system object initial permissions
  * including inheritable ACEs.
  * Also, create FUIDs for owner and group.
  */
 int
 zfs_acl_ids_create(znode_t *dzp, int flag, vattr_t *vap, cred_t *cr,
     vsecattr_t *vsecp, zfs_acl_ids_t *acl_ids)
 {
 	int		error;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zfs_acl_t	*paclp;
 	gid_t		gid;
 	boolean_t	trim = B_FALSE;
 	boolean_t	inherited = B_FALSE;
 
+	ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__);
 	bzero(acl_ids, sizeof (zfs_acl_ids_t));
 	acl_ids->z_mode = MAKEIMODE(vap->va_type, vap->va_mode);
 
 	if (vsecp)
 		if ((error = zfs_vsec_2_aclp(zfsvfs, vap->va_type, vsecp, cr,
 		    &acl_ids->z_fuidp, &acl_ids->z_aclp)) != 0)
 			return (error);
 	/*
 	 * Determine uid and gid.
 	 */
 	if ((flag & IS_ROOT_NODE) || zfsvfs->z_replay ||
 	    ((flag & IS_XATTR) && (vap->va_type == VDIR))) {
 		acl_ids->z_fuid = zfs_fuid_create(zfsvfs,
 		    (uint64_t)vap->va_uid, cr,
 		    ZFS_OWNER, &acl_ids->z_fuidp);
 		acl_ids->z_fgid = zfs_fuid_create(zfsvfs,
 		    (uint64_t)vap->va_gid, cr,
 		    ZFS_GROUP, &acl_ids->z_fuidp);
 		gid = vap->va_gid;
 	} else {
 		acl_ids->z_fuid = zfs_fuid_create_cred(zfsvfs, ZFS_OWNER,
 		    cr, &acl_ids->z_fuidp);
 		acl_ids->z_fgid = 0;
 		if (vap->va_mask & AT_GID)  {
 			acl_ids->z_fgid = zfs_fuid_create(zfsvfs,
 			    (uint64_t)vap->va_gid,
 			    cr, ZFS_GROUP, &acl_ids->z_fuidp);
 			gid = vap->va_gid;
 			if (acl_ids->z_fgid != dzp->z_gid &&
 			    !groupmember(vap->va_gid, cr) &&
 			    secpolicy_vnode_create_gid(cr) != 0)
 				acl_ids->z_fgid = 0;
 		}
 		if (acl_ids->z_fgid == 0) {
 			if (dzp->z_mode & S_ISGID) {
 				char		*domain;
 				uint32_t	rid;
 
 				acl_ids->z_fgid = dzp->z_gid;
 				gid = zfs_fuid_map_id(zfsvfs, acl_ids->z_fgid,
 				    cr, ZFS_GROUP);
 
 				if (zfsvfs->z_use_fuids &&
 				    IS_EPHEMERAL(acl_ids->z_fgid)) {
 					domain = zfs_fuid_idx_domain(
 					    &zfsvfs->z_fuid_idx,
 					    FUID_INDEX(acl_ids->z_fgid));
 					rid = FUID_RID(acl_ids->z_fgid);
 					zfs_fuid_node_add(&acl_ids->z_fuidp,
 					    domain, rid,
 					    FUID_INDEX(acl_ids->z_fgid),
 					    acl_ids->z_fgid, ZFS_GROUP);
 				}
 			} else {
 				acl_ids->z_fgid = zfs_fuid_create_cred(zfsvfs,
 				    ZFS_GROUP, cr, &acl_ids->z_fuidp);
 #ifdef __FreeBSD_kernel__
 				gid = acl_ids->z_fgid = dzp->z_gid;
 #else
 				gid = crgetgid(cr);
 #endif
 			}
 		}
 	}
 
 	/*
 	 * If we're creating a directory, and the parent directory has the
 	 * set-GID bit set, set it on the new directory.
 	 * Otherwise, if the user is neither privileged nor a member of the
 	 * file's new group, clear the file's set-GID bit.
 	 */
 
 	if (!(flag & IS_ROOT_NODE) && (dzp->z_mode & S_ISGID) &&
 	    (vap->va_type == VDIR)) {
 		acl_ids->z_mode |= S_ISGID;
 	} else {
 		if ((acl_ids->z_mode & S_ISGID) &&
 		    secpolicy_vnode_setids_setgids(ZTOV(dzp), cr, gid) != 0)
 			acl_ids->z_mode &= ~S_ISGID;
 	}
 
 	if (acl_ids->z_aclp == NULL) {
 		mutex_enter(&dzp->z_acl_lock);
-		mutex_enter(&dzp->z_lock);
 		if (!(flag & IS_ROOT_NODE) &&
 		    (dzp->z_pflags & ZFS_INHERIT_ACE) &&
 		    !(dzp->z_pflags & ZFS_XATTR)) {
-			VERIFY(0 == zfs_acl_node_read(dzp, B_TRUE,
-			    &paclp, B_FALSE));
+			VERIFY(0 == zfs_acl_node_read(dzp, &paclp, B_FALSE));
 			acl_ids->z_aclp = zfs_acl_inherit(zfsvfs,
 			    vap->va_type, paclp, acl_ids->z_mode);
 			inherited = B_TRUE;
 		} else {
 			acl_ids->z_aclp =
 			    zfs_acl_alloc(zfs_acl_version_zp(dzp));
 			acl_ids->z_aclp->z_hints |= ZFS_ACL_TRIVIAL;
 		}
-		mutex_exit(&dzp->z_lock);
 		mutex_exit(&dzp->z_acl_lock);
 
 		if (vap->va_type == VDIR)
 			acl_ids->z_aclp->z_hints |= ZFS_ACL_AUTO_INHERIT;
 
 		if (zfsvfs->z_acl_mode == ZFS_ACL_GROUPMASK &&
 		    zfsvfs->z_acl_inherit != ZFS_ACL_PASSTHROUGH &&
 		    zfsvfs->z_acl_inherit != ZFS_ACL_PASSTHROUGH_X)
 			trim = B_TRUE;
 		zfs_acl_chmod(vap->va_type, acl_ids->z_mode, B_FALSE, trim,
 		    acl_ids->z_aclp);
 	}
 
 	if (inherited || vsecp) {
 		acl_ids->z_mode = zfs_mode_compute(acl_ids->z_mode,
 		    acl_ids->z_aclp, &acl_ids->z_aclp->z_hints,
 		    acl_ids->z_fuid, acl_ids->z_fgid);
 		if (ace_trivial_common(acl_ids->z_aclp, 0, zfs_ace_walk) == 0)
 			acl_ids->z_aclp->z_hints |= ZFS_ACL_TRIVIAL;
 	}
 
 	return (0);
 }
 
 /*
  * Free ACL and fuid_infop, but not the acl_ids structure
  */
 void
 zfs_acl_ids_free(zfs_acl_ids_t *acl_ids)
 {
 	if (acl_ids->z_aclp)
 		zfs_acl_free(acl_ids->z_aclp);
 	if (acl_ids->z_fuidp)
 		zfs_fuid_info_free(acl_ids->z_fuidp);
 	acl_ids->z_aclp = NULL;
 	acl_ids->z_fuidp = NULL;
 }
 
 boolean_t
 zfs_acl_ids_overquota(zfsvfs_t *zfsvfs, zfs_acl_ids_t *acl_ids)
 {
 	return (zfs_fuid_overquota(zfsvfs, B_FALSE, acl_ids->z_fuid) ||
 	    zfs_fuid_overquota(zfsvfs, B_TRUE, acl_ids->z_fgid));
 }
 
 /*
  * Retrieve a file's ACL
  */
 int
 zfs_getacl(znode_t *zp, vsecattr_t *vsecp, boolean_t skipaclchk, cred_t *cr)
 {
 	zfs_acl_t	*aclp;
 	ulong_t		mask;
 	int		error;
 	int 		count = 0;
 	int		largeace = 0;
 
 	mask = vsecp->vsa_mask & (VSA_ACE | VSA_ACECNT |
 	    VSA_ACE_ACLFLAGS | VSA_ACE_ALLTYPES);
 
 	if (mask == 0)
 		return (SET_ERROR(ENOSYS));
 
 	if (error = zfs_zaccess(zp, ACE_READ_ACL, 0, skipaclchk, cr))
 		return (error);
 
 	mutex_enter(&zp->z_acl_lock);
 
-	error = zfs_acl_node_read(zp, B_FALSE, &aclp, B_FALSE);
+	ASSERT_VOP_LOCKED(ZTOV(zp), __func__);
+	error = zfs_acl_node_read(zp, &aclp, B_FALSE);
 	if (error != 0) {
 		mutex_exit(&zp->z_acl_lock);
 		return (error);
 	}
 
 	/*
 	 * Scan ACL to determine number of ACEs
 	 */
 	if ((zp->z_pflags & ZFS_ACL_OBJ_ACE) && !(mask & VSA_ACE_ALLTYPES)) {
 		void *zacep = NULL;
 		uint64_t who;
 		uint32_t access_mask;
 		uint16_t type, iflags;
 
 		while (zacep = zfs_acl_next_ace(aclp, zacep,
 		    &who, &access_mask, &iflags, &type)) {
 			switch (type) {
 			case ACE_ACCESS_ALLOWED_OBJECT_ACE_TYPE:
 			case ACE_ACCESS_DENIED_OBJECT_ACE_TYPE:
 			case ACE_SYSTEM_AUDIT_OBJECT_ACE_TYPE:
 			case ACE_SYSTEM_ALARM_OBJECT_ACE_TYPE:
 				largeace++;
 				continue;
 			default:
 				count++;
 			}
 		}
 		vsecp->vsa_aclcnt = count;
 	} else
 		count = (int)aclp->z_acl_count;
 
 	if (mask & VSA_ACECNT) {
 		vsecp->vsa_aclcnt = count;
 	}
 
 	if (mask & VSA_ACE) {
 		size_t aclsz;
 
 		aclsz = count * sizeof (ace_t) +
 		    sizeof (ace_object_t) * largeace;
 
 		vsecp->vsa_aclentp = kmem_alloc(aclsz, KM_SLEEP);
 		vsecp->vsa_aclentsz = aclsz;
 
 		if (aclp->z_version == ZFS_ACL_VERSION_FUID)
 			zfs_copy_fuid_2_ace(zp->z_zfsvfs, aclp, cr,
 			    vsecp->vsa_aclentp, !(mask & VSA_ACE_ALLTYPES));
 		else {
 			zfs_acl_node_t *aclnode;
 			void *start = vsecp->vsa_aclentp;
 
 			for (aclnode = list_head(&aclp->z_acl); aclnode;
 			    aclnode = list_next(&aclp->z_acl, aclnode)) {
 				bcopy(aclnode->z_acldata, start,
 				    aclnode->z_size);
 				start = (caddr_t)start + aclnode->z_size;
 			}
 			ASSERT((caddr_t)start - (caddr_t)vsecp->vsa_aclentp ==
 			    aclp->z_acl_bytes);
 		}
 	}
 	if (mask & VSA_ACE_ACLFLAGS) {
 		vsecp->vsa_aclflags = 0;
 		if (zp->z_pflags & ZFS_ACL_DEFAULTED)
 			vsecp->vsa_aclflags |= ACL_DEFAULTED;
 		if (zp->z_pflags & ZFS_ACL_PROTECTED)
 			vsecp->vsa_aclflags |= ACL_PROTECTED;
 		if (zp->z_pflags & ZFS_ACL_AUTO_INHERIT)
 			vsecp->vsa_aclflags |= ACL_AUTO_INHERIT;
 	}
 
 	mutex_exit(&zp->z_acl_lock);
 
 	return (0);
 }
 
 int
 zfs_vsec_2_aclp(zfsvfs_t *zfsvfs, vtype_t obj_type,
     vsecattr_t *vsecp, cred_t *cr, zfs_fuid_info_t **fuidp, zfs_acl_t **zaclp)
 {
 	zfs_acl_t *aclp;
 	zfs_acl_node_t *aclnode;
 	int aclcnt = vsecp->vsa_aclcnt;
 	int error;
 
 	if (vsecp->vsa_aclcnt > MAX_ACL_ENTRIES || vsecp->vsa_aclcnt <= 0)
 		return (SET_ERROR(EINVAL));
 
 	aclp = zfs_acl_alloc(zfs_acl_version(zfsvfs->z_version));
 
 	aclp->z_hints = 0;
 	aclnode = zfs_acl_node_alloc(aclcnt * sizeof (zfs_object_ace_t));
 	if (aclp->z_version == ZFS_ACL_VERSION_INITIAL) {
 		if ((error = zfs_copy_ace_2_oldace(obj_type, aclp,
 		    (ace_t *)vsecp->vsa_aclentp, aclnode->z_acldata,
 		    aclcnt, &aclnode->z_size)) != 0) {
 			zfs_acl_free(aclp);
 			zfs_acl_node_free(aclnode);
 			return (error);
 		}
 	} else {
 		if ((error = zfs_copy_ace_2_fuid(zfsvfs, obj_type, aclp,
 		    vsecp->vsa_aclentp, aclnode->z_acldata, aclcnt,
 		    &aclnode->z_size, fuidp, cr)) != 0) {
 			zfs_acl_free(aclp);
 			zfs_acl_node_free(aclnode);
 			return (error);
 		}
 	}
 	aclp->z_acl_bytes = aclnode->z_size;
 	aclnode->z_ace_count = aclcnt;
 	aclp->z_acl_count = aclcnt;
 	list_insert_head(&aclp->z_acl, aclnode);
 
 	/*
 	 * If flags are being set then add them to z_hints
 	 */
 	if (vsecp->vsa_mask & VSA_ACE_ACLFLAGS) {
 		if (vsecp->vsa_aclflags & ACL_PROTECTED)
 			aclp->z_hints |= ZFS_ACL_PROTECTED;
 		if (vsecp->vsa_aclflags & ACL_DEFAULTED)
 			aclp->z_hints |= ZFS_ACL_DEFAULTED;
 		if (vsecp->vsa_aclflags & ACL_AUTO_INHERIT)
 			aclp->z_hints |= ZFS_ACL_AUTO_INHERIT;
 	}
 
 	*zaclp = aclp;
 
 	return (0);
 }
 
 /*
  * Set a file's ACL
  */
 int
 zfs_setacl(znode_t *zp, vsecattr_t *vsecp, boolean_t skipaclchk, cred_t *cr)
 {
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	zilog_t		*zilog = zfsvfs->z_log;
 	ulong_t		mask = vsecp->vsa_mask & (VSA_ACE | VSA_ACECNT);
 	dmu_tx_t	*tx;
 	int		error;
 	zfs_acl_t	*aclp;
 	zfs_fuid_info_t	*fuidp = NULL;
 	boolean_t	fuid_dirtied;
 	uint64_t	acl_obj;
 
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 	if (mask == 0)
 		return (SET_ERROR(ENOSYS));
 
 	if (zp->z_pflags & ZFS_IMMUTABLE)
 		return (SET_ERROR(EPERM));
 
 	if (error = zfs_zaccess(zp, ACE_WRITE_ACL, 0, skipaclchk, cr))
 		return (error);
 
 	error = zfs_vsec_2_aclp(zfsvfs, ZTOV(zp)->v_type, vsecp, cr, &fuidp,
 	    &aclp);
 	if (error)
 		return (error);
 
 	/*
 	 * If ACL wide flags aren't being set then preserve any
 	 * existing flags.
 	 */
 	if (!(vsecp->vsa_mask & VSA_ACE_ACLFLAGS)) {
 		aclp->z_hints |=
 		    (zp->z_pflags & V4_ACL_WIDE_FLAGS);
 	}
 top:
 	mutex_enter(&zp->z_acl_lock);
-	mutex_enter(&zp->z_lock);
 
 	tx = dmu_tx_create(zfsvfs->z_os);
 
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
 
 	/*
 	 * If old version and ACL won't fit in bonus and we aren't
 	 * upgrading then take out necessary DMU holds
 	 */
 
 	if ((acl_obj = zfs_external_acl(zp)) != 0) {
 		if (zfsvfs->z_version >= ZPL_VERSION_FUID &&
 		    zfs_znode_acl_version(zp) <= ZFS_ACL_VERSION_INITIAL) {
 			dmu_tx_hold_free(tx, acl_obj, 0,
 			    DMU_OBJECT_END);
 			dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0,
 			    aclp->z_acl_bytes);
 		} else {
 			dmu_tx_hold_write(tx, acl_obj, 0, aclp->z_acl_bytes);
 		}
 	} else if (!zp->z_is_sa && aclp->z_acl_bytes > ZFS_ACE_SPACE) {
 		dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, aclp->z_acl_bytes);
 	}
 
 	zfs_sa_upgrade_txholds(tx, zp);
 	error = dmu_tx_assign(tx, TXG_NOWAIT);
 	if (error) {
-		mutex_exit(&zp->z_lock);
 		mutex_exit(&zp->z_acl_lock);
 
 		if (error == ERESTART) {
 			dmu_tx_wait(tx);
 			dmu_tx_abort(tx);
 			goto top;
 		}
 		dmu_tx_abort(tx);
 		zfs_acl_free(aclp);
 		return (error);
 	}
 
 	error = zfs_aclset_common(zp, aclp, cr, tx);
 	ASSERT(error == 0);
 	ASSERT(zp->z_acl_cached == NULL);
 	zp->z_acl_cached = aclp;
 
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 
 	zfs_log_acl(zilog, tx, zp, vsecp, fuidp);
 
 	if (fuidp)
 		zfs_fuid_info_free(fuidp);
 	dmu_tx_commit(tx);
-	mutex_exit(&zp->z_lock);
 	mutex_exit(&zp->z_acl_lock);
 
 	return (error);
 }
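zfs_setacl() follows the usual TXG_NOWAIT pattern: take the lock, try to assign the transaction, and on ERESTART drop the lock before waiting and retrying, so no thread sleeps on the transaction group while holding `z_acl_lock`. The sketch below mimics that control flow in userspace. `SKETCH_ERESTART`, `struct fake_pool` and `try_assign()` are hypothetical, and the lock is modeled as a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-only ERESTART code. */
#define SKETCH_ERESTART	(-1)

/* Hypothetical pool: refuses `busy` assignments before accepting one. */
struct fake_pool {
	int	busy;
	int	waits;
	bool	lock_held;	/* models zp->z_acl_lock */
};

static int
try_assign(struct fake_pool *p)
{
	if (p->busy > 0) {
		p->busy--;
		return (SKETCH_ERESTART);
	}
	return (0);
}

static int
set_with_retry(struct fake_pool *p, int *value, int newval)
{
	int error;

top:
	assert(!p->lock_held);
	p->lock_held = true;		/* mutex_enter() */
	error = try_assign(p);		/* dmu_tx_assign(tx, TXG_NOWAIT) */
	if (error != 0) {
		p->lock_held = false;	/* mutex_exit() before waiting */
		if (error == SKETCH_ERESTART) {
			p->waits++;	/* dmu_tx_wait(); dmu_tx_abort() */
			goto top;
		}
		return (error);
	}
	*value = newval;		/* the protected update */
	p->lock_held = false;		/* mutex_exit() after commit */
	return (0);
}
```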
 
 /*
  * Check accesses of interest (AoI) against attributes of the dataset
  * such as read-only.  Returns zero if no AoI conflict with dataset
  * attributes, otherwise an appropriate errno is returned.
  */
 static int
 zfs_zaccess_dataset_check(znode_t *zp, uint32_t v4_mode)
 {
 	if ((v4_mode & WRITE_MASK) &&
 	    (zp->z_zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) &&
 	    (!IS_DEVVP(ZTOV(zp)) ||
 	    (IS_DEVVP(ZTOV(zp)) && (v4_mode & WRITE_MASK_ATTRS)))) {
 		return (SET_ERROR(EROFS));
 	}
 
 	/*
 	 * Only check for READONLY on non-directories.
 	 */
 	if ((v4_mode & WRITE_MASK_DATA) &&
 	    (((ZTOV(zp)->v_type != VDIR) &&
 	    (zp->z_pflags & (ZFS_READONLY | ZFS_IMMUTABLE))) ||
 	    (ZTOV(zp)->v_type == VDIR &&
 	    (zp->z_pflags & ZFS_IMMUTABLE)))) {
 		return (SET_ERROR(EPERM));
 	}
 
 #ifdef illumos
 	if ((v4_mode & (ACE_DELETE | ACE_DELETE_CHILD)) &&
 	    (zp->z_pflags & ZFS_NOUNLINK)) {
 		return (SET_ERROR(EPERM));
 	}
 #else
 	/*
 	 * In FreeBSD we allow modifying a directory's contents if ZFS_NOUNLINK
 	 * (sunlnk) is set. We just don't allow directory removal, which is
 	 * handled in zfs_zaccess_delete().
 	 */
 	if ((v4_mode & ACE_DELETE) &&
 	    (zp->z_pflags & ZFS_NOUNLINK)) {
 		return (EPERM);
 	}
 #endif
 
 	if (((v4_mode & (ACE_READ_DATA|ACE_EXECUTE)) &&
 	    (zp->z_pflags & ZFS_AV_QUARANTINED))) {
 		return (SET_ERROR(EACCES));
 	}
 
 	return (0);
 }
 
 /*
  * The primary usage of this function is to loop through all of the
  * ACEs in the znode, determining what accesses of interest (AoI) to
  * the caller are allowed or denied.  The AoI are expressed as bits in
  * the working_mode parameter.  As each ACE is processed, bits covered
  * by that ACE are removed from the working_mode.  This removal
  * facilitates two things.  The first is that when the working mode is
  * empty (= 0), we know we've looked at all the AoI. The second is
  * that the ACE interpretation rules don't allow a later ACE to undo
  * something granted or denied by an earlier ACE.  Removing the
  * discovered access or denial enforces this rule.  At the end of
  * processing the ACEs, all AoI that were found to be denied are
  * placed into the working_mode, giving the caller a mask of denied
  * accesses.  Returns:
  *	0		if all AoI granted
  *	EACCES		if the denied mask is non-zero
  *	other error	if abnormal failure (e.g., IO error)
  *
  * A secondary usage of the function is to determine if any of the
  * AoI are granted.  If an ACE grants any access in
  * the working_mode, we immediately short circuit out of the function.
  * This mode is chosen by setting anyaccess to B_TRUE.  The
  * working_mode is not a denied access mask upon exit if the function
  * is used in this manner.
  */
 static int
 zfs_zaccess_aces_check(znode_t *zp, uint32_t *working_mode,
     boolean_t anyaccess, cred_t *cr)
 {
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	zfs_acl_t	*aclp;
 	int		error;
 	uid_t		uid = crgetuid(cr);
 	uint64_t 	who;
 	uint16_t	type, iflags;
 	uint16_t	entry_type;
 	uint32_t	access_mask;
 	uint32_t	deny_mask = 0;
 	zfs_ace_hdr_t	*acep = NULL;
 	boolean_t	checkit;
 	uid_t		gowner;
 	uid_t		fowner;
 
 	zfs_fuid_map_ids(zp, cr, &fowner, &gowner);
 
 	mutex_enter(&zp->z_acl_lock);
 
-	error = zfs_acl_node_read(zp, B_FALSE, &aclp, B_FALSE);
+	ASSERT_VOP_LOCKED(ZTOV(zp), __func__);
+	error = zfs_acl_node_read(zp, &aclp, B_FALSE);
 	if (error != 0) {
 		mutex_exit(&zp->z_acl_lock);
 		return (error);
 	}
 
 	ASSERT(zp->z_acl_cached);
 
 	while (acep = zfs_acl_next_ace(aclp, acep, &who, &access_mask,
 	    &iflags, &type)) {
 		uint32_t mask_matched;
 
 		if (!zfs_acl_valid_ace_type(type, iflags))
 			continue;
 
 		if (ZTOV(zp)->v_type == VDIR && (iflags & ACE_INHERIT_ONLY_ACE))
 			continue;
 
 		/* Skip ACE if it does not affect any AoI */
 		mask_matched = (access_mask & *working_mode);
 		if (!mask_matched)
 			continue;
 
 		entry_type = (iflags & ACE_TYPE_FLAGS);
 
 		checkit = B_FALSE;
 
 		switch (entry_type) {
 		case ACE_OWNER:
 			if (uid == fowner)
 				checkit = B_TRUE;
 			break;
 		case OWNING_GROUP:
 			who = gowner;
 			/*FALLTHROUGH*/
 		case ACE_IDENTIFIER_GROUP:
 			checkit = zfs_groupmember(zfsvfs, who, cr);
 			break;
 		case ACE_EVERYONE:
 			checkit = B_TRUE;
 			break;
 
 		/* USER Entry */
 		default:
 			if (entry_type == 0) {
 				uid_t newid;
 
 				newid = zfs_fuid_map_id(zfsvfs, who, cr,
 				    ZFS_ACE_USER);
 				if (newid != IDMAP_WK_CREATOR_OWNER_UID &&
 				    uid == newid)
 					checkit = B_TRUE;
 				break;
 			} else {
 				mutex_exit(&zp->z_acl_lock);
 				return (SET_ERROR(EIO));
 			}
 		}
 
 		if (checkit) {
 			if (type == DENY) {
 				DTRACE_PROBE3(zfs__ace__denies,
 				    znode_t *, zp,
 				    zfs_ace_hdr_t *, acep,
 				    uint32_t, mask_matched);
 				deny_mask |= mask_matched;
 			} else {
 				DTRACE_PROBE3(zfs__ace__allows,
 				    znode_t *, zp,
 				    zfs_ace_hdr_t *, acep,
 				    uint32_t, mask_matched);
 				if (anyaccess) {
 					mutex_exit(&zp->z_acl_lock);
 					return (0);
 				}
 			}
 			*working_mode &= ~mask_matched;
 		}
 
 		/* Are we done? */
 		if (*working_mode == 0)
 			break;
 	}
 
 	mutex_exit(&zp->z_acl_lock);
 
 	/* Put the found 'denies' back on the working mode */
 	if (deny_mask) {
 		*working_mode |= deny_mask;
 		return (SET_ERROR(EACCES));
 	} else if (*working_mode) {
 		return (-1);
 	}
 
 	return (0);
 }
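The working-mode algorithm described in the comment above zfs_zaccess_aces_check() can be shown without the owner/group matching by pre-resolving whether each ACE applies to the caller. The following is a simplified sketch under that assumption; `struct sketch_ace` and `aces_check()` are hypothetical, and the mask bits used in the test are arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum ace_type { ALLOW, DENY };

/* Hypothetical ACE; `applies` replaces owner@/group@/user matching. */
struct sketch_ace {
	enum ace_type	type;
	uint32_t	mask;
	bool		applies;
};

static int
aces_check(const struct sketch_ace *aces, size_t n, uint32_t *working_mode,
    bool anyaccess)
{
	uint32_t deny_mask = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t matched = aces[i].mask & *working_mode;

		/* Skip ACEs that touch no remaining AoI or other users. */
		if (matched == 0 || !aces[i].applies)
			continue;
		if (aces[i].type == DENY)
			deny_mask |= matched;
		else if (anyaccess)
			return (0);	/* any grant is enough */
		/* First ACE to decide a bit wins; later ACEs can't undo it. */
		*working_mode &= ~matched;
		if (*working_mode == 0)
			break;
	}
	if (deny_mask != 0) {
		*working_mode |= deny_mask;	/* report what was denied */
		return (EACCES);
	}
	return (*working_mode != 0 ? -1 : 0);	/* -1: undecided bits */
}
```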
 
 /*
  * Return true if any access whatsoever is granted; we don't actually
  * care what access is granted.
  */
 boolean_t
 zfs_has_access(znode_t *zp, cred_t *cr)
 {
 	uint32_t have = ACE_ALL_PERMS;
 
 	if (zfs_zaccess_aces_check(zp, &have, B_TRUE, cr) != 0) {
 		uid_t owner;
 
 		owner = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_uid, cr, ZFS_OWNER);
 		return (secpolicy_vnode_any_access(cr, ZTOV(zp), owner) == 0);
 	}
 	return (B_TRUE);
 }
 
 static int
 zfs_zaccess_common(znode_t *zp, uint32_t v4_mode, uint32_t *working_mode,
     boolean_t *check_privs, boolean_t skipaclchk, cred_t *cr)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int err;
 
 	*working_mode = v4_mode;
 	*check_privs = B_TRUE;
 
 	/*
 	 * Short circuit empty requests
 	 */
 	if (v4_mode == 0 || zfsvfs->z_replay) {
 		*working_mode = 0;
 		return (0);
 	}
 
 	if ((err = zfs_zaccess_dataset_check(zp, v4_mode)) != 0) {
 		*check_privs = B_FALSE;
 		return (err);
 	}
 
 	/*
 	 * The caller requested that the ACL check be skipped.  This
 	 * would only happen if the caller checked VOP_ACCESS() with a
 	 * 32 bit ACE mask and already had the appropriate permissions.
 	 */
 	if (skipaclchk) {
 		*working_mode = 0;
 		return (0);
 	}
 
 	return (zfs_zaccess_aces_check(zp, working_mode, B_FALSE, cr));
 }
 
 static int
 zfs_zaccess_append(znode_t *zp, uint32_t *working_mode, boolean_t *check_privs,
     cred_t *cr)
 {
 	if (*working_mode != ACE_WRITE_DATA)
 		return (SET_ERROR(EACCES));
 
 	return (zfs_zaccess_common(zp, ACE_APPEND_DATA, working_mode,
 	    check_privs, B_FALSE, cr));
 }
 
 int
 zfs_fastaccesschk_execute(znode_t *zdp, cred_t *cr)
 {
 	boolean_t owner = B_FALSE;
 	boolean_t groupmbr = B_FALSE;
 	boolean_t is_attr;
 	uid_t uid = crgetuid(cr);
 	int error;
 
 	if (zdp->z_pflags & ZFS_AV_QUARANTINED)
 		return (SET_ERROR(EACCES));
 
 	is_attr = ((zdp->z_pflags & ZFS_XATTR) &&
 	    (ZTOV(zdp)->v_type == VDIR));
 	if (is_attr)
 		goto slow;
 
 
 	mutex_enter(&zdp->z_acl_lock);
 
 	if (zdp->z_pflags & ZFS_NO_EXECS_DENIED) {
 		mutex_exit(&zdp->z_acl_lock);
 		return (0);
 	}
 
 	if (FUID_INDEX(zdp->z_uid) != 0 || FUID_INDEX(zdp->z_gid) != 0) {
 		mutex_exit(&zdp->z_acl_lock);
 		goto slow;
 	}
 
 	if (uid == zdp->z_uid) {
 		owner = B_TRUE;
 		if (zdp->z_mode & S_IXUSR) {
 			mutex_exit(&zdp->z_acl_lock);
 			return (0);
 		} else {
 			mutex_exit(&zdp->z_acl_lock);
 			goto slow;
 		}
 	}
 	if (groupmember(zdp->z_gid, cr)) {
 		groupmbr = B_TRUE;
 		if (zdp->z_mode & S_IXGRP) {
 			mutex_exit(&zdp->z_acl_lock);
 			return (0);
 		} else {
 			mutex_exit(&zdp->z_acl_lock);
 			goto slow;
 		}
 	}
 	if (!owner && !groupmbr) {
 		if (zdp->z_mode & S_IXOTH) {
 			mutex_exit(&zdp->z_acl_lock);
 			return (0);
 		}
 	}
 
 	mutex_exit(&zdp->z_acl_lock);
 
 slow:
 	DTRACE_PROBE(zfs__fastpath__execute__access__miss);
 	ZFS_ENTER(zdp->z_zfsvfs);
 	error = zfs_zaccess(zdp, ACE_EXECUTE, 0, B_FALSE, cr);
 	ZFS_EXIT(zdp->z_zfsvfs);
 	return (error);
 }
 
 /*
  * Determine whether Access should be granted/denied.
  *
  * The least priv subsytem is always consulted as a basic privilege
  * can define any form of access.
  */
 int
 zfs_zaccess(znode_t *zp, int mode, int flags, boolean_t skipaclchk, cred_t *cr)
 {
 	uint32_t	working_mode;
 	int		error;
 	int		is_attr;
 	boolean_t 	check_privs;
 	znode_t		*xzp;
 	znode_t 	*check_zp = zp;
 	mode_t		needed_bits;
 	uid_t		owner;
 
 	is_attr = ((zp->z_pflags & ZFS_XATTR) && (ZTOV(zp)->v_type == VDIR));
 
 #ifdef __FreeBSD_kernel__
 	/*
 	 * In FreeBSD, we don't care about permissions of individual ADS.
 	 * Note that not checking them is not just an optimization - without
 	 * this shortcut, EA operations may bogusly fail with EACCES.
 	 */
 	if (zp->z_pflags & ZFS_XATTR)
 		return (0);
 #else
 	/*
 	 * If attribute then validate against base file
 	 */
 	if (is_attr) {
 		uint64_t	parent;
 
 		if ((error = sa_lookup(zp->z_sa_hdl,
 		    SA_ZPL_PARENT(zp->z_zfsvfs), &parent,
 		    sizeof (parent))) != 0)
 			return (error);
 
 		if ((error = zfs_zget(zp->z_zfsvfs,
 		    parent, &xzp)) != 0)	{
 			return (error);
 		}
 
 		check_zp = xzp;
 
 		/*
 		 * fixup mode to map to xattr perms
 		 */
 
 		if (mode & (ACE_WRITE_DATA|ACE_APPEND_DATA)) {
 			mode &= ~(ACE_WRITE_DATA|ACE_APPEND_DATA);
 			mode |= ACE_WRITE_NAMED_ATTRS;
 		}
 
 		if (mode & (ACE_READ_DATA|ACE_EXECUTE)) {
 			mode &= ~(ACE_READ_DATA|ACE_EXECUTE);
 			mode |= ACE_READ_NAMED_ATTRS;
 		}
 	}
 #endif
 
 	owner = zfs_fuid_map_id(zp->z_zfsvfs, zp->z_uid, cr, ZFS_OWNER);
 	/*
 	 * Map the bits required to the standard vnode flags VREAD|VWRITE|VEXEC
 	 * in needed_bits.  Map the bits mapped by working_mode (currently
 	 * missing) in missing_bits.
 	 * Call secpolicy_vnode_access2() with (needed_bits & ~checkmode),
 	 * needed_bits.
 	 */
 	needed_bits = 0;
 
 	working_mode = mode;
 	if ((working_mode & (ACE_READ_ACL|ACE_READ_ATTRIBUTES)) &&
 	    owner == crgetuid(cr))
 		working_mode &= ~(ACE_READ_ACL|ACE_READ_ATTRIBUTES);
 
 	if (working_mode & (ACE_READ_DATA|ACE_READ_NAMED_ATTRS|
 	    ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_SYNCHRONIZE))
 		needed_bits |= VREAD;
 	if (working_mode & (ACE_WRITE_DATA|ACE_WRITE_NAMED_ATTRS|
 	    ACE_APPEND_DATA|ACE_WRITE_ATTRIBUTES|ACE_SYNCHRONIZE))
 		needed_bits |= VWRITE;
 	if (working_mode & ACE_EXECUTE)
 		needed_bits |= VEXEC;
 
 	if ((error = zfs_zaccess_common(check_zp, mode, &working_mode,
 	    &check_privs, skipaclchk, cr)) == 0) {
 		if (is_attr)
 			VN_RELE(ZTOV(xzp));
 		return (secpolicy_vnode_access2(cr, ZTOV(zp), owner,
 		    needed_bits, needed_bits));
 	}
 
 	if (error && !check_privs) {
 		if (is_attr)
 			VN_RELE(ZTOV(xzp));
 		return (error);
 	}
 
 	if (error && (flags & V_APPEND)) {
 		error = zfs_zaccess_append(zp, &working_mode, &check_privs, cr);
 	}
 
 	if (error && check_privs) {
 		mode_t		checkmode = 0;
 
 		/*
 		 * First check for implicit owner permission on
 		 * read_acl/read_attributes
 		 */
 
 		error = 0;
 		ASSERT(working_mode != 0);
 
 		if ((working_mode & (ACE_READ_ACL|ACE_READ_ATTRIBUTES) &&
 		    owner == crgetuid(cr)))
 			working_mode &= ~(ACE_READ_ACL|ACE_READ_ATTRIBUTES);
 
 		if (working_mode & (ACE_READ_DATA|ACE_READ_NAMED_ATTRS|
 		    ACE_READ_ACL|ACE_READ_ATTRIBUTES|ACE_SYNCHRONIZE))
 			checkmode |= VREAD;
 		if (working_mode & (ACE_WRITE_DATA|ACE_WRITE_NAMED_ATTRS|
 		    ACE_APPEND_DATA|ACE_WRITE_ATTRIBUTES|ACE_SYNCHRONIZE))
 			checkmode |= VWRITE;
 		if (working_mode & ACE_EXECUTE)
 			checkmode |= VEXEC;
 
 		error = secpolicy_vnode_access2(cr, ZTOV(check_zp), owner,
 		    needed_bits & ~checkmode, needed_bits);
 
 		if (error == 0 && (working_mode & ACE_WRITE_OWNER))
 			error = secpolicy_vnode_chown(ZTOV(check_zp), cr, owner);
 		if (error == 0 && (working_mode & ACE_WRITE_ACL))
 			error = secpolicy_vnode_setdac(ZTOV(check_zp), cr, owner);
 
 		if (error == 0 && (working_mode &
 		    (ACE_DELETE|ACE_DELETE_CHILD)))
 			error = secpolicy_vnode_remove(ZTOV(check_zp), cr);
 
 		if (error == 0 && (working_mode & ACE_SYNCHRONIZE)) {
 			error = secpolicy_vnode_chown(ZTOV(check_zp), cr, owner);
 		}
 		if (error == 0) {
 			/*
 			 * See if any bits other than those already checked
 			 * for are still present.  If so then return EACCES
 			 */
 			if (working_mode & ~(ZFS_CHECKED_MASKS)) {
 				error = SET_ERROR(EACCES);
 			}
 		}
 	} else if (error == 0) {
 		error = secpolicy_vnode_access2(cr, ZTOV(zp), owner,
 		    needed_bits, needed_bits);
 	}
 
 
 	if (is_attr)
 		VN_RELE(ZTOV(xzp));
 
 	return (error);
 }
 
 /*
  * Translate traditional unix VREAD/VWRITE/VEXEC mode into
  * native ACL format and call zfs_zaccess()
  */
 int
 zfs_zaccess_rwx(znode_t *zp, mode_t mode, int flags, cred_t *cr)
 {
 	return (zfs_zaccess(zp, zfs_unix_to_v4(mode >> 6), flags, B_FALSE, cr));
 }
 
 /*
  * Access function for secpolicy_vnode_setattr
  */
 int
 zfs_zaccess_unix(znode_t *zp, mode_t mode, cred_t *cr)
 {
 	int v4_mode = zfs_unix_to_v4(mode >> 6);
 
 	return (zfs_zaccess(zp, v4_mode, 0, B_FALSE, cr));
 }
 
 static int
 zfs_delete_final_check(znode_t *zp, znode_t *dzp,
     mode_t available_perms, cred_t *cr)
 {
 	int error;
 	uid_t downer;
 
 	downer = zfs_fuid_map_id(dzp->z_zfsvfs, dzp->z_uid, cr, ZFS_OWNER);
 
 	error = secpolicy_vnode_access2(cr, ZTOV(dzp),
 	    downer, available_perms, VWRITE|VEXEC);
 
 	if (error == 0)
 		error = zfs_sticky_remove_access(dzp, zp, cr);
 
 	return (error);
 }
 
 /*
  * Determine whether Access should be granted/deny, without
  * consulting least priv subsystem.
  *
  * The following chart is the recommended NFSv4 enforcement for
  * ability to delete an object.
  *
  *      -------------------------------------------------------
  *      |   Parent Dir  |           Target Object Permissions |
  *      |  permissions  |                                     |
  *      -------------------------------------------------------
  *      |               | ACL Allows | ACL Denies| Delete     |
  *      |               |  Delete    |  Delete   | unspecified|
  *      -------------------------------------------------------
  *      |  ACL Allows   | Permit     | Permit    | Permit     |
  *      |  DELETE_CHILD |                                     |
  *      -------------------------------------------------------
  *      |  ACL Denies   | Permit     | Deny      | Deny       |
  *      |  DELETE_CHILD |            |           |            |
  *      -------------------------------------------------------
  *      | ACL specifies |            |           |            |
  *      | only allow    | Permit     | Permit    | Permit     |
  *      | write and     |            |           |            |
  *      | execute       |            |           |            |
  *      -------------------------------------------------------
  *      | ACL denies    |            |           |            |
  *      | write and     | Permit     | Deny      | Deny       |
  *      | execute       |            |           |            |
  *      -------------------------------------------------------
  *         ^
  *         |
  *         No search privilege, can't even look up file?
  *
  */
 int
 zfs_zaccess_delete(znode_t *dzp, znode_t *zp, cred_t *cr)
 {
 	uint32_t dzp_working_mode = 0;
 	uint32_t zp_working_mode = 0;
 	int dzp_error, zp_error;
 	mode_t available_perms;
 	boolean_t dzpcheck_privs = B_TRUE;
 	boolean_t zpcheck_privs = B_TRUE;
 
 	/*
 	 * We want specific DELETE permissions to
 	 * take precedence over WRITE/EXECUTE.  We don't
 	 * want an ACL such as this to mess us up.
 	 * user:joe:write_data:deny,user:joe:delete:allow
 	 *
 	 * However, deny permissions may ultimately be overridden
 	 * by secpolicy_vnode_access().
 	 *
 	 * We will ask for all of the necessary permissions and then
 	 * look at the working modes from the directory and target object
 	 * to determine what was found.
 	 */
 
 	if (zp->z_pflags & (ZFS_IMMUTABLE | ZFS_NOUNLINK))
 		return (SET_ERROR(EPERM));
 
 	/*
 	 * First row
 	 * If the directory permissions allow the delete, we are done.
 	 */
 	if ((dzp_error = zfs_zaccess_common(dzp, ACE_DELETE_CHILD,
 	    &dzp_working_mode, &dzpcheck_privs, B_FALSE, cr)) == 0)
 		return (0);
 
 	/*
 	 * If target object has delete permission then we are done
 	 */
 	if ((zp_error = zfs_zaccess_common(zp, ACE_DELETE, &zp_working_mode,
 	    &zpcheck_privs, B_FALSE, cr)) == 0)
 		return (0);
 
 	ASSERT(dzp_error && zp_error);
 
 	if (!dzpcheck_privs)
 		return (dzp_error);
 	if (!zpcheck_privs)
 		return (zp_error);
 
 	/*
 	 * Second row
 	 *
 	 * If directory returns EACCES then delete_child was denied
 	 * due to deny delete_child.  In this case send the request through
 	 * secpolicy_vnode_remove().  We don't use zfs_delete_final_check()
 	 * since that *could* allow the delete based on write/execute permission
 	 * and we want delete permissions to override write/execute.
 	 */
 
 	if (dzp_error == EACCES)
 		return (secpolicy_vnode_remove(ZTOV(dzp), cr));	/* XXXPJD: s/dzp/zp/ ? */
 
 	/*
 	 * Third Row
 	 * only need to see if we have write/execute on directory.
 	 */
 
 	dzp_error = zfs_zaccess_common(dzp, ACE_EXECUTE|ACE_WRITE_DATA,
 	    &dzp_working_mode, &dzpcheck_privs, B_FALSE, cr);
 
 	if (dzp_error != 0 && !dzpcheck_privs)
 		return (dzp_error);
 
 	/*
 	 * Fourth row
 	 */
 
 	available_perms = (dzp_working_mode & ACE_WRITE_DATA) ? 0 : VWRITE;
 	available_perms |= (dzp_working_mode & ACE_EXECUTE) ? 0 : VEXEC;
 
 	return (zfs_delete_final_check(zp, dzp, available_perms, cr));
 
 }
 
 int
 zfs_zaccess_rename(znode_t *sdzp, znode_t *szp, znode_t *tdzp,
     znode_t *tzp, cred_t *cr)
 {
 	int add_perm;
 	int error;
 
 	if (szp->z_pflags & ZFS_AV_QUARANTINED)
 		return (SET_ERROR(EACCES));
 
 	add_perm = (ZTOV(szp)->v_type == VDIR) ?
 	    ACE_ADD_SUBDIRECTORY : ACE_ADD_FILE;
 
 	/*
 	 * Rename permissions are combination of delete permission +
 	 * add file/subdir permission.
 	 *
 	 * BSD operating systems also require write permission
 	 * on the directory being moved from one parent directory
 	 * to another.
 	 */
 	if (ZTOV(szp)->v_type == VDIR && ZTOV(sdzp) != ZTOV(tdzp)) {
 		if (error = zfs_zaccess(szp, ACE_WRITE_DATA, 0, B_FALSE, cr))
 			return (error);
 	}
 
 	/*
 	 * first make sure we do the delete portion.
 	 *
 	 * If that succeeds then check for add_file/add_subdir permissions
 	 */
 
 	if (error = zfs_zaccess_delete(sdzp, szp, cr))
 		return (error);
 
 	/*
 	 * If we have a tzp, see if we can delete it?
 	 */
 	if (tzp) {
 		if (error = zfs_zaccess_delete(tdzp, tzp, cr))
 			return (error);
 	}
 
 	/*
 	 * Now check for add permissions
 	 */
 	error = zfs_zaccess(tdzp, add_perm, 0, B_FALSE, cr);
 
 	return (error);
 }
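The NFSv4 delete chart in the comment above zfs_zaccess_delete() can be read as a small decision function. Below is a minimal C sketch of that chart only. The enums and chart_permits_delete() are hypothetical names invented for illustration; they are not ZFS identifiers. The sketch deliberately ignores what the real code does around the chart: the early EPERM for ZFS_IMMUTABLE/ZFS_NOUNLINK, the privilege override through secpolicy_vnode_remove() on the deny-DELETE_CHILD row, and the secpolicy_vnode_access2()/sticky-bit checks done by zfs_delete_final_check().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: chart columns, i.e. what the target's ACL says about DELETE. */
typedef enum {
	TARGET_ALLOWS_DELETE,
	TARGET_DENIES_DELETE,
	TARGET_DELETE_UNSPECIFIED
} target_acl_t;

/* Hypothetical: chart rows, i.e. what the parent directory's ACL says. */
typedef enum {
	DIR_ALLOWS_DELETE_CHILD,
	DIR_DENIES_DELETE_CHILD,
	DIR_ONLY_ALLOWS_WRITE_EXEC,
	DIR_DENIES_WRITE_EXEC
} dir_acl_t;

/* Return true if the chart (without privileges) permits the delete. */
static bool
chart_permits_delete(dir_acl_t dir, target_acl_t target)
{
	switch (dir) {
	case DIR_ALLOWS_DELETE_CHILD:
	case DIR_ONLY_ALLOWS_WRITE_EXEC:
		/* Rows 1 and 3: the directory permissions alone suffice. */
		return (true);
	case DIR_DENIES_DELETE_CHILD:
	case DIR_DENIES_WRITE_EXEC:
		/* Rows 2 and 4: only an explicit DELETE allow on the target. */
		return (target == TARGET_ALLOWS_DELETE);
	}
	return (false);
}

/* Compare every cell of chart_permits_delete() against the chart. */
static void
check_chart(void)
{
	static const bool expected[4][3] = {
		{ true, true,  true  },	/* allows DELETE_CHILD */
		{ true, false, false },	/* denies DELETE_CHILD */
		{ true, true,  true  },	/* only allows write and execute */
		{ true, false, false },	/* denies write and execute */
	};

	for (int d = 0; d < 4; d++)
		for (int t = 0; t < 3; t++)
			assert(chart_permits_delete((dir_acl_t)d,
			    (target_acl_t)t) == expected[d][t]);
}
```

The table in check_chart() mirrors the chart's rows and columns one to one. A mismatch between the function and the comment therefore fails an assertion instead of going unnoticed.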
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_dir.c	(revision 303775)
@@ -1,1115 +1,891 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2013, 2015 by Delphix. All rights reserved.
  */
 
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
 #include <sys/systm.h>
 #include <sys/sysmacros.h>
 #include <sys/resource.h>
 #include <sys/vfs.h>
 #include <sys/vnode.h>
 #include <sys/file.h>
 #include <sys/kmem.h>
 #include <sys/uio.h>
 #include <sys/cmn_err.h>
 #include <sys/errno.h>
 #include <sys/stat.h>
 #include <sys/unistd.h>
 #include <sys/sunddi.h>
 #include <sys/random.h>
 #include <sys/policy.h>
 #include <sys/kcondvar.h>
 #include <sys/callb.h>
 #include <sys/smp.h>
 #include <sys/zfs_dir.h>
 #include <sys/zfs_acl.h>
 #include <sys/fs/zfs.h>
 #include <sys/zap.h>
 #include <sys/dmu.h>
 #include <sys/atomic.h>
 #include <sys/zfs_ctldir.h>
 #include <sys/zfs_fuid.h>
 #include <sys/sa.h>
 #include <sys/zfs_sa.h>
 #include <sys/dnlc.h>
 #include <sys/extdirent.h>
 
 /*
- * zfs_match_find() is used by zfs_dirent_lock() to peform zap lookups
+ * zfs_match_find() is used by zfs_dirent_lookup() to perform zap lookups
  * of names after deciding which is the appropriate lookup interface.
  */
 static int
-zfs_match_find(zfsvfs_t *zfsvfs, znode_t *dzp, char *name, boolean_t exact,
-    boolean_t update, int *deflags, pathname_t *rpnp, uint64_t *zoid)
+zfs_match_find(zfsvfs_t *zfsvfs, znode_t *dzp, const char *name,
+    boolean_t exact, uint64_t *zoid)
 {
 	int error;
 
 	if (zfsvfs->z_norm) {
-		matchtype_t mt = MT_FIRST;
-		boolean_t conflict = B_FALSE;
-		size_t bufsz = 0;
-		char *buf = NULL;
+		matchtype_t mt = exact ? MT_EXACT : MT_FIRST;
 
-		if (rpnp) {
-			buf = rpnp->pn_buf;
-			bufsz = rpnp->pn_bufsize;
-		}
-		if (exact)
-			mt = MT_EXACT;
 		/*
 		 * In the non-mixed case we only expect there would ever
 		 * be one match, but we need to use the normalizing lookup.
 		 */
 		error = zap_lookup_norm(zfsvfs->z_os, dzp->z_id, name, 8, 1,
-		    zoid, mt, buf, bufsz, &conflict);
-		if (!error && deflags)
-			*deflags = conflict ? ED_CASE_CONFLICT : 0;
+		    zoid, mt, NULL, 0, NULL);
 	} else {
 		error = zap_lookup(zfsvfs->z_os, dzp->z_id, name, 8, 1, zoid);
 	}
 	*zoid = ZFS_DIRENT_OBJ(*zoid);
 
-	if (error == ENOENT && update)
-		dnlc_update(ZTOV(dzp), name, DNLC_NO_VNODE);
-
 	return (error);
 }
 
 /*
- * Lock a directory entry.  A dirlock on <dzp, name> protects that name
- * in dzp's directory zap object.  As long as you hold a dirlock, you can
- * assume two things: (1) dzp cannot be reaped, and (2) no other thread
- * can change the zap entry for (i.e. link or unlink) this name.
+ * Look up a directory entry under a locked vnode.
+ * The directory vnode being locked guarantees that there are no
+ * concurrent modifications of the directory and, thus, if a node can be
+ * found in the directory, then it must not be unlinked.
  *
  * Input arguments:
  *	dzp	- znode for directory
  *	name	- name of entry to lock
  *	flag	- ZNEW: if the entry already exists, fail with EEXIST.
  *		  ZEXISTS: if the entry does not exist, fail with ENOENT.
- *		  ZSHARED: allow concurrent access with other ZSHARED callers.
  *		  ZXATTR: we want dzp's xattr directory
- *		  ZCILOOK: On a mixed sensitivity file system,
- *			   this lookup should be case-insensitive.
- *		  ZCIEXACT: On a purely case-insensitive file system,
- *			    this lookup should be case-sensitive.
- *		  ZRENAMING: we are locking for renaming, force narrow locks
- *		  ZHAVELOCK: Don't grab the z_name_lock for this call. The
- *			     current thread already holds it.
  *
  * Output arguments:
  *	zpp	- pointer to the znode for the entry (NULL if there isn't one)
- *	dlpp	- pointer to the dirlock for this entry (NULL on error)
- *      direntflags - (case-insensitive lookup only)
- *		flags if multiple case-sensitive matches exist in directory
- *      realpnp     - (case-insensitive lookup only)
- *		actual name matched within the directory
  *
  * Return value: 0 on success or errno on failure.
  *
  * NOTE: Always checks for, and rejects, '.' and '..'.
- * NOTE: For case-insensitive file systems we take wide locks (see below),
- *	 but return znode pointers to a single match.
  */
 int
-zfs_dirent_lock(zfs_dirlock_t **dlpp, znode_t *dzp, char *name, znode_t **zpp,
-    int flag, int *direntflags, pathname_t *realpnp)
+zfs_dirent_lookup(znode_t *dzp, const char *name, znode_t **zpp, int flag)
 {
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
-	zfs_dirlock_t	*dl;
-	boolean_t	update;
 	boolean_t	exact;
 	uint64_t	zoid;
 	vnode_t		*vp = NULL;
 	int		error = 0;
-	int		cmpflags;
 
+	ASSERT_VOP_LOCKED(ZTOV(dzp), __func__);
+
 	*zpp = NULL;
-	*dlpp = NULL;
 
 	/*
 	 * Verify that we are not trying to lock '.', '..', or '.zfs'
 	 */
 	if (name[0] == '.' &&
 	    (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')) ||
 	    zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0)
 		return (SET_ERROR(EEXIST));
 
 	/*
 	 * Case sensitivity and normalization preferences are set when
 	 * the file system is created.  These are stored in the
 	 * zfsvfs->z_case and zfsvfs->z_norm fields.  These choices
-	 * affect what vnodes can be cached in the DNLC, how we
-	 * perform zap lookups, and the "width" of our dirlocks.
+	 * affect how we perform zap lookups.
 	 *
-	 * A normal dirlock locks a single name.  Note that with
-	 * normalization a name can be composed multiple ways, but
-	 * when normalized, these names all compare equal.  A wide
-	 * dirlock locks multiple names.  We need these when the file
-	 * system is supporting mixed-mode access.  It is sometimes
-	 * necessary to lock all case permutations of file name at
-	 * once so that simultaneous case-insensitive/case-sensitive
-	 * behaves as rationally as possible.
-	 */
-
-	/*
 	 * Decide if exact matches should be requested when performing
 	 * a zap lookup on file systems supporting case-insensitive
 	 * access.
-	 */
-	exact =
-	    ((zfsvfs->z_case == ZFS_CASE_INSENSITIVE) && (flag & ZCIEXACT)) ||
-	    ((zfsvfs->z_case == ZFS_CASE_MIXED) && !(flag & ZCILOOK));
-
-	/*
-	 * Only look in or update the DNLC if we are looking for the
-	 * name on a file system that does not require normalization
-	 * or case folding.  We can also look there if we happen to be
-	 * on a non-normalizing, mixed sensitivity file system IF we
-	 * are looking for the exact name.
 	 *
-	 * Maybe can add TO-UPPERed version of name to dnlc in ci-only
-	 * case for performance improvement?
+	 * NB: we do not need to worry about this flag for ZFS_CASE_SENSITIVE
+	 * because in that case MT_EXACT and MT_FIRST should produce exactly
+	 * the same result.
 	 */
-	update = !zfsvfs->z_norm ||
-	    ((zfsvfs->z_case == ZFS_CASE_MIXED) &&
-	    !(zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER) && !(flag & ZCILOOK));
+	exact = zfsvfs->z_case == ZFS_CASE_MIXED;
 
-	/*
-	 * ZRENAMING indicates we are in a situation where we should
-	 * take narrow locks regardless of the file system's
-	 * preferences for normalizing and case folding.  This will
-	 * prevent us deadlocking trying to grab the same wide lock
-	 * twice if the two names happen to be case-insensitive
-	 * matches.
-	 */
-	if (flag & ZRENAMING)
-		cmpflags = 0;
-	else
-		cmpflags = zfsvfs->z_norm;
-
-	/*
-	 * Wait until there are no locks on this name.
-	 *
-	 * Don't grab the the lock if it is already held. However, cannot
-	 * have both ZSHARED and ZHAVELOCK together.
-	 */
-	ASSERT(!(flag & ZSHARED) || !(flag & ZHAVELOCK));
-	if (!(flag & ZHAVELOCK))
-		rw_enter(&dzp->z_name_lock, RW_READER);
-
-	mutex_enter(&dzp->z_lock);
-	for (;;) {
-		if (dzp->z_unlinked && !(flag & ZXATTR)) {
-			mutex_exit(&dzp->z_lock);
-			if (!(flag & ZHAVELOCK))
-				rw_exit(&dzp->z_name_lock);
-			return (SET_ERROR(ENOENT));
-		}
-		for (dl = dzp->z_dirlocks; dl != NULL; dl = dl->dl_next) {
-			if ((u8_strcmp(name, dl->dl_name, 0, cmpflags,
-			    U8_UNICODE_LATEST, &error) == 0) || error != 0)
-				break;
-		}
-		if (error != 0) {
-			mutex_exit(&dzp->z_lock);
-			if (!(flag & ZHAVELOCK))
-				rw_exit(&dzp->z_name_lock);
-			return (SET_ERROR(ENOENT));
-		}
-		if (dl == NULL)	{
-			size_t namesize;
-
-			/*
-			 * Allocate a new dirlock and add it to the list.
-			 */
-			namesize = strlen(name) + 1;
-			dl = kmem_alloc(sizeof (zfs_dirlock_t) + namesize,
-			    KM_SLEEP);
-			cv_init(&dl->dl_cv, NULL, CV_DEFAULT, NULL);
-			dl->dl_name = (char *)(dl + 1);
-			bcopy(name, dl->dl_name, namesize);
-			dl->dl_sharecnt = 0;
-			dl->dl_namelock = 0;
-			dl->dl_namesize = namesize;
-			dl->dl_dzp = dzp;
-			dl->dl_next = dzp->z_dirlocks;
-			dzp->z_dirlocks = dl;
-			break;
-		}
-		if ((flag & ZSHARED) && dl->dl_sharecnt != 0)
-			break;
-		cv_wait(&dl->dl_cv, &dzp->z_lock);
-	}
-
-	/*
-	 * If the z_name_lock was NOT held for this dirlock record it.
-	 */
-	if (flag & ZHAVELOCK)
-		dl->dl_namelock = 1;
-
-	if (flag & ZSHARED)
-		dl->dl_sharecnt++;
-
-	mutex_exit(&dzp->z_lock);
-
-	/*
-	 * We have a dirlock on the name.  (Note that it is the dirlock,
-	 * not the dzp's z_lock, that protects the name in the zap object.)
-	 * See if there's an object by this name; if so, put a hold on it.
-	 */
+	if (dzp->z_unlinked && !(flag & ZXATTR))
+		return (ENOENT);
 	if (flag & ZXATTR) {
 		error = sa_lookup(dzp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &zoid,
 		    sizeof (zoid));
 		if (error == 0)
 			error = (zoid == 0 ? ENOENT : 0);
 	} else {
-		if (update)
-			vp = dnlc_lookup(ZTOV(dzp), name);
-		if (vp == DNLC_NO_VNODE) {
-			VN_RELE(vp);
-			error = SET_ERROR(ENOENT);
-		} else if (vp) {
-			if (flag & ZNEW) {
-				zfs_dirent_unlock(dl);
-				VN_RELE(vp);
-				return (SET_ERROR(EEXIST));
-			}
-			*dlpp = dl;
-			*zpp = VTOZ(vp);
-			return (0);
-		} else {
-			error = zfs_match_find(zfsvfs, dzp, name, exact,
-			    update, direntflags, realpnp, &zoid);
-		}
+		error = zfs_match_find(zfsvfs, dzp, name, exact, &zoid);
 	}
 	if (error) {
 		if (error != ENOENT || (flag & ZEXISTS)) {
-			zfs_dirent_unlock(dl);
 			return (error);
 		}
 	} else {
 		if (flag & ZNEW) {
-			zfs_dirent_unlock(dl);
 			return (SET_ERROR(EEXIST));
 		}
 		error = zfs_zget(zfsvfs, zoid, zpp);
-		if (error) {
-			zfs_dirent_unlock(dl);
+		if (error)
 			return (error);
-		}
-		if (!(flag & ZXATTR) && update)
-			dnlc_update(ZTOV(dzp), name, ZTOV(*zpp));
+		ASSERT(!(*zpp)->z_unlinked);
 	}
 
-	*dlpp = dl;
-
 	return (0);
 }
 
-/*
- * Unlock this directory entry and wake anyone who was waiting for it.
- */
-void
-zfs_dirent_unlock(zfs_dirlock_t *dl)
+static int
+zfs_dd_lookup(znode_t *dzp, znode_t **zpp)
 {
-	znode_t *dzp = dl->dl_dzp;
-	zfs_dirlock_t **prev_dl, *cur_dl;
+	zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
+	znode_t *zp;
+	vnode_t *vp;
+	uint64_t parent;
+	int error;
 
-	mutex_enter(&dzp->z_lock);
+	ASSERT_VOP_LOCKED(ZTOV(dzp), __func__);
+	ASSERT(RRM_READ_HELD(&zfsvfs->z_teardown_lock));
 
-	if (!dl->dl_namelock)
-		rw_exit(&dzp->z_name_lock);
+	if (dzp->z_unlinked)
+		return (ENOENT);
 
-	if (dl->dl_sharecnt > 1) {
-		dl->dl_sharecnt--;
-		mutex_exit(&dzp->z_lock);
-		return;
-	}
-	prev_dl = &dzp->z_dirlocks;
-	while ((cur_dl = *prev_dl) != dl)
-		prev_dl = &cur_dl->dl_next;
-	*prev_dl = dl->dl_next;
-	cv_broadcast(&dl->dl_cv);
-	mutex_exit(&dzp->z_lock);
+	if ((error = sa_lookup(dzp->z_sa_hdl,
+	    SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0)
+		return (error);
 
-	cv_destroy(&dl->dl_cv);
-	kmem_free(dl, sizeof (*dl) + dl->dl_namesize);
+	/*
+	 * If we are a snapshot mounted under .zfs, return
+	 * the snapshot directory.
+	 */
+	if (parent == dzp->z_id && zfsvfs->z_parent != zfsvfs) {
+		error = zfsctl_root_lookup(zfsvfs->z_parent->z_ctldir,
+		    "snapshot", &vp, NULL, 0, NULL, kcred,
+		    NULL, NULL, NULL);
+		if (error == 0)
+			zp = VTOZ(vp);
+	} else {
+		error = zfs_zget(zfsvfs, parent, &zp);
+	}
+	if (error == 0)
+		*zpp = zp;
+	return (error);
 }
 
-/*
- * Look up an entry in a directory.
- *
- * NOTE: '.' and '..' are handled as special cases because
- *	no directory entries are actually stored for them.  If this is
- *	the root of a filesystem, then '.zfs' is also treated as a
- *	special pseudo-directory.
- */
 int
-zfs_dirlook(znode_t *dzp, char *name, vnode_t **vpp, int flags,
-    int *deflg, pathname_t *rpnp)
+zfs_dirlook(znode_t *dzp, const char *name, znode_t **zpp)
 {
-	zfs_dirlock_t *dl;
+	zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
 	znode_t *zp;
 	int error = 0;
-	uint64_t parent;
-	int unlinked;
 
-	if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) {
-		mutex_enter(&dzp->z_lock);
-		unlinked = dzp->z_unlinked;
-		mutex_exit(&dzp->z_lock);
-		if (unlinked)
-			return (ENOENT);
+	ASSERT_VOP_LOCKED(ZTOV(dzp), __func__);
+	ASSERT(RRM_READ_HELD(&zfsvfs->z_teardown_lock));
 
-		*vpp = ZTOV(dzp);
-		VN_HOLD(*vpp);
-	} else if (name[0] == '.' && name[1] == '.' && name[2] == 0) {
-		zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
+	if (dzp->z_unlinked)
+		return (SET_ERROR(ENOENT));
 
-		/*
-		 * If we are a snapshot mounted under .zfs, return
-		 * the vp for the snapshot directory.
-		 */
-		if ((error = sa_lookup(dzp->z_sa_hdl,
-		    SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0)
-			return (error);
-		if (parent == dzp->z_id && zfsvfs->z_parent != zfsvfs) {
-			error = zfsctl_root_lookup(zfsvfs->z_parent->z_ctldir,
-			    "snapshot", vpp, NULL, 0, NULL, kcred,
-			    NULL, NULL, NULL);
-			return (error);
-		}
-
-		mutex_enter(&dzp->z_lock);
-		unlinked = dzp->z_unlinked;
-		mutex_exit(&dzp->z_lock);
-		if (unlinked)
-			return (ENOENT);
-
-		rw_enter(&dzp->z_parent_lock, RW_READER);
-		error = zfs_zget(zfsvfs, parent, &zp);
-		if (error == 0)
-			*vpp = ZTOV(zp);
-		rw_exit(&dzp->z_parent_lock);
+	if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) {
+		*zpp = dzp;
+	} else if (name[0] == '.' && name[1] == '.' && name[2] == 0) {
+		error = zfs_dd_lookup(dzp, zpp);
 	} else if (zfs_has_ctldir(dzp) && strcmp(name, ZFS_CTLDIR_NAME) == 0) {
-		*vpp = zfsctl_root(dzp);
+		*zpp = VTOZ(zfsctl_root(dzp));
 	} else {
-		int zf;
-
-		zf = ZEXISTS | ZSHARED;
-		if (flags & FIGNORECASE)
-			zf |= ZCILOOK;
-
-		error = zfs_dirent_lock(&dl, dzp, name, &zp, zf, deflg, rpnp);
+		error = zfs_dirent_lookup(dzp, name, &zp, ZEXISTS);
 		if (error == 0) {
-			*vpp = ZTOV(zp);
-			zfs_dirent_unlock(dl);
 			dzp->z_zn_prefetch = B_TRUE; /* enable prefetching */
+			*zpp = zp;
 		}
-		rpnp = NULL;
 	}
-
-	if ((flags & FIGNORECASE) && rpnp && !error)
-		(void) strlcpy(rpnp->pn_buf, name, rpnp->pn_bufsize);
-
 	return (error);
 }
 
 /*
  * unlinked Set (formerly known as the "delete queue") Error Handling
  *
  * When dealing with the unlinked set, we dmu_tx_hold_zap(), but we
  * don't specify the name of the entry that we will be manipulating.  We
  * also fib and say that we won't be adding any new entries to the
  * unlinked set, even though we might (this is to lower the minimum file
  * size that can be deleted in a full filesystem).  So on the small
  * chance that the nlink list is using a fat zap (ie. has more than
  * 2000 entries), we *may* not pre-read a block that's needed.
  * Therefore it is remotely possible for some of the assertions
  * regarding the unlinked set below to fail due to i/o error.  On a
  * nondebug system, this will result in the space being leaked.
  */
 void
 zfs_unlinked_add(znode_t *zp, dmu_tx_t *tx)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	ASSERT(zp->z_unlinked);
 	ASSERT(zp->z_links == 0);
 
 	VERIFY3U(0, ==,
 	    zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx));
 }
 
 /*
  * Clean up any znodes that had no links when we either crashed or
  * (force) umounted the file system.
  */
 void
 zfs_unlinked_drain(zfsvfs_t *zfsvfs)
 {
 	zap_cursor_t	zc;
 	zap_attribute_t zap;
 	dmu_object_info_t doi;
 	znode_t		*zp;
 	int		error;
 
 	/*
 	 * Interate over the contents of the unlinked set.
 	 */
 	for (zap_cursor_init(&zc, zfsvfs->z_os, zfsvfs->z_unlinkedobj);
 	    zap_cursor_retrieve(&zc, &zap) == 0;
 	    zap_cursor_advance(&zc)) {
 
 		/*
 		 * See what kind of object we have in list
 		 */
 
 		error = dmu_object_info(zfsvfs->z_os,
 		    zap.za_first_integer, &doi);
 		if (error != 0)
 			continue;
 
 		ASSERT((doi.doi_type == DMU_OT_PLAIN_FILE_CONTENTS) ||
 		    (doi.doi_type == DMU_OT_DIRECTORY_CONTENTS));
 		/*
 		 * We need to re-mark these list entries for deletion,
 		 * so we pull them back into core and set zp->z_unlinked.
 		 */
 		error = zfs_zget(zfsvfs, zap.za_first_integer, &zp);
 
 		/*
 		 * We may pick up znodes that are already marked for deletion.
 		 * This could happen during the purge of an extended attribute
 		 * directory.  All we need to do is skip over them, since they
 		 * are already in the system marked z_unlinked.
 		 */
 		if (error != 0)
 			continue;
 
+		vn_lock(ZTOV(zp), LK_EXCLUSIVE | LK_RETRY);
 		zp->z_unlinked = B_TRUE;
-		VN_RELE(ZTOV(zp));
+		vput(ZTOV(zp));
 	}
 	zap_cursor_fini(&zc);
 }
 
 /*
  * Delete the entire contents of a directory.  Return a count
  * of the number of entries that could not be deleted. If we encounter
  * an error, return a count of at least one so that the directory stays
  * in the unlinked set.
  *
  * NOTE: this function assumes that the directory is inactive,
  *	so there is no need to lock its entries before deletion.
  *	Also, it assumes the directory contents is *only* regular
  *	files.
  */
 static int
 zfs_purgedir(znode_t *dzp)
 {
 	zap_cursor_t	zc;
 	zap_attribute_t	zap;
 	znode_t		*xzp;
 	dmu_tx_t	*tx;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
-	zfs_dirlock_t	dl;
 	int skipped = 0;
 	int error;
 
 	for (zap_cursor_init(&zc, zfsvfs->z_os, dzp->z_id);
 	    (error = zap_cursor_retrieve(&zc, &zap)) == 0;
 	    zap_cursor_advance(&zc)) {
 		error = zfs_zget(zfsvfs,
 		    ZFS_DIRENT_OBJ(zap.za_first_integer), &xzp);
 		if (error) {
 			skipped += 1;
 			continue;
 		}
 
+		vn_lock(ZTOV(xzp), LK_EXCLUSIVE | LK_RETRY);
 		ASSERT((ZTOV(xzp)->v_type == VREG) ||
 		    (ZTOV(xzp)->v_type == VLNK));
 
 		tx = dmu_tx_create(zfsvfs->z_os);
 		dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE);
 		dmu_tx_hold_zap(tx, dzp->z_id, FALSE, zap.za_name);
 		dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);
 		dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
 		/* Is this really needed ? */
 		zfs_sa_upgrade_txholds(tx, xzp);
 		dmu_tx_mark_netfree(tx);
 		error = dmu_tx_assign(tx, TXG_WAIT);
 		if (error) {
 			dmu_tx_abort(tx);
-			VN_RELE(ZTOV(xzp));
+			vput(ZTOV(xzp));
 			skipped += 1;
 			continue;
 		}
-		bzero(&dl, sizeof (dl));
-		dl.dl_dzp = dzp;
-		dl.dl_name = zap.za_name;
 
-		error = zfs_link_destroy(&dl, xzp, tx, 0, NULL);
+		error = zfs_link_destroy(dzp, zap.za_name, xzp, tx, 0, NULL);
 		if (error)
 			skipped += 1;
 		dmu_tx_commit(tx);
 
-		VN_RELE(ZTOV(xzp));
+		vput(ZTOV(xzp));
 	}
 	zap_cursor_fini(&zc);
 	if (error != ENOENT)
 		skipped += 1;
 	return (skipped);
 }
 
 void
 zfs_rmnode(znode_t *zp)
 {
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	objset_t	*os = zfsvfs->z_os;
 	znode_t		*xzp = NULL;
 	dmu_tx_t	*tx;
 	uint64_t	acl_obj;
 	uint64_t	xattr_obj;
 	int		error;
 
 	ASSERT(zp->z_links == 0);
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 
 	/*
 	 * If this is an attribute directory, purge its contents.
 	 */
 	if (ZTOV(zp) != NULL && ZTOV(zp)->v_type == VDIR &&
 	    (zp->z_pflags & ZFS_XATTR)) {
 		if (zfs_purgedir(zp) != 0) {
 			/*
 			 * Not enough space to delete some xattrs.
 			 * Leave it in the unlinked set.
 			 */
 			zfs_znode_dmu_fini(zp);
 			zfs_znode_free(zp);
 			return;
 		}
 	} else {
 		/*
 		 * Free up all the data in the file.  We don't do this for
 		 * XATTR directories because we need truncate and remove to be
 		 * in the same tx, like in zfs_znode_delete(). Otherwise, if
 		 * we crash here we'll end up with an inconsistent truncated
 		 * zap object in the delete queue.  Note a truncated file is
 		 * harmless since it only contains user data.
 		 */
 		error = dmu_free_long_range(os, zp->z_id, 0, DMU_OBJECT_END);
 		if (error) {
 			/*
 			 * Not enough space.  Leave the file in the unlinked
 			 * set.
 			 */
 			zfs_znode_dmu_fini(zp);
 			zfs_znode_free(zp);
 			return;
 		}
 	}
 
 	/*
 	 * If the file has extended attributes, we're going to unlink
 	 * the xattr dir.
 	 */
 	error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs),
 	    &xattr_obj, sizeof (xattr_obj));
 	if (error == 0 && xattr_obj) {
 		error = zfs_zget(zfsvfs, xattr_obj, &xzp);
-		ASSERT(error == 0);
+		ASSERT3S(error, ==, 0);
+		vn_lock(ZTOV(xzp), LK_EXCLUSIVE | LK_RETRY);
 	}
 
 	acl_obj = zfs_external_acl(zp);
 
 	/*
 	 * Set up the final transaction.
 	 */
 	tx = dmu_tx_create(os);
 	dmu_tx_hold_free(tx, zp->z_id, 0, DMU_OBJECT_END);
 	dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
 	if (xzp) {
 		dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, TRUE, NULL);
 		dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);
 	}
 	if (acl_obj)
 		dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END);
 
 	zfs_sa_upgrade_txholds(tx, zp);
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		/*
 		 * Not enough space to delete the file.  Leave it in the
 		 * unlinked set, leaking it until the fs is remounted (at
 		 * which point we'll call zfs_unlinked_drain() to process it).
 		 */
 		dmu_tx_abort(tx);
 		zfs_znode_dmu_fini(zp);
 		zfs_znode_free(zp);
 		goto out;
 	}
 
 	if (xzp) {
 		ASSERT(error == 0);
-		mutex_enter(&xzp->z_lock);
 		xzp->z_unlinked = B_TRUE;	/* mark xzp for deletion */
 		xzp->z_links = 0;	/* no more links to it */
 		VERIFY(0 == sa_update(xzp->z_sa_hdl, SA_ZPL_LINKS(zfsvfs),
 		    &xzp->z_links, sizeof (xzp->z_links), tx));
-		mutex_exit(&xzp->z_lock);
 		zfs_unlinked_add(xzp, tx);
 	}
 
 	/* Remove this znode from the unlinked set */
 	VERIFY3U(0, ==,
 	    zap_remove_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx));
 
 	zfs_znode_delete(zp, tx);
 
 	dmu_tx_commit(tx);
 out:
 	if (xzp)
-		VN_RELE(ZTOV(xzp));
+		vput(ZTOV(xzp));
 }
 
 static uint64_t
 zfs_dirent(znode_t *zp, uint64_t mode)
 {
 	uint64_t de = zp->z_id;
 
 	if (zp->z_zfsvfs->z_version >= ZPL_VERSION_DIRENT_TYPE)
 		de |= IFTODT(mode) << 60;
 	return (de);
 }
 
 /*
- * Link zp into dl.  Can only fail if zp has been unlinked.
+ * Link zp into dzp.  Can only fail if zp has been unlinked.
  */
 int
-zfs_link_create(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag)
+zfs_link_create(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx,
+    int flag)
 {
-	znode_t *dzp = dl->dl_dzp;
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	vnode_t *vp = ZTOV(zp);
 	uint64_t value;
 	int zp_is_dir = (vp->v_type == VDIR);
 	sa_bulk_attr_t bulk[5];
 	uint64_t mtime[2], ctime[2];
 	int count = 0;
 	int error;
 
-	mutex_enter(&zp->z_lock);
-
+	ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__);
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
+#if 0
+	if (zp_is_dir) {
+		error = 0;
+		if (dzp->z_links >= LINK_MAX)
+			error = SET_ERROR(EMLINK);
+		return (error);
+	}
+#endif
 	if (!(flag & ZRENAMING)) {
 		if (zp->z_unlinked) {	/* no new links to unlinked zp */
 			ASSERT(!(flag & (ZNEW | ZEXISTS)));
-			mutex_exit(&zp->z_lock);
 			return (SET_ERROR(ENOENT));
 		}
+#if 0
+		if (zp->z_links >= LINK_MAX) {
+			return (SET_ERROR(EMLINK));
+		}
+#endif
 		zp->z_links++;
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL,
 		    &zp->z_links, sizeof (zp->z_links));
 
+	} else {
+		ASSERT(zp->z_unlinked == 0);
 	}
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zfsvfs), NULL,
 	    &dzp->z_id, sizeof (dzp->z_id));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, sizeof (zp->z_pflags));
 
 	if (!(flag & ZNEW)) {
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 		    ctime, sizeof (ctime));
 		zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime,
 		    ctime, B_TRUE);
 	}
 	error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
-	ASSERT(error == 0);
+	ASSERT0(error);
 
-	mutex_exit(&zp->z_lock);
-
-	mutex_enter(&dzp->z_lock);
 	dzp->z_size++;
 	dzp->z_links += zp_is_dir;
 	count = 0;
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL,
 	    &dzp->z_size, sizeof (dzp->z_size));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL,
 	    &dzp->z_links, sizeof (dzp->z_links));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL,
 	    mtime, sizeof (mtime));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 	    ctime, sizeof (ctime));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &dzp->z_pflags, sizeof (dzp->z_pflags));
 	zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
 	error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx);
-	ASSERT(error == 0);
-	mutex_exit(&dzp->z_lock);
+	ASSERT0(error);
 
 	value = zfs_dirent(zp, zp->z_mode);
-	error = zap_add(zp->z_zfsvfs->z_os, dzp->z_id, dl->dl_name,
+	error = zap_add(zp->z_zfsvfs->z_os, dzp->z_id, name,
 	    8, 1, &value, tx);
-	ASSERT(error == 0);
+	VERIFY0(error);
 
-	dnlc_update(ZTOV(dzp), dl->dl_name, vp);
-
 	return (0);
 }
 
 static int
-zfs_dropname(zfs_dirlock_t *dl, znode_t *zp, znode_t *dzp, dmu_tx_t *tx,
+zfs_dropname(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx,
     int flag)
 {
 	int error;
 
 	if (zp->z_zfsvfs->z_norm) {
-		if (((zp->z_zfsvfs->z_case == ZFS_CASE_INSENSITIVE) &&
-		    (flag & ZCIEXACT)) ||
-		    ((zp->z_zfsvfs->z_case == ZFS_CASE_MIXED) &&
-		    !(flag & ZCILOOK)))
+		if (zp->z_zfsvfs->z_case == ZFS_CASE_MIXED)
 			error = zap_remove_norm(zp->z_zfsvfs->z_os,
-			    dzp->z_id, dl->dl_name, MT_EXACT, tx);
+			    dzp->z_id, name, MT_EXACT, tx);
 		else
 			error = zap_remove_norm(zp->z_zfsvfs->z_os,
-			    dzp->z_id, dl->dl_name, MT_FIRST, tx);
+			    dzp->z_id, name, MT_FIRST, tx);
 	} else {
 		error = zap_remove(zp->z_zfsvfs->z_os,
-		    dzp->z_id, dl->dl_name, tx);
+		    dzp->z_id, name, tx);
 	}
 
 	return (error);
 }
 
 /*
- * Unlink zp from dl, and mark zp for deletion if this was the last link.
+ * Unlink zp from dzp, and mark zp for deletion if this was the last link.
  * Can fail if zp is a mount point (EBUSY) or a non-empty directory (EEXIST).
  * If 'unlinkedp' is NULL, we put unlinked znodes on the unlinked list.
  * If it's non-NULL, we use it to indicate whether the znode needs deletion,
  * and it's the caller's job to do it.
  */
 int
-zfs_link_destroy(zfs_dirlock_t *dl, znode_t *zp, dmu_tx_t *tx, int flag,
-    boolean_t *unlinkedp)
+zfs_link_destroy(znode_t *dzp, const char *name, znode_t *zp, dmu_tx_t *tx,
+    int flag, boolean_t *unlinkedp)
 {
-	znode_t *dzp = dl->dl_dzp;
 	zfsvfs_t *zfsvfs = dzp->z_zfsvfs;
 	vnode_t *vp = ZTOV(zp);
 	int zp_is_dir = (vp->v_type == VDIR);
 	boolean_t unlinked = B_FALSE;
 	sa_bulk_attr_t bulk[5];
 	uint64_t mtime[2], ctime[2];
 	int count = 0;
 	int error;
 
-	dnlc_remove(ZTOV(dzp), dl->dl_name);
+	ASSERT_VOP_ELOCKED(ZTOV(dzp), __func__);
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 
 	if (!(flag & ZRENAMING)) {
-		if (vn_vfswlock(vp))		/* prevent new mounts on zp */
-			return (SET_ERROR(EBUSY));
 
-		if (vn_ismntpt(vp)) {		/* don't remove mount point */
-			vn_vfsunlock(vp);
-			return (SET_ERROR(EBUSY));
-		}
-
-		mutex_enter(&zp->z_lock);
-
 		if (zp_is_dir && !zfs_dirempty(zp)) {
-			mutex_exit(&zp->z_lock);
-			vn_vfsunlock(vp);
 #ifdef illumos
 			return (SET_ERROR(EEXIST));
 #else
 			return (SET_ERROR(ENOTEMPTY));
 #endif
 		}
 
 		/*
 		 * If we get here, we are going to try to remove the object.
 		 * First try removing the name from the directory; if that
 		 * fails, return the error.
 		 */
-		error = zfs_dropname(dl, zp, dzp, tx, flag);
+		error = zfs_dropname(dzp, name, zp, tx, flag);
 		if (error != 0) {
-			mutex_exit(&zp->z_lock);
-			vn_vfsunlock(vp);
 			return (error);
 		}
 
 		if (zp->z_links <= zp_is_dir) {
 			zfs_panic_recover("zfs: link count on vnode %p is %u, "
 			    "should be at least %u", zp->z_vnode,
 			    (int)zp->z_links,
 			    zp_is_dir + 1);
 			zp->z_links = zp_is_dir + 1;
 		}
 		if (--zp->z_links == zp_is_dir) {
 			zp->z_unlinked = B_TRUE;
 			zp->z_links = 0;
 			unlinked = B_TRUE;
 		} else {
 			SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs),
 			    NULL, &ctime, sizeof (ctime));
 			SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs),
 			    NULL, &zp->z_pflags, sizeof (zp->z_pflags));
 			zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime,
 			    B_TRUE);
 		}
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs),
 		    NULL, &zp->z_links, sizeof (zp->z_links));
 		error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 		count = 0;
-		ASSERT(error == 0);
-		mutex_exit(&zp->z_lock);
-		vn_vfsunlock(vp);
+		ASSERT0(error);
 	} else {
-		error = zfs_dropname(dl, zp, dzp, tx, flag);
+		ASSERT(zp->z_unlinked == 0);
+		error = zfs_dropname(dzp, name, zp, tx, flag);
 		if (error != 0)
 			return (error);
 	}
 
-	mutex_enter(&dzp->z_lock);
 	dzp->z_size--;		/* one dirent removed */
 	dzp->z_links -= zp_is_dir;	/* ".." link from zp */
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs),
 	    NULL, &dzp->z_links, sizeof (dzp->z_links));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs),
 	    NULL, &dzp->z_size, sizeof (dzp->z_size));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs),
 	    NULL, ctime, sizeof (ctime));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs),
 	    NULL, mtime, sizeof (mtime));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs),
 	    NULL, &dzp->z_pflags, sizeof (dzp->z_pflags));
 	zfs_tstamp_update_setup(dzp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
 	error = sa_bulk_update(dzp->z_sa_hdl, bulk, count, tx);
-	ASSERT(error == 0);
-	mutex_exit(&dzp->z_lock);
+	ASSERT0(error);
 
 	if (unlinkedp != NULL)
 		*unlinkedp = unlinked;
 	else if (unlinked)
 		zfs_unlinked_add(zp, tx);
 
 	return (0);
 }
 
 /*
- * Indicate whether the directory is empty.  Works with or without z_lock
- * held, but can only be consider a hint in the latter case.  Returns true
- * if only "." and ".." remain and there's no work in progress.
+ * Indicate whether the directory is empty.
  */
 boolean_t
 zfs_dirempty(znode_t *dzp)
 {
-	return (dzp->z_size == 2 && dzp->z_dirlocks == 0);
+	return (dzp->z_size == 2);
 }
 
 int
 zfs_make_xattrdir(znode_t *zp, vattr_t *vap, vnode_t **xvpp, cred_t *cr)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	znode_t *xzp;
 	dmu_tx_t *tx;
 	int error;
 	zfs_acl_ids_t acl_ids;
 	boolean_t fuid_dirtied;
 	uint64_t parent;
 
 	*xvpp = NULL;
 
 	/*
 	 * In FreeBSD, access checking for creating an EA is being done
 	 * in zfs_setextattr(),
 	 */
 #ifndef __FreeBSD_kernel__
 	if (error = zfs_zaccess(zp, ACE_WRITE_NAMED_ATTRS, 0, B_FALSE, cr))
 		return (error);
 #endif
 
 	if ((error = zfs_acl_ids_create(zp, IS_XATTR, vap, cr, NULL,
 	    &acl_ids)) != 0)
 		return (error);
 	if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) {
 		zfs_acl_ids_free(&acl_ids);
 		return (SET_ERROR(EDQUOT));
 	}
 
 	getnewvnode_reserve(1);
 
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
 	    ZFS_SA_BASE_ATTR_SIZE);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 	dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL);
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		zfs_acl_ids_free(&acl_ids);
 		dmu_tx_abort(tx);
 		return (error);
 	}
 	zfs_mknode(zp, vap, tx, cr, IS_XATTR, &xzp, &acl_ids);
 
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 
 #ifdef DEBUG
 	error = sa_lookup(xzp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs),
 	    &parent, sizeof (parent));
 	ASSERT(error == 0 && parent == zp->z_id);
 #endif
 
 	VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs), &xzp->z_id,
 	    sizeof (xzp->z_id), tx));
 
 	(void) zfs_log_create(zfsvfs->z_log, tx, TX_MKXATTR, zp,
 	    xzp, "", NULL, acl_ids.z_fuidp, vap);
 
 	zfs_acl_ids_free(&acl_ids);
 	dmu_tx_commit(tx);
 
 	getnewvnode_drop_reserve();
 
 	*xvpp = ZTOV(xzp);
 
 	return (0);
 }
 
 /*
  * Return a znode for the extended attribute directory for zp.
  * ** If the directory does not already exist, it is created **
  *
  *	IN:	zp	- znode to obtain attribute directory from
  *		cr	- credentials of caller
  *		flags	- flags from the VOP_LOOKUP call
  *
  *	OUT:	xzpp	- pointer to extended attribute znode
  *
  *	RETURN:	0 on success
  *		error number on failure
  */
 int
 zfs_get_xattrdir(znode_t *zp, vnode_t **xvpp, cred_t *cr, int flags)
 {
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	znode_t		*xzp;
-	zfs_dirlock_t	*dl;
 	vattr_t		va;
 	int		error;
 top:
-	error = zfs_dirent_lock(&dl, zp, "", &xzp, ZXATTR, NULL, NULL);
+	error = zfs_dirent_lookup(zp, "", &xzp, ZXATTR);
 	if (error)
 		return (error);
 
 	if (xzp != NULL) {
 		*xvpp = ZTOV(xzp);
-		zfs_dirent_unlock(dl);
 		return (0);
 	}
 
 
 	if (!(flags & CREATE_XATTR_DIR)) {
-		zfs_dirent_unlock(dl);
 #ifdef illumos
 		return (SET_ERROR(ENOENT));
 #else
 		return (SET_ERROR(ENOATTR));
 #endif
 	}
 
 	if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) {
-		zfs_dirent_unlock(dl);
 		return (SET_ERROR(EROFS));
 	}
 
 	/*
 	 * The ability to 'create' files in an attribute
 	 * directory comes from the write_xattr permission on the base file.
 	 *
 	 * The ability to 'search' an attribute directory requires
 	 * read_xattr permission on the base file.
 	 *
 	 * Once in a directory the ability to read/write attributes
 	 * is controlled by the permissions on the attribute file.
 	 */
 	va.va_mask = AT_TYPE | AT_MODE | AT_UID | AT_GID;
 	va.va_type = VDIR;
 	va.va_mode = S_IFDIR | S_ISVTX | 0777;
 	zfs_fuid_map_ids(zp, cr, &va.va_uid, &va.va_gid);
 
 	error = zfs_make_xattrdir(zp, &va, xvpp, cr);
-	zfs_dirent_unlock(dl);
 
 	if (error == ERESTART) {
 		/* NB: we already did dmu_tx_wait() if necessary */
 		goto top;
 	}
 	if (error == 0)
 		VOP_UNLOCK(*xvpp, 0);
 
 	return (error);
 }
 
 /*
  * Decide whether it is okay to remove within a sticky directory.
  *
  * In sticky directories, write access is not sufficient;
  * you can remove entries from a directory only if:
  *
  *	you own the directory,
  *	you own the entry,
  *	the entry is a plain file and you have write access,
  *	or you are privileged (checked in secpolicy...).
  *
  * The function returns 0 if remove access is granted.
  */
 int
 zfs_sticky_remove_access(znode_t *zdp, znode_t *zp, cred_t *cr)
 {
 	uid_t  		uid;
 	uid_t		downer;
 	uid_t		fowner;
 	zfsvfs_t	*zfsvfs = zdp->z_zfsvfs;
 
 	if (zdp->z_zfsvfs->z_replay)
 		return (0);
 
 	if ((zdp->z_mode & S_ISVTX) == 0)
 		return (0);
 
 	downer = zfs_fuid_map_id(zfsvfs, zdp->z_uid, cr, ZFS_OWNER);
 	fowner = zfs_fuid_map_id(zfsvfs, zp->z_uid, cr, ZFS_OWNER);
 
 	if ((uid = crgetuid(cr)) == downer || uid == fowner ||
 	    (ZTOV(zp)->v_type == VREG &&
 	    zfs_zaccess(zp, ACE_WRITE_DATA, 0, B_FALSE, cr) == 0))
 		return (0);
 	else
 		return (secpolicy_vnode_remove(ZTOV(zp), cr));
 }
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_sa.c	(revision 303775)
@@ -1,333 +1,327 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
  */
 
 #include <sys/zfs_context.h>
 #include <sys/vnode.h>
 #include <sys/sa.h>
 #include <sys/zfs_acl.h>
 #include <sys/zfs_sa.h>
 
 /*
  * ZPL attribute registration table.
  * Order of attributes doesn't matter
  * a unique value will be assigned for each
  * attribute that is file system specific
  *
  * This is just the set of ZPL attributes that this
  * version of ZFS deals with natively.  The file system
  * could have other attributes stored in files, but they will be
  * ignored.  The SA framework will preserve them, just that
  * this version of ZFS won't change or delete them.
  */
 
 sa_attr_reg_t zfs_attr_table[ZPL_END+1] = {
 	{"ZPL_ATIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 0},
 	{"ZPL_MTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 1},
 	{"ZPL_CTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 2},
 	{"ZPL_CRTIME", sizeof (uint64_t) * 2, SA_UINT64_ARRAY, 3},
 	{"ZPL_GEN", sizeof (uint64_t), SA_UINT64_ARRAY, 4},
 	{"ZPL_MODE", sizeof (uint64_t), SA_UINT64_ARRAY, 5},
 	{"ZPL_SIZE", sizeof (uint64_t), SA_UINT64_ARRAY, 6},
 	{"ZPL_PARENT", sizeof (uint64_t), SA_UINT64_ARRAY, 7},
 	{"ZPL_LINKS", sizeof (uint64_t), SA_UINT64_ARRAY, 8},
 	{"ZPL_XATTR", sizeof (uint64_t), SA_UINT64_ARRAY, 9},
 	{"ZPL_RDEV", sizeof (uint64_t), SA_UINT64_ARRAY, 10},
 	{"ZPL_FLAGS", sizeof (uint64_t), SA_UINT64_ARRAY, 11},
 	{"ZPL_UID", sizeof (uint64_t), SA_UINT64_ARRAY, 12},
 	{"ZPL_GID", sizeof (uint64_t), SA_UINT64_ARRAY, 13},
 	{"ZPL_PAD", sizeof (uint64_t) * 4, SA_UINT64_ARRAY, 14},
 	{"ZPL_ZNODE_ACL", 88, SA_UINT8_ARRAY, 15},
 	{"ZPL_DACL_COUNT", sizeof (uint64_t), SA_UINT64_ARRAY, 0},
 	{"ZPL_SYMLINK", 0, SA_UINT8_ARRAY, 0},
 	{"ZPL_SCANSTAMP", 32, SA_UINT8_ARRAY, 0},
 	{"ZPL_DACL_ACES", 0, SA_ACL, 0},
 	{NULL, 0, 0, 0}
 };
 
 #ifdef _KERNEL
 
 int
 zfs_sa_readlink(znode_t *zp, uio_t *uio)
 {
 	dmu_buf_t *db = sa_get_db(zp->z_sa_hdl);
 	size_t bufsz;
 	int error;
 
 	bufsz = zp->z_size;
 	if (bufsz + ZFS_OLD_ZNODE_PHYS_SIZE <= db->db_size) {
 		error = uiomove((caddr_t)db->db_data +
 		    ZFS_OLD_ZNODE_PHYS_SIZE,
 		    MIN((size_t)bufsz, uio->uio_resid), UIO_READ, uio);
 	} else {
 		dmu_buf_t *dbp;
 		if ((error = dmu_buf_hold(zp->z_zfsvfs->z_os, zp->z_id,
 		    0, FTAG, &dbp, DMU_READ_NO_PREFETCH)) == 0) {
 			error = uiomove(dbp->db_data,
 			    MIN((size_t)bufsz, uio->uio_resid), UIO_READ, uio);
 			dmu_buf_rele(dbp, FTAG);
 		}
 	}
 	return (error);
 }
 
 void
 zfs_sa_symlink(znode_t *zp, char *link, int len, dmu_tx_t *tx)
 {
 	dmu_buf_t *db = sa_get_db(zp->z_sa_hdl);
 
 	if (ZFS_OLD_ZNODE_PHYS_SIZE + len <= dmu_bonus_max()) {
 		VERIFY(dmu_set_bonus(db,
 		    len + ZFS_OLD_ZNODE_PHYS_SIZE, tx) == 0);
 		if (len) {
 			bcopy(link, (caddr_t)db->db_data +
 			    ZFS_OLD_ZNODE_PHYS_SIZE, len);
 		}
 	} else {
 		dmu_buf_t *dbp;
 
 		zfs_grow_blocksize(zp, len, tx);
 		VERIFY(0 == dmu_buf_hold(zp->z_zfsvfs->z_os,
 		    zp->z_id, 0, FTAG, &dbp, DMU_READ_NO_PREFETCH));
 
 		dmu_buf_will_dirty(dbp, tx);
 
 		ASSERT3U(len, <=, dbp->db_size);
 		bcopy(link, dbp->db_data, len);
 		dmu_buf_rele(dbp, FTAG);
 	}
 }
 
 void
 zfs_sa_get_scanstamp(znode_t *zp, xvattr_t *xvap)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	xoptattr_t *xoap;
 
-	ASSERT(MUTEX_HELD(&zp->z_lock));
+	ASSERT_VOP_LOCKED(ZTOV(zp), __func__);
 	VERIFY((xoap = xva_getxoptattr(xvap)) != NULL);
 	if (zp->z_is_sa) {
 		if (sa_lookup(zp->z_sa_hdl, SA_ZPL_SCANSTAMP(zfsvfs),
 		    &xoap->xoa_av_scanstamp,
 		    sizeof (xoap->xoa_av_scanstamp)) != 0)
 			return;
 	} else {
 		dmu_object_info_t doi;
 		dmu_buf_t *db = sa_get_db(zp->z_sa_hdl);
 		int len;
 
 		if (!(zp->z_pflags & ZFS_BONUS_SCANSTAMP))
 			return;
 
 		sa_object_info(zp->z_sa_hdl, &doi);
 		len = sizeof (xoap->xoa_av_scanstamp) +
 		    ZFS_OLD_ZNODE_PHYS_SIZE;
 
 		if (len <= doi.doi_bonus_size) {
 			(void) memcpy(xoap->xoa_av_scanstamp,
 			    (caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE,
 			    sizeof (xoap->xoa_av_scanstamp));
 		}
 	}
 	XVA_SET_RTN(xvap, XAT_AV_SCANSTAMP);
 }
 
 void
 zfs_sa_set_scanstamp(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	xoptattr_t *xoap;
 
-	ASSERT(MUTEX_HELD(&zp->z_lock));
+	ASSERT_VOP_ELOCKED(ZTOV(zp), __func__);
 	VERIFY((xoap = xva_getxoptattr(xvap)) != NULL);
 	if (zp->z_is_sa)
 		VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_SCANSTAMP(zfsvfs),
 		    &xoap->xoa_av_scanstamp,
 		    sizeof (xoap->xoa_av_scanstamp), tx));
 	else {
 		dmu_object_info_t doi;
 		dmu_buf_t *db = sa_get_db(zp->z_sa_hdl);
 		int len;
 
 		sa_object_info(zp->z_sa_hdl, &doi);
 		len = sizeof (xoap->xoa_av_scanstamp) +
 		    ZFS_OLD_ZNODE_PHYS_SIZE;
 		if (len > doi.doi_bonus_size)
 			VERIFY(dmu_set_bonus(db, len, tx) == 0);
 		(void) memcpy((caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE,
 		    xoap->xoa_av_scanstamp, sizeof (xoap->xoa_av_scanstamp));
 
 		zp->z_pflags |= ZFS_BONUS_SCANSTAMP;
 		VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_FLAGS(zfsvfs),
 		    &zp->z_pflags, sizeof (uint64_t), tx));
 	}
 }
 
 /*
  * I'm not convinced we should do any of this upgrade.
  * since the SA code can read both old/new znode formats
  * with probably little to no performance difference.
  *
  * All new files will be created with the new format.
  */
 
 void
 zfs_sa_upgrade(sa_handle_t *hdl, dmu_tx_t *tx)
 {
 	dmu_buf_t *db = sa_get_db(hdl);
 	znode_t *zp = sa_get_userdata(hdl);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	sa_bulk_attr_t bulk[20];
 	int count = 0;
 	sa_bulk_attr_t sa_attrs[20] = { 0 };
 	zfs_acl_locator_cb_t locate = { 0 };
 	uint64_t uid, gid, mode, rdev, xattr, parent;
 	uint64_t crtime[2], mtime[2], ctime[2];
 	zfs_acl_phys_t znode_acl;
 	char scanstamp[AV_SCANSTAMP_SZ];
-	boolean_t drop_lock = B_FALSE;
 
 	/*
 	 * No upgrade if ACL isn't cached
 	 * since we won't know which locks are held
 	 * and ready the ACL would require special "locked"
 	 * interfaces that would be messy
 	 */
 	if (zp->z_acl_cached == NULL || ZTOV(zp)->v_type == VLNK)
 		return;
 
 	/*
-	 * If the z_lock is held and we aren't the owner
-	 * the just return since we don't want to deadlock
+	 * If the vnode lock is held and we aren't the owner
+	 * then just return since we don't want to deadlock
 	 * trying to update the status of z_is_sa.  This
 	 * file can then be upgraded at a later time.
 	 *
 	 * Otherwise, we know we are doing the
 	 * sa_update() that caused us to enter this function.
 	 */
-	if (mutex_owner(&zp->z_lock) != curthread) {
-		if (mutex_tryenter(&zp->z_lock) == 0)
+	if (vn_lock(ZTOV(zp), LK_EXCLUSIVE | LK_NOWAIT) != 0)
 			return;
-		else
-			drop_lock = B_TRUE;
-	}
 
 	/* First do a bulk query of the attributes that aren't cached */
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CRTIME(zfsvfs), NULL, &crtime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zfsvfs), NULL, &parent, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_XATTR(zfsvfs), NULL, &xattr, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_RDEV(zfsvfs), NULL, &rdev, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL, &uid, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs), NULL, &gid, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ZNODE_ACL(zfsvfs), NULL,
 	    &znode_acl, 88);
 
 	if (sa_bulk_lookup_locked(hdl, bulk, count) != 0)
 		goto done;
 
 
 	/*
 	 * While the order here doesn't matter its best to try and organize
 	 * it is such a way to pick up an already existing layout number
 	 */
 	count = 0;
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_SIZE(zfsvfs), NULL,
 	    &zp->z_size, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_GEN(zfsvfs),
 	    NULL, &zp->z_gen, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_UID(zfsvfs), NULL, &uid, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_GID(zfsvfs), NULL, &gid, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_PARENT(zfsvfs),
 	    NULL, &parent, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_ATIME(zfsvfs), NULL,
 	    zp->z_atime, 16);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_MTIME(zfsvfs), NULL,
 	    &mtime, 16);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_CTIME(zfsvfs), NULL,
 	    &ctime, 16);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_CRTIME(zfsvfs), NULL,
 	    &crtime, 16);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_LINKS(zfsvfs), NULL,
 	    &zp->z_links, 8);
 	if (zp->z_vnode->v_type == VBLK || zp->z_vnode->v_type == VCHR)
 		SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_RDEV(zfsvfs), NULL,
 		    &rdev, 8);
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_DACL_COUNT(zfsvfs), NULL,
 	    &zp->z_acl_cached->z_acl_count, 8);
 
 	if (zp->z_acl_cached->z_version < ZFS_ACL_VERSION_FUID)
 		zfs_acl_xform(zp, zp->z_acl_cached, CRED());
 
 	locate.cb_aclp = zp->z_acl_cached;
 	SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_DACL_ACES(zfsvfs),
 	    zfs_acl_data_locator, &locate, zp->z_acl_cached->z_acl_bytes);
 
 	if (xattr)
 		SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_XATTR(zfsvfs),
 		    NULL, &xattr, 8);
 
 	/* if scanstamp then add scanstamp */
 
 	if (zp->z_pflags & ZFS_BONUS_SCANSTAMP) {
 		bcopy((caddr_t)db->db_data + ZFS_OLD_ZNODE_PHYS_SIZE,
 		    scanstamp, AV_SCANSTAMP_SZ);
 		SA_ADD_BULK_ATTR(sa_attrs, count, SA_ZPL_SCANSTAMP(zfsvfs),
 		    NULL, scanstamp, AV_SCANSTAMP_SZ);
 		zp->z_pflags &= ~ZFS_BONUS_SCANSTAMP;
 	}
 
 	VERIFY(dmu_set_bonustype(db, DMU_OT_SA, tx) == 0);
 	VERIFY(sa_replace_all_by_template_locked(hdl, sa_attrs,
 	    count, tx) == 0);
 	if (znode_acl.z_acl_extern_obj)
 		VERIFY(0 == dmu_object_free(zfsvfs->z_os,
 		    znode_acl.z_acl_extern_obj, tx));
 
 	zp->z_is_sa = B_TRUE;
 done:
-	if (drop_lock)
-		mutex_exit(&zp->z_lock);
+	VOP_UNLOCK(ZTOV(zp), 0);
 }
 
 void
 zfs_sa_upgrade_txholds(dmu_tx_t *tx, znode_t *zp)
 {
 	if (!zp->z_zfsvfs->z_use_sa || zp->z_is_sa)
 		return;
 
 
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 
 	if (zfs_external_acl(zp)) {
 		dmu_tx_hold_free(tx, zfs_external_acl(zp), 0,
 		    DMU_OBJECT_END);
 	}
 }
 
 #endif
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c	(revision 303775)
@@ -1,2502 +1,2518 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2011 Pawel Jakub Dawidek <pawel@dawidek.net>.
  * All rights reserved.
  * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
 /* Portions Copyright 2010 Robert Milkowski */
 
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/sysmacros.h>
 #include <sys/kmem.h>
 #include <sys/acl.h>
 #include <sys/vnode.h>
 #include <sys/vfs.h>
 #include <sys/mntent.h>
 #include <sys/mount.h>
 #include <sys/cmn_err.h>
 #include <sys/zfs_znode.h>
 #include <sys/zfs_dir.h>
 #include <sys/zil.h>
 #include <sys/fs/zfs.h>
 #include <sys/dmu.h>
 #include <sys/dsl_prop.h>
 #include <sys/dsl_dataset.h>
 #include <sys/dsl_deleg.h>
 #include <sys/spa.h>
 #include <sys/zap.h>
 #include <sys/sa.h>
 #include <sys/sa_impl.h>
 #include <sys/varargs.h>
 #include <sys/policy.h>
 #include <sys/atomic.h>
 #include <sys/zfs_ioctl.h>
 #include <sys/zfs_ctldir.h>
 #include <sys/zfs_fuid.h>
 #include <sys/sunddi.h>
 #include <sys/dnlc.h>
 #include <sys/dmu_objset.h>
 #include <sys/spa_boot.h>
 #include <sys/jail.h>
 #include "zfs_comutil.h"
 
 struct mtx zfs_debug_mtx;
 MTX_SYSINIT(zfs_debug_mtx, &zfs_debug_mtx, "zfs_debug", MTX_DEF);
 
 SYSCTL_NODE(_vfs, OID_AUTO, zfs, CTLFLAG_RW, 0, "ZFS file system");
 
 int zfs_super_owner;
 SYSCTL_INT(_vfs_zfs, OID_AUTO, super_owner, CTLFLAG_RW, &zfs_super_owner, 0,
     "File system owner can perform privileged operation on his file systems");
 
 int zfs_debug_level;
 SYSCTL_INT(_vfs_zfs, OID_AUTO, debug, CTLFLAG_RWTUN, &zfs_debug_level, 0,
     "Debug level");
 
 SYSCTL_NODE(_vfs_zfs, OID_AUTO, version, CTLFLAG_RD, 0, "ZFS versions");
 static int zfs_version_acl = ZFS_ACL_VERSION;
 SYSCTL_INT(_vfs_zfs_version, OID_AUTO, acl, CTLFLAG_RD, &zfs_version_acl, 0,
     "ZFS_ACL_VERSION");
 static int zfs_version_spa = SPA_VERSION;
 SYSCTL_INT(_vfs_zfs_version, OID_AUTO, spa, CTLFLAG_RD, &zfs_version_spa, 0,
     "SPA_VERSION");
 static int zfs_version_zpl = ZPL_VERSION;
 SYSCTL_INT(_vfs_zfs_version, OID_AUTO, zpl, CTLFLAG_RD, &zfs_version_zpl, 0,
     "ZPL_VERSION");
 
 static int zfs_mount(vfs_t *vfsp);
 static int zfs_umount(vfs_t *vfsp, int fflag);
 static int zfs_root(vfs_t *vfsp, int flags, vnode_t **vpp);
 static int zfs_statfs(vfs_t *vfsp, struct statfs *statp);
 static int zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp);
 static int zfs_sync(vfs_t *vfsp, int waitfor);
 static int zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, int *extflagsp,
     struct ucred **credanonp, int *numsecflavors, int **secflavors);
 static int zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, int flags, vnode_t **vpp);
 static void zfs_objset_close(zfsvfs_t *zfsvfs);
 static void zfs_freevfs(vfs_t *vfsp);
 
 struct vfsops zfs_vfsops = {
 	.vfs_mount =		zfs_mount,
 	.vfs_unmount =		zfs_umount,
 	.vfs_root =		zfs_root,
 	.vfs_statfs =		zfs_statfs,
 	.vfs_vget =		zfs_vget,
 	.vfs_sync =		zfs_sync,
 	.vfs_checkexp =		zfs_checkexp,
 	.vfs_fhtovp =		zfs_fhtovp,
 };
 
 VFS_SET(zfs_vfsops, zfs, VFCF_JAIL | VFCF_DELEGADMIN);
 
 /*
  * We need to keep a count of active fs's.
  * This is necessary to prevent our module
  * from being unloaded after a umount -f.
  */
 static uint32_t	zfs_active_fs_count = 0;
 
 /*ARGSUSED*/
 static int
 zfs_sync(vfs_t *vfsp, int waitfor)
 {
 
 	/*
 	 * Data integrity is job one.  We don't want a compromised kernel
 	 * writing to the storage pool, so we never sync during panic.
 	 */
 	if (panicstr)
 		return (0);
 
 	/*
 	 * Ignore the system syncher.  ZFS already commits async data
 	 * at zfs_txg_timeout intervals.
 	 */
 	if (waitfor == MNT_LAZY)
 		return (0);
 
 	if (vfsp != NULL) {
 		/*
 		 * Sync a specific filesystem.
 		 */
 		zfsvfs_t *zfsvfs = vfsp->vfs_data;
 		dsl_pool_t *dp;
 		int error;
 
 		error = vfs_stdsync(vfsp, waitfor);
 		if (error != 0)
 			return (error);
 
 		ZFS_ENTER(zfsvfs);
 		dp = dmu_objset_pool(zfsvfs->z_os);
 
 		/*
 		 * If the system is shutting down, then skip any
 		 * filesystems which may exist on a suspended pool.
 		 */
 		if (sys_shutdown && spa_suspended(dp->dp_spa)) {
 			ZFS_EXIT(zfsvfs);
 			return (0);
 		}
 
 		if (zfsvfs->z_log != NULL)
 			zil_commit(zfsvfs->z_log, 0);
 
 		ZFS_EXIT(zfsvfs);
 	} else {
 		/*
 		 * Sync all ZFS filesystems.  This is what happens when you
 		 * run sync(1M).  Unlike other filesystems, ZFS honors the
 		 * request by waiting for all pools to commit all dirty data.
 		 */
 		spa_sync_allpools();
 	}
 
 	return (0);
 }
 
 #ifndef __FreeBSD_kernel__
 static int
 zfs_create_unique_device(dev_t *dev)
 {
 	major_t new_major;
 
 	do {
 		ASSERT3U(zfs_minor, <=, MAXMIN32);
 		minor_t start = zfs_minor;
 		do {
 			mutex_enter(&zfs_dev_mtx);
 			if (zfs_minor >= MAXMIN32) {
 				/*
 				 * If we're still using the real major
 				 * keep out of /dev/zfs and /dev/zvol minor
 				 * number space.  If we're using a getudev()'ed
 				 * major number, we can use all of its minors.
 				 */
 				if (zfs_major == ddi_name_to_major(ZFS_DRIVER))
 					zfs_minor = ZFS_MIN_MINOR;
 				else
 					zfs_minor = 0;
 			} else {
 				zfs_minor++;
 			}
 			*dev = makedevice(zfs_major, zfs_minor);
 			mutex_exit(&zfs_dev_mtx);
 		} while (vfs_devismounted(*dev) && zfs_minor != start);
 		if (zfs_minor == start) {
 			/*
 			 * We are using all ~262,000 minor numbers for the
 			 * current major number.  Create a new major number.
 			 */
 			if ((new_major = getudev()) == (major_t)-1) {
 				cmn_err(CE_WARN,
 				    "zfs_mount: Can't get unique major "
 				    "device number.");
 				return (-1);
 			}
 			mutex_enter(&zfs_dev_mtx);
 			zfs_major = new_major;
 			zfs_minor = 0;
 
 			mutex_exit(&zfs_dev_mtx);
 		} else {
 			break;
 		}
 		/* CONSTANTCONDITION */
 	} while (1);
 
 	return (0);
 }
 #endif	/* !__FreeBSD_kernel__ */
 
 static void
 atime_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	if (newval == TRUE) {
 		zfsvfs->z_atime = TRUE;
 		zfsvfs->z_vfs->vfs_flag &= ~MNT_NOATIME;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_ATIME, NULL, 0);
 	} else {
 		zfsvfs->z_atime = FALSE;
 		zfsvfs->z_vfs->vfs_flag |= MNT_NOATIME;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_ATIME);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOATIME, NULL, 0);
 	}
 }
 
 static void
 xattr_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	if (newval == TRUE) {
 		/* XXX locking on vfs_flag? */
 #ifdef TODO
 		zfsvfs->z_vfs->vfs_flag |= VFS_XATTR;
 #endif
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_XATTR, NULL, 0);
 	} else {
 		/* XXX locking on vfs_flag? */
 #ifdef TODO
 		zfsvfs->z_vfs->vfs_flag &= ~VFS_XATTR;
 #endif
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_XATTR);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOXATTR, NULL, 0);
 	}
 }
 
 static void
 blksz_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 	ASSERT3U(newval, <=, spa_maxblocksize(dmu_objset_spa(zfsvfs->z_os)));
 	ASSERT3U(newval, >=, SPA_MINBLOCKSIZE);
 	ASSERT(ISP2(newval));
 
 	zfsvfs->z_max_blksz = newval;
 	zfsvfs->z_vfs->mnt_stat.f_iosize = newval;
 }
 
 static void
 readonly_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	if (newval) {
 		/* XXX locking on vfs_flag? */
 		zfsvfs->z_vfs->vfs_flag |= VFS_RDONLY;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RW);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RO, NULL, 0);
 	} else {
 		/* XXX locking on vfs_flag? */
 		zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_RO);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_RW, NULL, 0);
 	}
 }
 
 static void
 setuid_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	if (newval == FALSE) {
 		zfsvfs->z_vfs->vfs_flag |= VFS_NOSETUID;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_SETUID);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID, NULL, 0);
 	} else {
 		zfsvfs->z_vfs->vfs_flag &= ~VFS_NOSETUID;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOSETUID);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_SETUID, NULL, 0);
 	}
 }
 
 static void
 exec_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	if (newval == FALSE) {
 		zfsvfs->z_vfs->vfs_flag |= VFS_NOEXEC;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_EXEC);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC, NULL, 0);
 	} else {
 		zfsvfs->z_vfs->vfs_flag &= ~VFS_NOEXEC;
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NOEXEC);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_EXEC, NULL, 0);
 	}
 }
 
 /*
  * The nbmand mount option can only be changed at mount time.
  * We can't allow it to be toggled on live file systems, or incorrect
  * behavior may be seen from CIFS clients.
  *
  * This property isn't registered via dsl_prop_register(), but this callback
  * will be called when a file system is first mounted.
  */
 static void
 nbmand_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 	if (newval == FALSE) {
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND, NULL, 0);
 	} else {
 		vfs_clearmntopt(zfsvfs->z_vfs, MNTOPT_NONBMAND);
 		vfs_setmntopt(zfsvfs->z_vfs, MNTOPT_NBMAND, NULL, 0);
 	}
 }
 
 static void
 snapdir_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	zfsvfs->z_show_ctldir = newval;
 }
 
 static void
 vscan_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	zfsvfs->z_vscan = newval;
 }
 
 static void
 acl_mode_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	zfsvfs->z_acl_mode = newval;
 }
 
 static void
 acl_inherit_changed_cb(void *arg, uint64_t newval)
 {
 	zfsvfs_t *zfsvfs = arg;
 
 	zfsvfs->z_acl_inherit = newval;
 }
 
 static int
 zfs_register_callbacks(vfs_t *vfsp)
 {
 	struct dsl_dataset *ds = NULL;
 	objset_t *os = NULL;
 	zfsvfs_t *zfsvfs = NULL;
 	uint64_t nbmand;
 	boolean_t readonly = B_FALSE;
 	boolean_t do_readonly = B_FALSE;
 	boolean_t setuid = B_FALSE;
 	boolean_t do_setuid = B_FALSE;
 	boolean_t exec = B_FALSE;
 	boolean_t do_exec = B_FALSE;
 #ifdef illumos
 	boolean_t devices = B_FALSE;
 	boolean_t do_devices = B_FALSE;
 #endif
 	boolean_t xattr = B_FALSE;
 	boolean_t do_xattr = B_FALSE;
 	boolean_t atime = B_FALSE;
 	boolean_t do_atime = B_FALSE;
 	int error = 0;
 
 	ASSERT(vfsp);
 	zfsvfs = vfsp->vfs_data;
 	ASSERT(zfsvfs);
 	os = zfsvfs->z_os;
 
 	/*
 	 * This function can be called for a snapshot when we update a
 	 * snapshot's mount point, which isn't really supported.
 	 */
 	if (dmu_objset_is_snapshot(os))
 		return (EOPNOTSUPP);
 
 	/*
 	 * The act of registering our callbacks will destroy any mount
 	 * options we may have.  In order to enable temporary overrides
 	 * of mount options, we stash away the current values and
 	 * restore them after we register the callbacks.
 	 */
 	if (vfs_optionisset(vfsp, MNTOPT_RO, NULL) ||
 	    !spa_writeable(dmu_objset_spa(os))) {
 		readonly = B_TRUE;
 		do_readonly = B_TRUE;
 	} else if (vfs_optionisset(vfsp, MNTOPT_RW, NULL)) {
 		readonly = B_FALSE;
 		do_readonly = B_TRUE;
 	}
 	if (vfs_optionisset(vfsp, MNTOPT_NOSUID, NULL)) {
 		setuid = B_FALSE;
 		do_setuid = B_TRUE;
 	} else {
 		if (vfs_optionisset(vfsp, MNTOPT_NOSETUID, NULL)) {
 			setuid = B_FALSE;
 			do_setuid = B_TRUE;
 		} else if (vfs_optionisset(vfsp, MNTOPT_SETUID, NULL)) {
 			setuid = B_TRUE;
 			do_setuid = B_TRUE;
 		}
 	}
 	if (vfs_optionisset(vfsp, MNTOPT_NOEXEC, NULL)) {
 		exec = B_FALSE;
 		do_exec = B_TRUE;
 	} else if (vfs_optionisset(vfsp, MNTOPT_EXEC, NULL)) {
 		exec = B_TRUE;
 		do_exec = B_TRUE;
 	}
 	if (vfs_optionisset(vfsp, MNTOPT_NOXATTR, NULL)) {
 		xattr = B_FALSE;
 		do_xattr = B_TRUE;
 	} else if (vfs_optionisset(vfsp, MNTOPT_XATTR, NULL)) {
 		xattr = B_TRUE;
 		do_xattr = B_TRUE;
 	}
 	if (vfs_optionisset(vfsp, MNTOPT_NOATIME, NULL)) {
 		atime = B_FALSE;
 		do_atime = B_TRUE;
 	} else if (vfs_optionisset(vfsp, MNTOPT_ATIME, NULL)) {
 		atime = B_TRUE;
 		do_atime = B_TRUE;
 	}
 
 	/*
 	 * We need to enter pool configuration here, so that we can use
 	 * dsl_prop_get_int_ds() to handle the special nbmand property below.
 	 * dsl_prop_get_integer() cannot be used, because it has to acquire
 	 * spa_namespace_lock, and we cannot do that because we already hold
 	 * z_teardown_lock.  The problem is that spa_config_sync() is called
 	 * with spa_namespace_lock held, and that function calls ZFS vnode
 	 * operations to write the cache file; thus z_teardown_lock is
 	 * acquired after spa_namespace_lock.
 	 */
 	ds = dmu_objset_ds(os);
 	dsl_pool_config_enter(dmu_objset_pool(os), FTAG);
 
 	/*
 	 * nbmand is a special property: it is documented to be changeable
 	 * only at mount time.  An explicit mount option takes precedence;
 	 * otherwise the dataset's property value is used.
 	 */
 	if (vfs_optionisset(vfsp, MNTOPT_NONBMAND, NULL)) {
 		nbmand = B_FALSE;
 	} else if (vfs_optionisset(vfsp, MNTOPT_NBMAND, NULL)) {
 		nbmand = B_TRUE;
 	} else if ((error = dsl_prop_get_int_ds(ds, "nbmand", &nbmand)) != 0) {
 		dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
 		return (error);
 	}
 
 	/*
 	 * Register property callbacks.
 	 *
 	 * It would probably be fine to just check for an I/O error from
 	 * the first prop_register(), but I guess I like to go
 	 * overboard...
 	 */
 	error = dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_ATIME), atime_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_XATTR), xattr_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_RECORDSIZE), blksz_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_READONLY), readonly_changed_cb, zfsvfs);
 #ifdef illumos
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_DEVICES), devices_changed_cb, zfsvfs);
 #endif
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_SETUID), setuid_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_EXEC), exec_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_SNAPDIR), snapdir_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_ACLMODE), acl_mode_changed_cb, zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_ACLINHERIT), acl_inherit_changed_cb,
 	    zfsvfs);
 	error = error ? error : dsl_prop_register(ds,
 	    zfs_prop_to_name(ZFS_PROP_VSCAN), vscan_changed_cb, zfsvfs);
 	dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
 	if (error)
 		goto unregister;
 
 	/*
 	 * Invoke our callbacks to restore temporary mount options.
 	 */
 	if (do_readonly)
 		readonly_changed_cb(zfsvfs, readonly);
 	if (do_setuid)
 		setuid_changed_cb(zfsvfs, setuid);
 	if (do_exec)
 		exec_changed_cb(zfsvfs, exec);
 	if (do_xattr)
 		xattr_changed_cb(zfsvfs, xattr);
 	if (do_atime)
 		atime_changed_cb(zfsvfs, atime);
 
 	nbmand_changed_cb(zfsvfs, nbmand);
 
 	return (0);
 
 unregister:
 	dsl_prop_unregister_all(ds, zfsvfs);
 	return (error);
 }
 
 static int
 zfs_space_delta_cb(dmu_object_type_t bonustype, void *data,
     uint64_t *userp, uint64_t *groupp)
 {
 	/*
 	 * Is it a valid type of object to track?
 	 */
 	if (bonustype != DMU_OT_ZNODE && bonustype != DMU_OT_SA)
 		return (SET_ERROR(ENOENT));
 
 	/*
 	 * If we have a NULL data pointer, then assume the IDs aren't
 	 * changing and return EEXIST to the DMU to let it know to
 	 * use the same IDs.
 	 */
 	if (data == NULL)
 		return (SET_ERROR(EEXIST));
 
 	if (bonustype == DMU_OT_ZNODE) {
 		znode_phys_t *znp = data;
 		*userp = znp->zp_uid;
 		*groupp = znp->zp_gid;
 	} else {
 		int hdrsize;
 		sa_hdr_phys_t *sap = data;
 		sa_hdr_phys_t sa = *sap;
 		boolean_t swap = B_FALSE;
 
 		ASSERT(bonustype == DMU_OT_SA);
 
 		if (sa.sa_magic == 0) {
 			/*
 			 * This should only happen for newly created
 			 * files that haven't had the znode data filled
 			 * in yet.
 			 */
 			*userp = 0;
 			*groupp = 0;
 			return (0);
 		}
 		if (sa.sa_magic == BSWAP_32(SA_MAGIC)) {
 			sa.sa_magic = SA_MAGIC;
 			sa.sa_layout_info = BSWAP_16(sa.sa_layout_info);
 			swap = B_TRUE;
 		} else {
 			VERIFY3U(sa.sa_magic, ==, SA_MAGIC);
 		}
 
 		hdrsize = sa_hdrsize(&sa);
 		VERIFY3U(hdrsize, >=, sizeof (sa_hdr_phys_t));
 		*userp = *((uint64_t *)((uintptr_t)data + hdrsize +
 		    SA_UID_OFFSET));
 		*groupp = *((uint64_t *)((uintptr_t)data + hdrsize +
 		    SA_GID_OFFSET));
 		if (swap) {
 			*userp = BSWAP_64(*userp);
 			*groupp = BSWAP_64(*groupp);
 		}
 	}
 	return (0);
 }
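 
 /*
  * Illustrative sketch, not called by the code in this file: how
  * zfs_space_delta_cb() classifies an SA header's magic number.  A magic
  * of 0 marks a znode whose data has not been filled in yet, the
  * byte-swapped magic marks a header written on a host of the other
  * endianness, and anything else must equal the native magic.
  * EXAMPLE_SA_MAGIC assumes SA_MAGIC is 0x2F505A; all example_* and
  * EXAMPLE_* names here are hypothetical.
  */
 #define	EXAMPLE_SA_MAGIC	0x2F505AU	/* assumed SA_MAGIC value */
 
 enum example_sa_magic {
 	EXAMPLE_SA_EMPTY,
 	EXAMPLE_SA_NATIVE,
 	EXAMPLE_SA_SWAPPED,
 	EXAMPLE_SA_BAD
 };
 
 /* Portable 32-bit byte swap, standing in for BSWAP_32(). */
 static inline uint32_t
 example_bswap32(uint32_t x)
 {
 	return (((x & 0x000000ffU) << 24) | ((x & 0x0000ff00U) << 8) |
 	    ((x & 0x00ff0000U) >> 8) | ((x & 0xff000000U) >> 24));
 }
 
 static inline enum example_sa_magic
 example_classify_sa_magic(uint32_t magic)
 {
 	if (magic == 0)
 		return (EXAMPLE_SA_EMPTY);
 	/* Checked before the native magic, in the same order as above. */
 	if (magic == example_bswap32(EXAMPLE_SA_MAGIC))
 		return (EXAMPLE_SA_SWAPPED);
 	if (magic == EXAMPLE_SA_MAGIC)
 		return (EXAMPLE_SA_NATIVE);
 	/* zfs_space_delta_cb() VERIFYs here instead of returning. */
 	return (EXAMPLE_SA_BAD);
 }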
 
 static void
 fuidstr_to_sid(zfsvfs_t *zfsvfs, const char *fuidstr,
     char *domainbuf, int buflen, uid_t *ridp)
 {
 	uint64_t fuid;
 	const char *domain;
 
 	fuid = strtonum(fuidstr, NULL);
 
 	domain = zfs_fuid_find_by_idx(zfsvfs, FUID_INDEX(fuid));
 	if (domain)
 		(void) strlcpy(domainbuf, domain, buflen);
 	else
 		domainbuf[0] = '\0';
 	*ridp = FUID_RID(fuid);
 }
 
 static uint64_t
 zfs_userquota_prop_to_obj(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type)
 {
 	switch (type) {
 	case ZFS_PROP_USERUSED:
 		return (DMU_USERUSED_OBJECT);
 	case ZFS_PROP_GROUPUSED:
 		return (DMU_GROUPUSED_OBJECT);
 	case ZFS_PROP_USERQUOTA:
 		return (zfsvfs->z_userquota_obj);
 	case ZFS_PROP_GROUPQUOTA:
 		return (zfsvfs->z_groupquota_obj);
 	}
 	return (0);
 }
 
 int
 zfs_userspace_many(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     uint64_t *cookiep, void *vbuf, uint64_t *bufsizep)
 {
 	int error;
 	zap_cursor_t zc;
 	zap_attribute_t za;
 	zfs_useracct_t *buf = vbuf;
 	uint64_t obj;
 
 	if (!dmu_objset_userspace_present(zfsvfs->z_os))
 		return (SET_ERROR(ENOTSUP));
 
 	obj = zfs_userquota_prop_to_obj(zfsvfs, type);
 	if (obj == 0) {
 		*bufsizep = 0;
 		return (0);
 	}
 
 	for (zap_cursor_init_serialized(&zc, zfsvfs->z_os, obj, *cookiep);
 	    (error = zap_cursor_retrieve(&zc, &za)) == 0;
 	    zap_cursor_advance(&zc)) {
 		if ((uintptr_t)buf - (uintptr_t)vbuf + sizeof (zfs_useracct_t) >
 		    *bufsizep)
 			break;
 
 		fuidstr_to_sid(zfsvfs, za.za_name,
 		    buf->zu_domain, sizeof (buf->zu_domain), &buf->zu_rid);
 
 		buf->zu_space = za.za_first_integer;
 		buf++;
 	}
 	if (error == ENOENT)
 		error = 0;
 
 	ASSERT3U((uintptr_t)buf - (uintptr_t)vbuf, <=, *bufsizep);
 	*bufsizep = (uintptr_t)buf - (uintptr_t)vbuf;
 	*cookiep = zap_cursor_serialize(&zc);
 	zap_cursor_fini(&zc);
 	return (error);
 }
 
 /*
  * buf must be big enough to hold a 64-bit FUID in hex plus the
  * terminating NUL (e.g., 32 bytes).
  */
 static int
 id_to_fuidstr(zfsvfs_t *zfsvfs, const char *domain, uid_t rid,
     char *buf, boolean_t addok)
 {
 	uint64_t fuid;
 	int domainid = 0;
 
 	if (domain && domain[0]) {
 		domainid = zfs_fuid_find_by_domain(zfsvfs, domain, NULL, addok);
 		if (domainid == -1)
 			return (SET_ERROR(ENOENT));
 	}
 	fuid = FUID_ENCODE(domainid, rid);
 	(void) sprintf(buf, "%llx", (longlong_t)fuid);
 	return (0);
 }
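 
 /*
  * Illustrative sketch, not called by the code in this file: a model of
  * the FUID round trip performed by id_to_fuidstr() and fuidstr_to_sid().
  * It assumes FUID_ENCODE() places the domain table index in the upper
  * 32 bits and the RID in the lower 32 bits, which is what FUID_INDEX()
  * and FUID_RID() are used to take apart above.  The example_* helpers
  * are hypothetical.
  */
 static inline uint64_t
 example_fuid_encode(uint32_t domainidx, uint32_t rid)
 {
 	/* Domain index in the high word, RID in the low word. */
 	return (((uint64_t)domainidx << 32) | rid);
 }
 
 static inline uint32_t
 example_fuid_index(uint64_t fuid)
 {
 	return ((uint32_t)(fuid >> 32));
 }
 
 static inline uint32_t
 example_fuid_rid(uint64_t fuid)
 {
 	return ((uint32_t)(fuid & 0xffffffffULL));
 }
 
 static inline int
 example_fuid_to_str(uint64_t fuid, char *buf, size_t buflen)
 {
 	/* At most 16 hex digits plus the NUL, so 32 bytes is ample. */
 	return (snprintf(buf, buflen, "%llx", (unsigned long long)fuid));
 }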
 
 int
 zfs_userspace_one(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     const char *domain, uint64_t rid, uint64_t *valp)
 {
 	char buf[32];
 	int err;
 	uint64_t obj;
 
 	*valp = 0;
 
 	if (!dmu_objset_userspace_present(zfsvfs->z_os))
 		return (SET_ERROR(ENOTSUP));
 
 	obj = zfs_userquota_prop_to_obj(zfsvfs, type);
 	if (obj == 0)
 		return (0);
 
 	err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_FALSE);
 	if (err)
 		return (err);
 
 	err = zap_lookup(zfsvfs->z_os, obj, buf, 8, 1, valp);
 	if (err == ENOENT)
 		err = 0;
 	return (err);
 }
 
 int
 zfs_set_userquota(zfsvfs_t *zfsvfs, zfs_userquota_prop_t type,
     const char *domain, uint64_t rid, uint64_t quota)
 {
 	char buf[32];
 	int err;
 	dmu_tx_t *tx;
 	uint64_t *objp;
 	boolean_t fuid_dirtied;
 
 	if (type != ZFS_PROP_USERQUOTA && type != ZFS_PROP_GROUPQUOTA)
 		return (SET_ERROR(EINVAL));
 
 	if (zfsvfs->z_version < ZPL_VERSION_USERSPACE)
 		return (SET_ERROR(ENOTSUP));
 
 	objp = (type == ZFS_PROP_USERQUOTA) ? &zfsvfs->z_userquota_obj :
 	    &zfsvfs->z_groupquota_obj;
 
 	err = id_to_fuidstr(zfsvfs, domain, rid, buf, B_TRUE);
 	if (err)
 		return (err);
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_zap(tx, *objp ? *objp : DMU_NEW_OBJECT, B_TRUE, NULL);
 	if (*objp == 0) {
 		dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE,
 		    zfs_userquota_prop_prefixes[type]);
 	}
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
 	err = dmu_tx_assign(tx, TXG_WAIT);
 	if (err) {
 		dmu_tx_abort(tx);
 		return (err);
 	}
 
 	mutex_enter(&zfsvfs->z_lock);
 	if (*objp == 0) {
 		*objp = zap_create(zfsvfs->z_os, DMU_OT_USERGROUP_QUOTA,
 		    DMU_OT_NONE, 0, tx);
 		VERIFY(0 == zap_add(zfsvfs->z_os, MASTER_NODE_OBJ,
 		    zfs_userquota_prop_prefixes[type], 8, 1, objp, tx));
 	}
 	mutex_exit(&zfsvfs->z_lock);
 
 	if (quota == 0) {
 		err = zap_remove(zfsvfs->z_os, *objp, buf, tx);
 		if (err == ENOENT)
 			err = 0;
 	} else {
 		err = zap_update(zfsvfs->z_os, *objp, buf, 8, 1, &quota, tx);
 	}
 	ASSERT(err == 0);
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 	dmu_tx_commit(tx);
 	return (err);
 }
 
 boolean_t
 zfs_fuid_overquota(zfsvfs_t *zfsvfs, boolean_t isgroup, uint64_t fuid)
 {
 	char buf[32];
 	uint64_t used, quota, usedobj, quotaobj;
 	int err;
 
 	usedobj = isgroup ? DMU_GROUPUSED_OBJECT : DMU_USERUSED_OBJECT;
 	quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj;
 
 	if (quotaobj == 0 || zfsvfs->z_replay)
 		return (B_FALSE);
 
 	(void) sprintf(buf, "%llx", (longlong_t)fuid);
 	err = zap_lookup(zfsvfs->z_os, quotaobj, buf, 8, 1, &quota);
 	if (err != 0)
 		return (B_FALSE);
 
 	err = zap_lookup(zfsvfs->z_os, usedobj, buf, 8, 1, &used);
 	if (err != 0)
 		return (B_FALSE);
 	return (used >= quota);
 }
 
 boolean_t
 zfs_owner_overquota(zfsvfs_t *zfsvfs, znode_t *zp, boolean_t isgroup)
 {
 	uint64_t fuid;
 	uint64_t quotaobj;
 
 	quotaobj = isgroup ? zfsvfs->z_groupquota_obj : zfsvfs->z_userquota_obj;
 
 	fuid = isgroup ? zp->z_gid : zp->z_uid;
 
 	if (quotaobj == 0 || zfsvfs->z_replay)
 		return (B_FALSE);
 
 	return (zfs_fuid_overquota(zfsvfs, isgroup, fuid));
 }
 
 /*
  * Associate this zfsvfs with the given objset, which must be owned.
  * This will cache a bunch of on-disk state from the objset in the
  * zfsvfs.
  */
 static int
 zfsvfs_init(zfsvfs_t *zfsvfs, objset_t *os)
 {
 	int error;
 	uint64_t val;
 
 	zfsvfs->z_max_blksz = SPA_OLD_MAXBLOCKSIZE;
 	zfsvfs->z_show_ctldir = ZFS_SNAPDIR_VISIBLE;
 	zfsvfs->z_os = os;
 
 	error = zfs_get_zplprop(os, ZFS_PROP_VERSION, &zfsvfs->z_version);
 	if (error != 0)
 		return (error);
 	if (zfsvfs->z_version >
 	    zfs_zpl_version_map(spa_version(dmu_objset_spa(os)))) {
 		(void) printf("Can't mount a version %llu file system "
 		    "on a version %llu pool. Pool must be upgraded to mount "
 		    "this file system.\n", (u_longlong_t)zfsvfs->z_version,
 		    (u_longlong_t)spa_version(dmu_objset_spa(os)));
 		return (SET_ERROR(ENOTSUP));
 	}
 	error = zfs_get_zplprop(os, ZFS_PROP_NORMALIZE, &val);
 	if (error != 0)
 		return (error);
 	zfsvfs->z_norm = (int)val;
 
 	error = zfs_get_zplprop(os, ZFS_PROP_UTF8ONLY, &val);
 	if (error != 0)
 		return (error);
 	zfsvfs->z_utf8 = (val != 0);
 
 	error = zfs_get_zplprop(os, ZFS_PROP_CASE, &val);
 	if (error != 0)
 		return (error);
 	zfsvfs->z_case = (uint_t)val;
 
 	/*
 	 * Fold case on file systems that are always or sometimes case
 	 * insensitive.
 	 */
 	if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE ||
 	    zfsvfs->z_case == ZFS_CASE_MIXED)
 		zfsvfs->z_norm |= U8_TEXTPREP_TOUPPER;
 
 	zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
 	zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os);
 
 	uint64_t sa_obj = 0;
 	if (zfsvfs->z_use_sa) {
 		/* should either have both of these objects or none */
 		error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1,
 		    &sa_obj);
 		if (error != 0)
 			return (error);
 	}
 
 	error = sa_setup(os, sa_obj, zfs_attr_table, ZPL_END,
 	    &zfsvfs->z_attr_table);
 	if (error != 0)
 		return (error);
 
 	if (zfsvfs->z_version >= ZPL_VERSION_SA)
 		sa_register_update_callback(os, zfs_sa_upgrade);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_ROOT_OBJ, 8, 1,
 	    &zfsvfs->z_root);
 	if (error != 0)
 		return (error);
 	ASSERT(zfsvfs->z_root != 0);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_UNLINKED_SET, 8, 1,
 	    &zfsvfs->z_unlinkedobj);
 	if (error != 0)
 		return (error);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ,
 	    zfs_userquota_prop_prefixes[ZFS_PROP_USERQUOTA],
 	    8, 1, &zfsvfs->z_userquota_obj);
 	if (error == ENOENT)
 		zfsvfs->z_userquota_obj = 0;
 	else if (error != 0)
 		return (error);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ,
 	    zfs_userquota_prop_prefixes[ZFS_PROP_GROUPQUOTA],
 	    8, 1, &zfsvfs->z_groupquota_obj);
 	if (error == ENOENT)
 		zfsvfs->z_groupquota_obj = 0;
 	else if (error != 0)
 		return (error);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_FUID_TABLES, 8, 1,
 	    &zfsvfs->z_fuid_obj);
 	if (error == ENOENT)
 		zfsvfs->z_fuid_obj = 0;
 	else if (error != 0)
 		return (error);
 
 	error = zap_lookup(os, MASTER_NODE_OBJ, ZFS_SHARES_DIR, 8, 1,
 	    &zfsvfs->z_shares_dir);
 	if (error == ENOENT)
 		zfsvfs->z_shares_dir = 0;
 	else if (error != 0)
 		return (error);
 
+	/*
+	 * Only use the name cache if we are looking for a
+	 * name on a file system that does not require normalization
+	 * or case folding.  We can also look there if we happen to be
+	 * on a non-normalizing, mixed sensitivity file system IF we
+	 * are looking for the exact name (which is always the case on
+	 * FreeBSD).
+	 */
+	zfsvfs->z_use_namecache = !zfsvfs->z_norm ||
+	    ((zfsvfs->z_case == ZFS_CASE_MIXED) &&
+	    !(zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER));
+
 	return (0);
 }
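 
 /*
  * Illustrative sketch, not called by the code in this file: the
  * z_use_namecache predicate from zfsvfs_init() as a standalone function.
  * Because zfsvfs_init() ORs U8_TEXTPREP_TOUPPER into z_norm for
  * insensitive and mixed file systems, a mixed, otherwise non-normalizing
  * file system ends up with exactly that one flag set.  The flag and case
  * values below are hypothetical stand-ins; only their distinctness
  * matters.
  */
 #define	EXAMPLE_TEXTPREP_TOUPPER	0x0001	/* hypothetical value */
 #define	EXAMPLE_TEXTPREP_NFD		0x0010	/* hypothetical value */
 
 enum example_zcase {
 	EXAMPLE_CASE_SENSITIVE,
 	EXAMPLE_CASE_INSENSITIVE,
 	EXAMPLE_CASE_MIXED
 };
 
 static inline int
 example_use_namecache(int norm, enum example_zcase zcase)
 {
 	/*
 	 * Either no normalization at all, or a mixed-sensitivity file
 	 * system whose only normalization flag is the case-folding one.
 	 */
 	return (norm == 0 ||
 	    (zcase == EXAMPLE_CASE_MIXED &&
 	    (norm & ~EXAMPLE_TEXTPREP_TOUPPER) == 0));
 }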
 
 int
 zfsvfs_create(const char *osname, zfsvfs_t **zfvp)
 {
 	objset_t *os;
 	zfsvfs_t *zfsvfs;
 	int error;
 
 	/*
 	 * XXX: Fix struct statfs so this isn't necessary!
 	 *
 	 * The 'osname' is used as the filesystem's special node, which means
 	 * it must fit in statfs.f_mntfromname, or else it can't be
 	 * enumerated, so libzfs_mnttab_find() returns NULL, which causes
 	 * 'zfs unmount' to think it's not mounted when it is.
 	 */
 	if (strlen(osname) >= MNAMELEN)
 		return (SET_ERROR(ENAMETOOLONG));
 
 	zfsvfs = kmem_zalloc(sizeof (zfsvfs_t), KM_SLEEP);
 
 	/*
 	 * We claim to always be readonly so we can open snapshots;
 	 * other ZPL code will prevent us from writing to snapshots.
 	 */
 	error = dmu_objset_own(osname, DMU_OST_ZFS, B_TRUE, zfsvfs, &os);
 	if (error) {
 		kmem_free(zfsvfs, sizeof (zfsvfs_t));
 		return (error);
 	}
 
 	zfsvfs->z_vfs = NULL;
 	zfsvfs->z_parent = zfsvfs;
 
 	mutex_init(&zfsvfs->z_znodes_lock, NULL, MUTEX_DEFAULT, NULL);
 	mutex_init(&zfsvfs->z_lock, NULL, MUTEX_DEFAULT, NULL);
 	list_create(&zfsvfs->z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
+#ifdef DIAGNOSTIC
+	rrm_init(&zfsvfs->z_teardown_lock, B_TRUE);
+#else
 	rrm_init(&zfsvfs->z_teardown_lock, B_FALSE);
+#endif
 	rw_init(&zfsvfs->z_teardown_inactive_lock, NULL, RW_DEFAULT, NULL);
 	rw_init(&zfsvfs->z_fuid_lock, NULL, RW_DEFAULT, NULL);
 	for (int i = 0; i != ZFS_OBJ_MTX_SZ; i++)
 		mutex_init(&zfsvfs->z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
 
 	error = zfsvfs_init(zfsvfs, os);
 	if (error != 0) {
 		dmu_objset_disown(os, zfsvfs);
 		*zfvp = NULL;
 		kmem_free(zfsvfs, sizeof (zfsvfs_t));
 		return (error);
 	}
 
 	*zfvp = zfsvfs;
 	return (0);
 }
 
 static int
 zfsvfs_setup(zfsvfs_t *zfsvfs, boolean_t mounting)
 {
 	int error;
 
 	error = zfs_register_callbacks(zfsvfs->z_vfs);
 	if (error)
 		return (error);
 
 	/*
 	 * Set the objset user_ptr to track its zfsvfs.
 	 */
 	mutex_enter(&zfsvfs->z_os->os_user_ptr_lock);
 	dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
 	mutex_exit(&zfsvfs->z_os->os_user_ptr_lock);
 
 	zfsvfs->z_log = zil_open(zfsvfs->z_os, zfs_get_data);
 
 	/*
 	 * If we are not mounting (i.e., online recv), then we don't
 	 * have to worry about replaying the log as we blocked all
 	 * operations out since we closed the ZIL.
 	 */
 	if (mounting) {
 		boolean_t readonly;
 
 		/*
 		 * During replay, we remove the read-only flag to
 		 * allow replays to succeed.
 		 */
 		readonly = zfsvfs->z_vfs->vfs_flag & VFS_RDONLY;
 		if (readonly != 0)
 			zfsvfs->z_vfs->vfs_flag &= ~VFS_RDONLY;
 		else
 			zfs_unlinked_drain(zfsvfs);
 
 		/*
 		 * Parse and replay the intent log.
 		 *
 		 * Because of ziltest, this must be done after
 		 * zfs_unlinked_drain().  (Further note: ziltest
 		 * doesn't use readonly mounts, where
 		 * zfs_unlinked_drain() isn't called.)  This is because
 		 * ziltest causes spa_sync() to think it's committed,
 		 * but actually it is not, so the intent log contains
 		 * many txg's worth of changes.
 		 *
 		 * In particular, if object N is in the unlinked set in
 		 * the last txg to actually sync, then it could be
 		 * actually freed in a later txg and then reallocated
 		 * in a yet later txg.  This would write a "create
 		 * object N" record to the intent log.  Normally, this
 		 * would be fine because the spa_sync() would have
 		 * written out the fact that object N is free, before
 		 * we could write the "create object N" intent log
 		 * record.
 		 *
 		 * But when we are in ziltest mode, we advance the "open
 		 * txg" without actually spa_sync()-ing the changes to
 		 * disk.  So we would see that object N is still
 		 * allocated and in the unlinked set, and there is an
 		 * intent log record saying to allocate it.
 		 */
 		if (spa_writeable(dmu_objset_spa(zfsvfs->z_os))) {
 			if (zil_replay_disable) {
 				zil_destroy(zfsvfs->z_log, B_FALSE);
 			} else {
 				zfsvfs->z_replay = B_TRUE;
 				zil_replay(zfsvfs->z_os, zfsvfs,
 				    zfs_replay_vector);
 				zfsvfs->z_replay = B_FALSE;
 			}
 		}
 		zfsvfs->z_vfs->vfs_flag |= readonly; /* restore readonly bit */
 	}
 
 	return (0);
 }
 
 extern krwlock_t zfsvfs_lock; /* in zfs_znode.c */
 
 void
 zfsvfs_free(zfsvfs_t *zfsvfs)
 {
 	int i;
 
 	/*
 	 * This is a barrier to prevent the filesystem from going away in
 	 * zfs_znode_move() until we can safely ensure that the filesystem is
 	 * not unmounted. We consider the filesystem valid before the barrier
 	 * and invalid after the barrier.
 	 */
 	rw_enter(&zfsvfs_lock, RW_READER);
 	rw_exit(&zfsvfs_lock);
 
 	zfs_fuid_destroy(zfsvfs);
 
 	mutex_destroy(&zfsvfs->z_znodes_lock);
 	mutex_destroy(&zfsvfs->z_lock);
 	list_destroy(&zfsvfs->z_all_znodes);
 	rrm_destroy(&zfsvfs->z_teardown_lock);
 	rw_destroy(&zfsvfs->z_teardown_inactive_lock);
 	rw_destroy(&zfsvfs->z_fuid_lock);
 	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
 		mutex_destroy(&zfsvfs->z_hold_mtx[i]);
 	kmem_free(zfsvfs, sizeof (zfsvfs_t));
 }
 
 static void
 zfs_set_fuid_feature(zfsvfs_t *zfsvfs)
 {
 	zfsvfs->z_use_fuids = USE_FUIDS(zfsvfs->z_version, zfsvfs->z_os);
 	if (zfsvfs->z_vfs) {
 		if (zfsvfs->z_use_fuids) {
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_XVATTR);
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS);
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS);
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE);
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER);
 			vfs_set_feature(zfsvfs->z_vfs, VFSFT_REPARSE);
 		} else {
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_XVATTR);
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_SYSATTR_VIEWS);
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACEMASKONACCESS);
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACLONCREATE);
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_ACCESS_FILTER);
 			vfs_clear_feature(zfsvfs->z_vfs, VFSFT_REPARSE);
 		}
 	}
 	zfsvfs->z_use_sa = USE_SA(zfsvfs->z_version, zfsvfs->z_os);
 }
 
 static int
 zfs_domount(vfs_t *vfsp, char *osname)
 {
 	uint64_t recordsize, fsid_guid;
 	int error = 0;
 	zfsvfs_t *zfsvfs;
 	vnode_t *vp;
 
 	ASSERT(vfsp);
 	ASSERT(osname);
 
 	error = zfsvfs_create(osname, &zfsvfs);
 	if (error)
 		return (error);
 	zfsvfs->z_vfs = vfsp;
 
 #ifdef illumos
 	/* Initialize the generic filesystem structure. */
 	vfsp->vfs_bcount = 0;
 	vfsp->vfs_data = NULL;
 
 	if (zfs_create_unique_device(&mount_dev) == -1) {
 		error = SET_ERROR(ENODEV);
 		goto out;
 	}
 	ASSERT(vfs_devismounted(mount_dev) == 0);
 #endif
 
 	if (error = dsl_prop_get_integer(osname, "recordsize", &recordsize,
 	    NULL))
 		goto out;
 	zfsvfs->z_vfs->vfs_bsize = SPA_MINBLOCKSIZE;
 	zfsvfs->z_vfs->mnt_stat.f_iosize = recordsize;
 
 	vfsp->vfs_data = zfsvfs;
 	vfsp->mnt_flag |= MNT_LOCAL;
 	vfsp->mnt_kern_flag |= MNTK_LOOKUP_SHARED;
 	vfsp->mnt_kern_flag |= MNTK_SHARED_WRITES;
 	vfsp->mnt_kern_flag |= MNTK_EXTENDED_SHARED;
 	vfsp->mnt_kern_flag |= MNTK_NO_IOPF;	/* vn_io_fault can be used */
 
 	/*
 	 * The fsid is 64 bits, composed of an 8-bit fs type, which
 	 * separates our fsid from any other filesystem types, and a
 	 * 56-bit objset unique ID.  The objset unique ID is unique to
 	 * all objsets open on this system, provided by unique_create().
 	 * The 8-bit fs type must be put in the low bits of fsid[1]
 	 * because that's where other Solaris filesystems put it.
 	 */
 	fsid_guid = dmu_objset_fsid_guid(zfsvfs->z_os);
 	ASSERT((fsid_guid & ~((1ULL<<56)-1)) == 0);
 	vfsp->vfs_fsid.val[0] = fsid_guid;
 	vfsp->vfs_fsid.val[1] = ((fsid_guid>>32) << 8) |
 	    vfsp->mnt_vfc->vfc_typenum & 0xFF;
 
 	/*
 	 * Set features for file system.
 	 */
 	zfs_set_fuid_feature(zfsvfs);
 	if (zfsvfs->z_case == ZFS_CASE_INSENSITIVE) {
 		vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
 		vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
 		vfs_set_feature(vfsp, VFSFT_NOCASESENSITIVE);
 	} else if (zfsvfs->z_case == ZFS_CASE_MIXED) {
 		vfs_set_feature(vfsp, VFSFT_DIRENTFLAGS);
 		vfs_set_feature(vfsp, VFSFT_CASEINSENSITIVE);
 	}
 	vfs_set_feature(vfsp, VFSFT_ZEROCOPY_SUPPORTED);
 
 	if (dmu_objset_is_snapshot(zfsvfs->z_os)) {
 		uint64_t pval;
 
 		atime_changed_cb(zfsvfs, B_FALSE);
 		readonly_changed_cb(zfsvfs, B_TRUE);
 		if (error = dsl_prop_get_integer(osname, "xattr", &pval, NULL))
 			goto out;
 		xattr_changed_cb(zfsvfs, pval);
 		zfsvfs->z_issnap = B_TRUE;
 		zfsvfs->z_os->os_sync = ZFS_SYNC_DISABLED;
 
 		mutex_enter(&zfsvfs->z_os->os_user_ptr_lock);
 		dmu_objset_set_user(zfsvfs->z_os, zfsvfs);
 		mutex_exit(&zfsvfs->z_os->os_user_ptr_lock);
 	} else {
 		error = zfsvfs_setup(zfsvfs, B_TRUE);
 	}
 
 	vfs_mountedfrom(vfsp, osname);
 
 	if (!zfsvfs->z_issnap)
 		zfsctl_create(zfsvfs);
 out:
 	if (error) {
 		dmu_objset_disown(zfsvfs->z_os, zfsvfs);
 		zfsvfs_free(zfsvfs);
 	} else {
 		atomic_inc_32(&zfs_active_fs_count);
 	}
 
 	return (error);
 }
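The fsid layout described in zfs_domount() can be illustrated with a small userland sketch. The helper names below are hypothetical, and the words are modelled as uint32_t for clarity (FreeBSD's fsid_t stores them as int32_t); the packing itself follows the expression in the code: val[0] holds the low 32 bits of the 56-bit objset GUID, and val[1] holds GUID bits 32..55 shifted up by 8 with the 8-bit file system type number in the low byte.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack a 56-bit GUID and an 8-bit type number. */
static void
fsid_pack(uint64_t fsid_guid, unsigned typenum, uint32_t val[2])
{
	/* The GUID must fit in 56 bits, as the kernel ASSERT requires. */
	assert((fsid_guid & ~((1ULL << 56) - 1)) == 0);
	val[0] = (uint32_t)fsid_guid;		/* low 32 bits of the GUID */
	val[1] = (uint32_t)(((fsid_guid >> 32) << 8) | (typenum & 0xFF));
}

/* Reassemble the GUID: low 32 bits from val[0], bits 32..55 from val[1]. */
static uint64_t
fsid_guid_of(const uint32_t val[2])
{
	return ((uint64_t)val[0] | ((uint64_t)(val[1] >> 8) << 32));
}

/* The file system type number lives in the low byte of val[1]. */
static unsigned
fsid_type_of(const uint32_t val[2])
{
	return (val[1] & 0xFF);
}
```

Because the type number occupies its own byte, two file system types can never produce the same fsid even if their 56-bit IDs happen to collide.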
 
 void
 zfs_unregister_callbacks(zfsvfs_t *zfsvfs)
 {
 	objset_t *os = zfsvfs->z_os;
 
 	if (!dmu_objset_is_snapshot(os))
 		dsl_prop_unregister_all(dmu_objset_ds(os), zfsvfs);
 }
 
 #ifdef SECLABEL
 /*
  * Convert a decimal digit string to a uint64_t integer.
  */
 static int
 str_to_uint64(char *str, uint64_t *objnum)
 {
 	uint64_t num = 0;
 
 	while (*str) {
 		if (*str < '0' || *str > '9')
 			return (SET_ERROR(EINVAL));
 
 		num = num*10 + *str++ - '0';
 	}
 
 	*objnum = num;
 	return (0);
 }
 
 /*
  * The boot path passed from the boot loader is in the form of
  * "rootpool-name/root-filesystem-object-number". Convert this
  * string to a dataset name: "rootpool-name/root-filesystem-name".
  */
 static int
 zfs_parse_bootfs(char *bpath, char *outpath)
 {
 	char *slashp;
 	uint64_t objnum;
 	int error;
 
 	if (*bpath == 0 || *bpath == '/')
 		return (SET_ERROR(EINVAL));
 
 	(void) strcpy(outpath, bpath);
 
 	slashp = strchr(bpath, '/');
 
 	/* if no '/', just return the pool name */
 	if (slashp == NULL) {
 		return (0);
 	}
 
 	/* if not a number, just return the root dataset name */
 	if (str_to_uint64(slashp+1, &objnum)) {
 		return (0);
 	}
 
 	*slashp = '\0';
 	error = dsl_dsobj_to_dsname(bpath, objnum, outpath);
 	*slashp = '/';
 
 	return (error);
 }
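The three outcomes of zfs_parse_bootfs() can be summarised with a userland classifier. This is a sketch under the assumption that we only want to know which branch is taken: the enum and function names are hypothetical, and the kernel's dsl_dsobj_to_dsname() translation step is left out. The digit loop mirrors str_to_uint64(), including its lack of an overflow check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical labels for the branches taken by zfs_parse_bootfs(). */
enum bootfs_kind {
	BOOTFS_INVALID,		/* empty or leading '/': EINVAL */
	BOOTFS_POOL_ONLY,	/* no '/': the pool name is used as-is */
	BOOTFS_DATASET_NAME,	/* suffix not all digits: used as-is */
	BOOTFS_OBJNUM		/* decimal suffix: translated via objnum */
};

static enum bootfs_kind
bootfs_classify(const char *bpath, uint64_t *objnum)
{
	const char *slashp, *p;
	uint64_t num = 0;

	if (*bpath == '\0' || *bpath == '/')
		return (BOOTFS_INVALID);
	slashp = strchr(bpath, '/');
	if (slashp == NULL)
		return (BOOTFS_POOL_ONLY);
	/* Same digit loop as str_to_uint64(); no overflow check. */
	for (p = slashp + 1; *p != '\0'; p++) {
		if (*p < '0' || *p > '9')
			return (BOOTFS_DATASET_NAME);
		num = num * 10 + (uint64_t)(*p - '0');
	}
	*objnum = num;
	return (BOOTFS_OBJNUM);
}
```

One consequence worth noting: as in str_to_uint64(), an empty suffix such as "rpool/" passes the digit check and yields object number 0.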
 
 /*
  * Check that the hex label string is appropriate for the dataset being
  * mounted into the global_zone proper.
  *
  * Return an error if the hex label string is not default or
  * admin_low/admin_high.  For admin_low labels, the corresponding
  * dataset must be readonly.
  */
 int
 zfs_check_global_label(const char *dsname, const char *hexsl)
 {
 	if (strcasecmp(hexsl, ZFS_MLSLABEL_DEFAULT) == 0)
 		return (0);
 	if (strcasecmp(hexsl, ADMIN_HIGH) == 0)
 		return (0);
 	if (strcasecmp(hexsl, ADMIN_LOW) == 0) {
 		/* must be readonly */
 		uint64_t rdonly;
 
 		if (dsl_prop_get_integer(dsname,
 		    zfs_prop_to_name(ZFS_PROP_READONLY), &rdonly, NULL))
 			return (SET_ERROR(EACCES));
 		return (rdonly ? 0 : EACCES);
 	}
 	return (SET_ERROR(EACCES));
 }
 
 /*
  * Determine whether the mount is allowed according to the MAC check
  * by comparing (where appropriate) the label of the dataset against
  * the label of the zone being mounted into.  If the dataset has
  * no label, create one.
  *
  * Returns 0 if access allowed, error otherwise (e.g. EACCES)
  */
 static int
 zfs_mount_label_policy(vfs_t *vfsp, char *osname)
 {
 	int		error, retv;
 	zone_t		*mntzone = NULL;
 	ts_label_t	*mnt_tsl;
 	bslabel_t	*mnt_sl;
 	bslabel_t	ds_sl;
 	char		ds_hexsl[MAXNAMELEN];
 
 	retv = EACCES;				/* assume the worst */
 
 	/*
 	 * Start by getting the dataset label if it exists.
 	 */
 	error = dsl_prop_get(osname, zfs_prop_to_name(ZFS_PROP_MLSLABEL),
 	    1, sizeof (ds_hexsl), &ds_hexsl, NULL);
 	if (error)
 		return (SET_ERROR(EACCES));
 
 	/*
 	 * If labeling is NOT enabled, then disallow the mount of datasets
 	 * which have a non-default label already.  No other label checks
 	 * are needed.
 	 */
 	if (!is_system_labeled()) {
 		if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0)
 			return (0);
 		return (SET_ERROR(EACCES));
 	}
 
 	/*
 	 * Get the label of the mountpoint.  If mounting into the global
 	 * zone (i.e. mountpoint is not within an active zone and the
 	 * zoned property is off), the label must be default or
 	 * admin_low/admin_high only; no other checks are needed.
 	 */
 	mntzone = zone_find_by_any_path(refstr_value(vfsp->vfs_mntpt), B_FALSE);
 	if (mntzone->zone_id == GLOBAL_ZONEID) {
 		uint64_t zoned;
 
 		zone_rele(mntzone);
 
 		if (dsl_prop_get_integer(osname,
 		    zfs_prop_to_name(ZFS_PROP_ZONED), &zoned, NULL))
 			return (SET_ERROR(EACCES));
 		if (!zoned)
 			return (zfs_check_global_label(osname, ds_hexsl));
 		else
 			/*
 			 * This is the case of a zone dataset being mounted
 			 * initially, before the zone has been fully created;
 			 * allow this mount into global zone.
 			 */
 			return (0);
 	}
 
 	mnt_tsl = mntzone->zone_slabel;
 	ASSERT(mnt_tsl != NULL);
 	label_hold(mnt_tsl);
 	mnt_sl = label2bslabel(mnt_tsl);
 
 	if (strcasecmp(ds_hexsl, ZFS_MLSLABEL_DEFAULT) == 0) {
 		/*
 		 * The dataset doesn't have a real label, so fabricate one.
 		 */
 		char *str = NULL;
 
 		if (l_to_str_internal(mnt_sl, &str) == 0 &&
 		    dsl_prop_set_string(osname,
 		    zfs_prop_to_name(ZFS_PROP_MLSLABEL),
 		    ZPROP_SRC_LOCAL, str) == 0)
 			retv = 0;
 		if (str != NULL)
 			kmem_free(str, strlen(str) + 1);
 	} else if (hexstr_to_label(ds_hexsl, &ds_sl) == 0) {
 		/*
 		 * Now compare labels to complete the MAC check.  If the
 		 * labels are equal then allow access.  If the mountpoint
 		 * label dominates the dataset label, allow readonly access.
 		 * Otherwise, access is denied.
 		 */
 		if (blequal(mnt_sl, &ds_sl))
 			retv = 0;
 		else if (bldominates(mnt_sl, &ds_sl)) {
 			vfs_setmntopt(vfsp, MNTOPT_RO, NULL, 0);
 			retv = 0;
 		}
 	}
 
 	label_rele(mnt_tsl);
 	zone_rele(mntzone);
 	return (retv);
 }
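The final comparison in zfs_mount_label_policy() follows an equal / dominates / deny rule. The sketch below uses a toy label (a level plus a category bitmask), which is an assumption for illustration only and not the Trusted Extensions bslabel_t; blequal() and bldominates() are modelled by toy_equal() and toy_dominates().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sensitivity label: NOT bslabel_t, only a stand-in. */
struct toy_label {
	int		level;
	uint64_t	cats;
};

static bool
toy_equal(const struct toy_label *a, const struct toy_label *b)
{
	return (a->level == b->level && a->cats == b->cats);
}

/* a dominates b if its level is >= and its categories are a superset. */
static bool
toy_dominates(const struct toy_label *a, const struct toy_label *b)
{
	return (a->level >= b->level && (a->cats & b->cats) == b->cats);
}

enum mount_verdict { MOUNT_RW, MOUNT_RO, MOUNT_DENY };

static enum mount_verdict
toy_mount_policy(const struct toy_label *mnt, const struct toy_label *ds)
{
	if (toy_equal(mnt, ds))
		return (MOUNT_RW);	/* equal labels: normal access */
	if (toy_dominates(mnt, ds))
		return (MOUNT_RO);	/* zone dominates: force read-only */
	return (MOUNT_DENY);		/* otherwise EACCES */
}
```

Equality is tested first because equal labels also dominate each other; checking dominance first would wrongly force every equal-label mount to read-only.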
 #endif	/* SECLABEL */
 
 #ifdef OPENSOLARIS_MOUNTROOT
 static int
 zfs_mountroot(vfs_t *vfsp, enum whymountroot why)
 {
 	int error = 0;
 	static int zfsrootdone = 0;
 	zfsvfs_t *zfsvfs = NULL;
 	znode_t *zp = NULL;
 	vnode_t *vp = NULL;
 	char *zfs_bootfs;
 	char *zfs_devid;
 
 	ASSERT(vfsp);
 
 	/*
 	 * The filesystem that we mount as root is defined in the
 	 * boot property "zfs-bootfs" with a format of
 	 * "poolname/root-dataset-objnum".
 	 */
 	if (why == ROOT_INIT) {
 		if (zfsrootdone++)
 			return (SET_ERROR(EBUSY));
 		/*
 		 * the process of doing a spa_load will require the
 		 * clock to be set before we could (for example) do
 		 * something better by looking at the timestamp on
 		 * an uberblock, so just set it to -1.
 		 */
 		clkset(-1);
 
 		if ((zfs_bootfs = spa_get_bootprop("zfs-bootfs")) == NULL) {
 			cmn_err(CE_NOTE, "spa_get_bootfs: can not get "
 			    "bootfs name");
 			return (SET_ERROR(EINVAL));
 		}
 		zfs_devid = spa_get_bootprop("diskdevid");
 		error = spa_import_rootpool(rootfs.bo_name, zfs_devid);
 		if (zfs_devid)
 			spa_free_bootprop(zfs_devid);
 		if (error) {
 			spa_free_bootprop(zfs_bootfs);
 			cmn_err(CE_NOTE, "spa_import_rootpool: error %d",
 			    error);
 			return (error);
 		}
 		if (error = zfs_parse_bootfs(zfs_bootfs, rootfs.bo_name)) {
 			spa_free_bootprop(zfs_bootfs);
 			cmn_err(CE_NOTE, "zfs_parse_bootfs: error %d",
 			    error);
 			return (error);
 		}
 
 		spa_free_bootprop(zfs_bootfs);
 
 		if (error = vfs_lock(vfsp))
 			return (error);
 
 		if (error = zfs_domount(vfsp, rootfs.bo_name)) {
 			cmn_err(CE_NOTE, "zfs_domount: error %d", error);
 			goto out;
 		}
 
 		zfsvfs = (zfsvfs_t *)vfsp->vfs_data;
 		ASSERT(zfsvfs);
 		if (error = zfs_zget(zfsvfs, zfsvfs->z_root, &zp)) {
 			cmn_err(CE_NOTE, "zfs_zget: error %d", error);
 			goto out;
 		}
 
 		vp = ZTOV(zp);
 		mutex_enter(&vp->v_lock);
 		vp->v_flag |= VROOT;
 		mutex_exit(&vp->v_lock);
 		rootvp = vp;
 
 		/*
 		 * Leave rootvp held.  The root file system is never unmounted.
 		 */
 
 		vfs_add((struct vnode *)0, vfsp,
 		    (vfsp->vfs_flag & VFS_RDONLY) ? MS_RDONLY : 0);
 out:
 		vfs_unlock(vfsp);
 		return (error);
 	} else if (why == ROOT_REMOUNT) {
 		readonly_changed_cb(vfsp->vfs_data, B_FALSE);
 		vfsp->vfs_flag |= VFS_REMOUNT;
 
 		/* refresh mount options */
 		zfs_unregister_callbacks(vfsp->vfs_data);
 		return (zfs_register_callbacks(vfsp));
 
 	} else if (why == ROOT_UNMOUNT) {
 		zfs_unregister_callbacks((zfsvfs_t *)vfsp->vfs_data);
 		(void) zfs_sync(vfsp, 0, 0);
 		return (0);
 	}
 
 	/*
 	 * If "why" is anything other than ROOT_INIT,
 	 * ROOT_REMOUNT, or ROOT_UNMOUNT, we do not support it.
 	 */
 	return (SET_ERROR(ENOTSUP));
 }
 #endif	/* OPENSOLARIS_MOUNTROOT */
 
 static int
 getpoolname(const char *osname, char *poolname)
 {
 	char *p;
 
 	p = strchr(osname, '/');
 	if (p == NULL) {
 		if (strlen(osname) >= MAXNAMELEN)
 			return (ENAMETOOLONG);
 		(void) strcpy(poolname, osname);
 	} else {
 		if (p - osname >= MAXNAMELEN)
 			return (ENAMETOOLONG);
 		(void) strncpy(poolname, osname, p - osname);
 		poolname[p - osname] = '\0';
 	}
 	return (0);
 }
 
 /*ARGSUSED*/
 static int
 zfs_mount(vfs_t *vfsp)
 {
 	kthread_t	*td = curthread;
 	vnode_t		*mvp = vfsp->mnt_vnodecovered;
 	cred_t		*cr = td->td_ucred;
 	char		*osname;
 	int		error = 0;
 	int		canwrite;
 
 #ifdef illumos
 	if (mvp->v_type != VDIR)
 		return (SET_ERROR(ENOTDIR));
 
 	mutex_enter(&mvp->v_lock);
 	if ((uap->flags & MS_REMOUNT) == 0 &&
 	    (uap->flags & MS_OVERLAY) == 0 &&
 	    (mvp->v_count != 1 || (mvp->v_flag & VROOT))) {
 		mutex_exit(&mvp->v_lock);
 		return (SET_ERROR(EBUSY));
 	}
 	mutex_exit(&mvp->v_lock);
 
 	/*
 	 * ZFS does not support passing unparsed data in via MS_DATA.
 	 * Users should use the MS_OPTIONSTR interface; this means
 	 * that all option parsing is already done and the options struct
 	 * can be interrogated.
 	 */
 	if ((uap->flags & MS_DATA) && uap->datalen > 0)
 #else	/* !illumos */
 	if (!prison_allow(td->td_ucred, PR_ALLOW_MOUNT_ZFS))
 		return (SET_ERROR(EPERM));
 
 	if (vfs_getopt(vfsp->mnt_optnew, "from", (void **)&osname, NULL))
 		return (SET_ERROR(EINVAL));
 #endif	/* illumos */
 
 	/*
 	 * If full-owner-access is enabled and delegated administration is
 	 * turned on, we must set nosuid.
 	 */
 	if (zfs_super_owner &&
 	    dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr) != ECANCELED) {
 		secpolicy_fs_mount_clearopts(cr, vfsp);
 	}
 
 	/*
 	 * Check for mount privilege.
 	 *
 	 * If we don't have privilege, then see if
 	 * we have local permission to allow it.
 	 */
 	error = secpolicy_fs_mount(cr, mvp, vfsp);
 	if (error) {
 		if (dsl_deleg_access(osname, ZFS_DELEG_PERM_MOUNT, cr) != 0)
 			goto out;
 
 		if (!(vfsp->vfs_flag & MS_REMOUNT)) {
 			vattr_t		vattr;
 
 			/*
 			 * Make sure user is the owner of the mount point
 			 * or has sufficient privileges.
 			 */
 
 			vattr.va_mask = AT_UID;
 
 			vn_lock(mvp, LK_SHARED | LK_RETRY);
 			if (VOP_GETATTR(mvp, &vattr, cr)) {
 				VOP_UNLOCK(mvp, 0);
 				goto out;
 			}
 
 			if (secpolicy_vnode_owner(mvp, cr, vattr.va_uid) != 0 &&
 			    VOP_ACCESS(mvp, VWRITE, cr, td) != 0) {
 				VOP_UNLOCK(mvp, 0);
 				goto out;
 			}
 			VOP_UNLOCK(mvp, 0);
 		}
 
 		secpolicy_fs_mount_clearopts(cr, vfsp);
 	}
 
 	/*
 	 * Refuse to mount a filesystem if we are in a local zone and the
 	 * dataset is not visible.
 	 */
 	if (!INGLOBALZONE(curthread) &&
 	    (!zone_dataset_visible(osname, &canwrite) || !canwrite)) {
 		error = SET_ERROR(EPERM);
 		goto out;
 	}
 
 #ifdef SECLABEL
 	error = zfs_mount_label_policy(vfsp, osname);
 	if (error)
 		goto out;
 #endif
 
 	vfsp->vfs_flag |= MNT_NFS4ACLS;
 
 	/*
 	 * When doing a remount, we simply refresh our temporary properties
 	 * according to those options set in the current VFS options.
 	 */
 	if (vfsp->vfs_flag & MS_REMOUNT) {
 		zfsvfs_t *zfsvfs = vfsp->vfs_data;
 
 		/*
 		 * Refresh mount options with z_teardown_lock blocking I/O while
 		 * the filesystem is in an inconsistent state.
 		 * The lock also serializes this code with filesystem
 		 * manipulations between entry to zfs_suspend_fs() and return
 		 * from zfs_resume_fs().
 		 */
 		rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
 		zfs_unregister_callbacks(zfsvfs);
 		error = zfs_register_callbacks(vfsp);
 		rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
 		goto out;
 	}
 
 	/* Initial root mount: try hard to import the requested root pool. */
 	if ((vfsp->vfs_flag & MNT_ROOTFS) != 0 &&
 	    (vfsp->vfs_flag & MNT_UPDATE) == 0) {
 		char pname[MAXNAMELEN];
 
 		error = getpoolname(osname, pname);
 		if (error == 0)
 			error = spa_import_rootpool(pname);
 		if (error)
 			goto out;
 	}
 	DROP_GIANT();
 	error = zfs_domount(vfsp, osname);
 	PICKUP_GIANT();
 
 #ifdef illumos
 	/*
 	 * Add an extra VFS_HOLD on our parent vfs so that it can't
 	 * disappear due to a forced unmount.
 	 */
 	if (error == 0 && ((zfsvfs_t *)vfsp->vfs_data)->z_issnap)
 		VFS_HOLD(mvp->v_vfsp);
 #endif
 
 out:
 	return (error);
 }
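The privilege check at the top of zfs_mount() reduces to a small decision function. In this hypothetical sketch each boolean stands in for a kernel call: has_mount_priv for secpolicy_fs_mount() succeeding, has_deleg_mount for dsl_deleg_access(..., ZFS_DELEG_PERM_MOUNT, ...), owns_mountpoint for secpolicy_vnode_owner(), and can_write_mountpoint for VOP_ACCESS(VWRITE). A VOP_GETATTR() failure, which also denies the mount, and the secpolicy_fs_mount_clearopts() side effect are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if zfs_mount() would let the mount proceed. */
static bool
mount_permitted(bool has_mount_priv, bool has_deleg_mount, bool is_remount,
    bool owns_mountpoint, bool can_write_mountpoint)
{
	if (has_mount_priv)
		return (true);
	/* Without privilege, "mount" must be delegated on the dataset... */
	if (!has_deleg_mount)
		return (false);
	/* ...and a fresh mount needs an owned or writable mount point. */
	if (!is_remount && !owns_mountpoint && !can_write_mountpoint)
		return (false);
	return (true);
}
```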
 
 static int
 zfs_statfs(vfs_t *vfsp, struct statfs *statp)
 {
 	zfsvfs_t *zfsvfs = vfsp->vfs_data;
 	uint64_t refdbytes, availbytes, usedobjs, availobjs;
 
 	statp->f_version = STATFS_VERSION;
 
 	ZFS_ENTER(zfsvfs);
 
 	dmu_objset_space(zfsvfs->z_os,
 	    &refdbytes, &availbytes, &usedobjs, &availobjs);
 
 	/*
 	 * The underlying storage pool actually uses multiple block sizes.
 	 * We report f_bsize as the smallest block size we support, and
 	 * f_iosize as the file system's record size (its maximum block size).
 	 */
 	statp->f_bsize = SPA_MINBLOCKSIZE;
 	statp->f_iosize = zfsvfs->z_vfs->mnt_stat.f_iosize;
 
 	/*
 	 * The following report "total" blocks of various kinds in the
 	 * file system, expressed in units of f_bsize (SPA_MINBLOCKSIZE)
 	 * rather than in bytes.
 	 */
 
 	statp->f_blocks = (refdbytes + availbytes) >> SPA_MINBLOCKSHIFT;
 	statp->f_bfree = availbytes / statp->f_bsize;
 	statp->f_bavail = statp->f_bfree; /* no root reservation */
 
 	/*
 	 * statvfs() should really be called statufs(), because it assumes
 	 * static metadata.  ZFS doesn't preallocate files, so the best
 	 * we can do is report the max that could possibly fit in f_files,
 	 * and that minus the number actually used in f_ffree.
 	 * For f_ffree, report the smaller of the number of objects available
 	 * and the number of blocks (each object will take at least a block).
 	 */
 	statp->f_ffree = MIN(availobjs, statp->f_bfree);
 	statp->f_files = statp->f_ffree + usedobjs;
 
 	/*
 	 * We're a zfs filesystem.
 	 */
 	(void) strlcpy(statp->f_fstypename, "zfs", sizeof(statp->f_fstypename));
 
 	strlcpy(statp->f_mntfromname, vfsp->mnt_stat.f_mntfromname,
 	    sizeof(statp->f_mntfromname));
 	strlcpy(statp->f_mntonname, vfsp->mnt_stat.f_mntonname,
 	    sizeof(statp->f_mntonname));
 
 	statp->f_namemax = ZFS_MAXNAMELEN;
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
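The block and object arithmetic in zfs_statfs() can be checked with a worked example. This sketch assumes SPA_MINBLOCKSHIFT is 9 (a 512-byte minimum block); the struct and function names are hypothetical and only the fields computed from dmu_objset_space() are modelled.

```c
#include <assert.h>
#include <stdint.h>

#define	SKETCH_MINBLOCKSHIFT	9	/* assumed SPA_MINBLOCKSHIFT */
#define	SKETCH_MINBLOCKSIZE	(1ULL << SKETCH_MINBLOCKSHIFT)
#define	SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

struct statfs_sketch {
	uint64_t	bsize, blocks, bfree, bavail, files, ffree;
};

/* Mirrors the block and object arithmetic in zfs_statfs(). */
static struct statfs_sketch
statfs_from_space(uint64_t refdbytes, uint64_t availbytes,
    uint64_t usedobjs, uint64_t availobjs)
{
	struct statfs_sketch s;

	s.bsize = SKETCH_MINBLOCKSIZE;
	s.blocks = (refdbytes + availbytes) >> SKETCH_MINBLOCKSHIFT;
	s.bfree = availbytes / s.bsize;
	s.bavail = s.bfree;			/* no root reservation */
	/* Each new object needs at least one block, so cap by bfree. */
	s.ffree = SKETCH_MIN(availobjs, s.bfree);
	s.files = s.ffree + usedobjs;
	return (s);
}
```

For 1 MiB referenced, 512 KiB available and 100 objects in use, this reports 3072 total blocks and 1024 free blocks; with an effectively unlimited object count, f_ffree is capped at 1024, so f_files is 1124.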
 
 static int
 zfs_root(vfs_t *vfsp, int flags, vnode_t **vpp)
 {
 	zfsvfs_t *zfsvfs = vfsp->vfs_data;
 	znode_t *rootzp;
 	int error;
 
 	ZFS_ENTER(zfsvfs);
 
 	error = zfs_zget(zfsvfs, zfsvfs->z_root, &rootzp);
 	if (error == 0)
 		*vpp = ZTOV(rootzp);
 
 	ZFS_EXIT(zfsvfs);
 
 	if (error == 0) {
 		error = vn_lock(*vpp, flags);
 		if (error != 0) {
 			VN_RELE(*vpp);
 			*vpp = NULL;
 		}
 	}
 	return (error);
 }
 
 /*
  * Teardown the zfsvfs::z_os.
  *
  * Note, if 'unmounting' is FALSE, we return with the 'z_teardown_lock'
  * and 'z_teardown_inactive_lock' held.
  */
 static int
 zfsvfs_teardown(zfsvfs_t *zfsvfs, boolean_t unmounting)
 {
 	znode_t	*zp;
 
 	rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
 
 	if (!unmounting) {
 		/*
 		 * We purge the parent filesystem's vfsp as the parent
 		 * filesystem and all of its snapshots have their vnodes'
 		 * v_vfsp set to the parent filesystem's vfsp.  Note,
 		 * 'z_parent' is self referential for non-snapshots.
 		 */
 		(void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);
 #ifdef FREEBSD_NAMECACHE
 		cache_purgevfs(zfsvfs->z_parent->z_vfs);
 #endif
 	}
 
 	/*
 	 * Close the zil. NB: Can't close the zil while zfs_inactive
 	 * threads are blocked as zil_close can call zfs_inactive.
 	 */
 	if (zfsvfs->z_log) {
 		zil_close(zfsvfs->z_log);
 		zfsvfs->z_log = NULL;
 	}
 
 	rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_WRITER);
 
 	/*
 	 * If we are not unmounting (i.e., online recv) and someone already
 	 * unmounted this file system while we were doing the switcheroo,
 	 * or a reopen of z_os failed then just bail out now.
 	 */
 	if (!unmounting && (zfsvfs->z_unmounted || zfsvfs->z_os == NULL)) {
 		rw_exit(&zfsvfs->z_teardown_inactive_lock);
 		rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
 		return (SET_ERROR(EIO));
 	}
 
 	/*
 	 * At this point there are no vops active, and any new vops will
 	 * fail with EIO since we have z_teardown_lock for writer (only
 	 * relevant for forced unmount).
 	 *
 	 * Release all holds on dbufs.
 	 */
 	mutex_enter(&zfsvfs->z_znodes_lock);
 	for (zp = list_head(&zfsvfs->z_all_znodes); zp != NULL;
 	    zp = list_next(&zfsvfs->z_all_znodes, zp))
 		if (zp->z_sa_hdl) {
 			ASSERT(ZTOV(zp)->v_count >= 0);
 			zfs_znode_dmu_fini(zp);
 		}
 	mutex_exit(&zfsvfs->z_znodes_lock);
 
 	/*
 	 * If we are unmounting, set the unmounted flag and let new vops
 	 * unblock.  zfs_inactive will have the unmounted behavior, and all
 	 * other vops will fail with EIO.
 	 */
 	if (unmounting) {
 		zfsvfs->z_unmounted = B_TRUE;
 		rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
 		rw_exit(&zfsvfs->z_teardown_inactive_lock);
 	}
 
 	/*
 	 * z_os will be NULL if there was an error in attempting to reopen
 	 * zfsvfs, so just return as the properties had already been
 	 * unregistered and cached data had been evicted before.
 	 */
 	if (zfsvfs->z_os == NULL)
 		return (0);
 
 	/*
 	 * Unregister properties.
 	 */
 	zfs_unregister_callbacks(zfsvfs);
 
 	/*
 	 * Evict cached data
 	 */
 	if (dsl_dataset_is_dirty(dmu_objset_ds(zfsvfs->z_os)) &&
 	    !(zfsvfs->z_vfs->vfs_flag & VFS_RDONLY))
 		txg_wait_synced(dmu_objset_pool(zfsvfs->z_os), 0);
 	dmu_objset_evict_dbufs(zfsvfs->z_os);
 
 	return (0);
 }
 
 /*ARGSUSED*/
 static int
 zfs_umount(vfs_t *vfsp, int fflag)
 {
 	kthread_t *td = curthread;
 	zfsvfs_t *zfsvfs = vfsp->vfs_data;
 	objset_t *os;
 	cred_t *cr = td->td_ucred;
 	int ret;
 
 	ret = secpolicy_fs_unmount(cr, vfsp);
 	if (ret) {
 		if (dsl_deleg_access((char *)refstr_value(vfsp->vfs_resource),
 		    ZFS_DELEG_PERM_MOUNT, cr))
 			return (ret);
 	}
 
 	/*
 	 * We purge the parent filesystem's vfsp as the parent filesystem
 	 * and all of its snapshots have their vnodes' v_vfsp set to the
 	 * parent filesystem's vfsp.  Note, 'z_parent' is self
 	 * referential for non-snapshots.
 	 */
 	(void) dnlc_purge_vfsp(zfsvfs->z_parent->z_vfs, 0);
 
 	/*
 	 * Unmount any snapshots mounted under .zfs before unmounting the
 	 * dataset itself.
 	 */
 	if (zfsvfs->z_ctldir != NULL) {
 		if ((ret = zfsctl_umount_snapshots(vfsp, fflag, cr)) != 0)
 			return (ret);
 		ret = vflush(vfsp, 0, 0, td);
 		ASSERT(ret == EBUSY);
 		if (!(fflag & MS_FORCE)) {
 			if (zfsvfs->z_ctldir->v_count > 1)
 				return (EBUSY);
 			ASSERT(zfsvfs->z_ctldir->v_count == 1);
 		}
 		zfsctl_destroy(zfsvfs);
 		ASSERT(zfsvfs->z_ctldir == NULL);
 	}
 
 	if (fflag & MS_FORCE) {
 		/*
 		 * Mark file system as unmounted before calling
 		 * vflush(FORCECLOSE). This way we ensure no future vnops
 		 * will be called and risk operating on DOOMED vnodes.
 		 */
 		rrm_enter(&zfsvfs->z_teardown_lock, RW_WRITER, FTAG);
 		zfsvfs->z_unmounted = B_TRUE;
 		rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
 	}
 
 	/*
 	 * Flush all the files.
 	 */
 	ret = vflush(vfsp, 0, (fflag & MS_FORCE) ? FORCECLOSE : 0, td);
 	if (ret != 0) {
 		if (!zfsvfs->z_issnap) {
 			zfsctl_create(zfsvfs);
 			ASSERT(zfsvfs->z_ctldir != NULL);
 		}
 		return (ret);
 	}
 
 #ifdef illumos
 	if (!(fflag & MS_FORCE)) {
 		/*
 		 * Check the number of active vnodes in the file system.
 		 * Our count is maintained in the vfs structure, but the
 		 * number is off by 1 to indicate a hold on the vfs
 		 * structure itself.
 		 *
 		 * The '.zfs' directory maintains a reference of its
 		 * own, and any active references underneath are
 		 * reflected in the vnode count.
 		 */
 		if (zfsvfs->z_ctldir == NULL) {
 			if (vfsp->vfs_count > 1)
 				return (SET_ERROR(EBUSY));
 		} else {
 			if (vfsp->vfs_count > 2 ||
 			    zfsvfs->z_ctldir->v_count > 1)
 				return (SET_ERROR(EBUSY));
 		}
 	}
 #endif
 
 	VERIFY(zfsvfs_teardown(zfsvfs, B_TRUE) == 0);
 	os = zfsvfs->z_os;
 
 	/*
 	 * z_os will be NULL if there was an error in
 	 * attempting to reopen zfsvfs.
 	 */
 	if (os != NULL) {
 		/*
 		 * Unset the objset user_ptr.
 		 */
 		mutex_enter(&os->os_user_ptr_lock);
 		dmu_objset_set_user(os, NULL);
 		mutex_exit(&os->os_user_ptr_lock);
 
 		/*
 		 * Finally release the objset
 		 */
 		dmu_objset_disown(os, zfsvfs);
 	}
 
 	/*
 	 * We can now safely destroy the '.zfs' directory node.
 	 */
 	if (zfsvfs->z_ctldir != NULL)
 		zfsctl_destroy(zfsvfs);
 	zfs_freevfs(vfsp);
 
 	return (0);
 }
 
 static int
 zfs_vget(vfs_t *vfsp, ino_t ino, int flags, vnode_t **vpp)
 {
 	zfsvfs_t	*zfsvfs = vfsp->vfs_data;
 	znode_t		*zp;
 	int 		err;
 
 	/*
 	 * zfs_zget() can't operate on virtual entries like .zfs/ or
 	 * .zfs/snapshot/ directories, which is why we return EOPNOTSUPP.
 	 * This makes NFS switch to LOOKUP instead of using VGET.
 	 */
 	if (ino == ZFSCTL_INO_ROOT || ino == ZFSCTL_INO_SNAPDIR ||
 	    (zfsvfs->z_shares_dir != 0 && ino == zfsvfs->z_shares_dir))
 		return (EOPNOTSUPP);
 
 	ZFS_ENTER(zfsvfs);
 	err = zfs_zget(zfsvfs, ino, &zp);
 	if (err == 0 && zp->z_unlinked) {
-		VN_RELE(ZTOV(zp));
+		vrele(ZTOV(zp));
 		err = EINVAL;
 	}
 	if (err == 0)
 		*vpp = ZTOV(zp);
 	ZFS_EXIT(zfsvfs);
 	if (err == 0)
 		err = vn_lock(*vpp, flags);
 	if (err != 0)
 		*vpp = NULL;
 	return (err);
 }
 
 static int
 zfs_checkexp(vfs_t *vfsp, struct sockaddr *nam, int *extflagsp,
     struct ucred **credanonp, int *numsecflavors, int **secflavors)
 {
 	zfsvfs_t *zfsvfs = vfsp->vfs_data;
 
 	/*
 	 * If this is a regular file system, vfsp is the same as
 	 * zfsvfs->z_parent->z_vfs, but if it is a snapshot,
 	 * zfsvfs->z_parent->z_vfs represents the parent file system,
 	 * which we have to use here because only that file system
 	 * has mnt_export configured.
 	 */
 	return (vfs_stdcheckexp(zfsvfs->z_parent->z_vfs, nam, extflagsp,
 	    credanonp, numsecflavors, secflavors));
 }
 
 CTASSERT(SHORT_FID_LEN <= sizeof(struct fid));
 CTASSERT(LONG_FID_LEN <= sizeof(struct fid));
 
 static int
 zfs_fhtovp(vfs_t *vfsp, fid_t *fidp, int flags, vnode_t **vpp)
 {
 	zfsvfs_t	*zfsvfs = vfsp->vfs_data;
 	znode_t		*zp;
 	uint64_t	object = 0;
 	uint64_t	fid_gen = 0;
 	uint64_t	gen_mask;
 	uint64_t	zp_gen;
 	int 		i, err;
 
 	*vpp = NULL;
 
 	ZFS_ENTER(zfsvfs);
 
 	/*
 	 * On FreeBSD we can get the snapshot's mount point or its parent
 	 * file system's mount point, depending on whether the snapshot is
 	 * already mounted.
 	 */
 	if (zfsvfs->z_parent == zfsvfs && fidp->fid_len == LONG_FID_LEN) {
 		zfid_long_t	*zlfid = (zfid_long_t *)fidp;
 		uint64_t	objsetid = 0;
 		uint64_t	setgen = 0;
 
 		for (i = 0; i < sizeof (zlfid->zf_setid); i++)
 			objsetid |= ((uint64_t)zlfid->zf_setid[i]) << (8 * i);
 
 		for (i = 0; i < sizeof (zlfid->zf_setgen); i++)
 			setgen |= ((uint64_t)zlfid->zf_setgen[i]) << (8 * i);
 
 		ZFS_EXIT(zfsvfs);
 
 		err = zfsctl_lookup_objset(vfsp, objsetid, &zfsvfs);
 		if (err)
 			return (SET_ERROR(EINVAL));
 		ZFS_ENTER(zfsvfs);
 	}
 
 	if (fidp->fid_len == SHORT_FID_LEN || fidp->fid_len == LONG_FID_LEN) {
 		zfid_short_t	*zfid = (zfid_short_t *)fidp;
 
 		for (i = 0; i < sizeof (zfid->zf_object); i++)
 			object |= ((uint64_t)zfid->zf_object[i]) << (8 * i);
 
 		for (i = 0; i < sizeof (zfid->zf_gen); i++)
 			fid_gen |= ((uint64_t)zfid->zf_gen[i]) << (8 * i);
 	} else {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	/*
 	 * A zero fid_gen means we are in .zfs or the .zfs/snapshot
 	 * directory tree. If the object == zfsvfs->z_shares_dir, then
 	 * we are in the .zfs/shares directory tree.
 	 */
 	if ((fid_gen == 0 &&
 	     (object == ZFSCTL_INO_ROOT || object == ZFSCTL_INO_SNAPDIR)) ||
 	    (zfsvfs->z_shares_dir != 0 && object == zfsvfs->z_shares_dir)) {
 		*vpp = zfsvfs->z_ctldir;
 		ASSERT(*vpp != NULL);
 		if (object == ZFSCTL_INO_SNAPDIR) {
 			VERIFY(zfsctl_root_lookup(*vpp, "snapshot", vpp, NULL,
 			    0, NULL, NULL, NULL, NULL, NULL) == 0);
 		} else if (object == zfsvfs->z_shares_dir) {
 			VERIFY(zfsctl_root_lookup(*vpp, "shares", vpp, NULL,
 			    0, NULL, NULL, NULL, NULL, NULL) == 0);
 		} else {
-			VN_HOLD(*vpp);
+			vref(*vpp);
 		}
 		ZFS_EXIT(zfsvfs);
 		err = vn_lock(*vpp, flags);
 		if (err != 0)
 			*vpp = NULL;
 		return (err);
 	}
 
 	gen_mask = -1ULL >> (64 - 8 * i);
 
 	dprintf("getting %llu [%llu mask %llx]\n", (unsigned long long)object,
 	    (unsigned long long)fid_gen, (unsigned long long)gen_mask);
 	if (err = zfs_zget(zfsvfs, object, &zp)) {
 		ZFS_EXIT(zfsvfs);
 		return (err);
 	}
 	(void) sa_lookup(zp->z_sa_hdl, SA_ZPL_GEN(zfsvfs), &zp_gen,
 	    sizeof (uint64_t));
 	zp_gen = zp_gen & gen_mask;
 	if (zp_gen == 0)
 		zp_gen = 1;
 	if (zp->z_unlinked || zp_gen != fid_gen) {
 		dprintf("znode gen (%llu) != fid gen (%llu)\n",
 		    (unsigned long long)zp_gen, (unsigned long long)fid_gen);
-		VN_RELE(ZTOV(zp));
+		vrele(ZTOV(zp));
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	*vpp = ZTOV(zp);
 	ZFS_EXIT(zfsvfs);
 	err = vn_lock(*vpp, flags | LK_RETRY);
 	if (err == 0)
 		vnode_create_vobject(*vpp, zp->z_size, curthread);
 	else
 		*vpp = NULL;
 	return (err);
 }
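zfs_fhtovp() rebuilds the object number and generation from little-endian byte arrays and then masks the znode generation to the width of zf_gen. The sketch below reproduces those three steps with hypothetical helper names. It assumes the usual short-FID layout of a 6-byte object number and a 4-byte generation; that size is what `i` holds when gen_mask is computed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian reassembly, as done for zf_object[] and zf_gen[]. */
static uint64_t
fid_bytes_to_u64(const uint8_t *bytes, size_t len)
{
	uint64_t v = 0;
	size_t i;

	for (i = 0; i < len; i++)
		v |= (uint64_t)bytes[i] << (8 * i);
	return (v);
}

/* Mask of the low 8 * len bits; only valid for 1 <= len <= 8. */
static uint64_t
gen_mask_for(size_t len)
{
	assert(len >= 1 && len <= 8);	/* a shift by 64 is undefined */
	return (~0ULL >> (64 - 8 * len));
}

/* The kernel treats a masked generation of 0 as 1 before comparing. */
static uint64_t
normalize_gen(uint64_t zp_gen, uint64_t mask)
{
	zp_gen &= mask;
	return (zp_gen == 0 ? 1 : zp_gen);
}
```

The mask matters because a file handle stores only the low bytes of the generation. Comparing the full 64-bit znode generation would reject valid handles for files whose generation exceeds what the handle can carry.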
 
 /*
  * Block out VOPs and close zfsvfs_t::z_os
  *
  * Note, if successful, then we return with the 'z_teardown_lock' and
  * 'z_teardown_inactive_lock' write held.  We leave ownership of the underlying
  * dataset and objset intact so that they can be atomically handed off during
  * a subsequent rollback or recv operation and the resume thereafter.
  */
 int
 zfs_suspend_fs(zfsvfs_t *zfsvfs)
 {
 	int error;
 
 	if ((error = zfsvfs_teardown(zfsvfs, B_FALSE)) != 0)
 		return (error);
 
 	return (0);
 }
 
 /*
  * Rebuild SA and release VOPs.  Note that ownership of the underlying dataset
  * is an invariant across any of the operations that can be performed while the
  * filesystem was suspended.  Whether it succeeded or failed, the preconditions
  * are the same: the relevant objset and associated dataset are owned by
  * zfsvfs, held, and long held on entry.
  */
 int
 zfs_resume_fs(zfsvfs_t *zfsvfs, const char *osname)
 {
 	int err;
 	znode_t *zp;
 
 	ASSERT(RRM_WRITE_HELD(&zfsvfs->z_teardown_lock));
 	ASSERT(RW_WRITE_HELD(&zfsvfs->z_teardown_inactive_lock));
 
 	/*
 	 * We already own this, so just hold and rele it to update the
 	 * objset_t, as the one we had before may have been evicted.
 	 */
 	objset_t *os;
 	VERIFY0(dmu_objset_hold(osname, zfsvfs, &os));
 	VERIFY3P(os->os_dsl_dataset->ds_owner, ==, zfsvfs);
 	VERIFY(dsl_dataset_long_held(os->os_dsl_dataset));
 	dmu_objset_rele(os, zfsvfs);
 
 	err = zfsvfs_init(zfsvfs, os);
 	if (err != 0)
 		goto bail;
 
 	VERIFY(zfsvfs_setup(zfsvfs, B_FALSE) == 0);
 
 	zfs_set_fuid_feature(zfsvfs);
 
 	/*
 	 * Attempt to re-establish all the active znodes with
 	 * their dbufs.  If a zfs_rezget() fails, then we'll let
 	 * any potential callers discover that via ZFS_ENTER_VERIFY_VP
 	 * when they try to use their znode.
 	 */
 	mutex_enter(&zfsvfs->z_znodes_lock);
 	for (zp = list_head(&zfsvfs->z_all_znodes); zp;
 	    zp = list_next(&zfsvfs->z_all_znodes, zp)) {
 		(void) zfs_rezget(zp);
 	}
 	mutex_exit(&zfsvfs->z_znodes_lock);
 
 bail:
 	/* release the VOPs */
 	rw_exit(&zfsvfs->z_teardown_inactive_lock);
 	rrm_exit(&zfsvfs->z_teardown_lock, FTAG);
 
 	if (err) {
 		/*
 		 * Since we couldn't set up the SA framework, try to force
 		 * unmount this file system.
 		 */
 		if (vn_vfswlock(zfsvfs->z_vfs->vfs_vnodecovered) == 0) {
 			vfs_ref(zfsvfs->z_vfs);
 			(void) dounmount(zfsvfs->z_vfs, MS_FORCE, curthread);
 		}
 	}
 	return (err);
 }
 
 static void
 zfs_freevfs(vfs_t *vfsp)
 {
 	zfsvfs_t *zfsvfs = vfsp->vfs_data;
 
 #ifdef illumos
 	/*
 	 * If this is a snapshot, we have an extra VFS_HOLD on our parent
 	 * from zfs_mount().  Release it here.  If we came through
 	 * zfs_mountroot() instead, we didn't grab an extra hold, so
 	 * skip the VFS_RELE for rootvfs.
 	 */
 	if (zfsvfs->z_issnap && (vfsp != rootvfs))
 		VFS_RELE(zfsvfs->z_parent->z_vfs);
 #endif
 
 	zfsvfs_free(zfsvfs);
 
 	atomic_dec_32(&zfs_active_fs_count);
 }
 
 #ifdef __i386__
 static int desiredvnodes_backup;
 #endif
 
 static void
 zfs_vnodes_adjust(void)
 {
 #ifdef __i386__
 	int newdesiredvnodes;
 
 	desiredvnodes_backup = desiredvnodes;
 
 	/*
 	 * We calculate newdesiredvnodes the same way it is done in
 	 * vntblinit(). If it is equal to desiredvnodes, it means that
 	 * it wasn't tuned by the administrator and we can tune it down.
 	 */
 	newdesiredvnodes = min(maxproc + vm_cnt.v_page_count / 4, 2 *
 	    vm_kmem_size / (5 * (sizeof(struct vm_object) +
 	    sizeof(struct vnode))));
 	if (newdesiredvnodes == desiredvnodes)
 		desiredvnodes = (3 * newdesiredvnodes) / 4;
 #endif
 }
 
 static void
 zfs_vnodes_adjust_back(void)
 {
 
 #ifdef __i386__
 	desiredvnodes = desiredvnodes_backup;
 #endif
 }
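The i386 adjustment in zfs_vnodes_adjust() only lowers desiredvnodes when it still equals the value vntblinit() would have computed, which is taken as evidence that the administrator did not tune it. In this hypothetical sketch, the kernel globals become parameters and obj_vnode_size stands in for sizeof (struct vm_object) + sizeof (struct vnode); the numbers in the test are arbitrary.

```c
#include <assert.h>

/* Hypothetical restatement of zfs_vnodes_adjust() as a pure function. */
static long
adjusted_desiredvnodes(long desiredvnodes, long maxproc, long page_count,
    long kmem_size, long obj_vnode_size)
{
	long a = maxproc + page_count / 4;
	long b = 2 * kmem_size / (5 * obj_vnode_size);
	long dflt = a < b ? a : b;	/* the vntblinit() default */

	if (dflt == desiredvnodes)
		return ((3 * dflt) / 4);	/* untuned: cut to 3/4 */
	return (desiredvnodes);		/* administrator-tuned: keep */
}
```

With maxproc 1000 and 100000 pages, the default is 26000, so an untuned system drops to 19500 while a tuned value of 30000 is left alone.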
 
 void
 zfs_init(void)
 {
 
 	printf("ZFS filesystem version: " ZPL_VERSION_STRING "\n");
 
 	/*
 	 * Initialize .zfs directory structures
 	 */
 	zfsctl_init();
 
 	/*
 	 * Initialize znode cache, vnode ops, etc...
 	 */
 	zfs_znode_init();
 
 	/*
 	 * Reduce the number of vnodes.  The default number of vnodes is
 	 * calculated with UFS inodes in mind; we reduce it here because it
 	 * is too big for ZFS on i386.
 	 */
 	zfs_vnodes_adjust();
 
 	dmu_objset_register_type(DMU_OST_ZFS, zfs_space_delta_cb);
 }
 
 void
 zfs_fini(void)
 {
 	zfsctl_fini();
 	zfs_znode_fini();
 	zfs_vnodes_adjust_back();
 }
 
 int
 zfs_busy(void)
 {
 	return (zfs_active_fs_count != 0);
 }
 
 int
 zfs_set_version(zfsvfs_t *zfsvfs, uint64_t newvers)
 {
 	int error;
 	objset_t *os = zfsvfs->z_os;
 	dmu_tx_t *tx;
 
 	if (newvers < ZPL_VERSION_INITIAL || newvers > ZPL_VERSION)
 		return (SET_ERROR(EINVAL));
 
 	if (newvers < zfsvfs->z_version)
 		return (SET_ERROR(EINVAL));
 
 	if (zfs_spa_version_map(newvers) >
 	    spa_version(dmu_objset_spa(zfsvfs->z_os)))
 		return (SET_ERROR(ENOTSUP));
 
 	tx = dmu_tx_create(os);
 	dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_FALSE, ZPL_VERSION_STR);
 	if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) {
 		dmu_tx_hold_zap(tx, MASTER_NODE_OBJ, B_TRUE,
 		    ZFS_SA_ATTRS);
 		dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL);
 	}
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		dmu_tx_abort(tx);
 		return (error);
 	}
 
 	error = zap_update(os, MASTER_NODE_OBJ, ZPL_VERSION_STR,
 	    8, 1, &newvers, tx);
 
 	if (error) {
 		dmu_tx_commit(tx);
 		return (error);
 	}
 
 	if (newvers >= ZPL_VERSION_SA && !zfsvfs->z_use_sa) {
 		uint64_t sa_obj;
 
 		ASSERT3U(spa_version(dmu_objset_spa(zfsvfs->z_os)), >=,
 		    SPA_VERSION_SA);
 		sa_obj = zap_create(os, DMU_OT_SA_MASTER_NODE,
 		    DMU_OT_NONE, 0, tx);
 
 		error = zap_add(os, MASTER_NODE_OBJ,
 		    ZFS_SA_ATTRS, 8, 1, &sa_obj, tx);
 		ASSERT0(error);
 
 		VERIFY(0 == sa_set_sa_object(os, sa_obj));
 		sa_register_update_callback(os, zfs_sa_upgrade);
 	}
 
 	spa_history_log_internal_ds(dmu_objset_ds(os), "upgrade", tx,
 	    "from %llu to %llu", zfsvfs->z_version, newvers);
 
 	dmu_tx_commit(tx);
 
 	zfsvfs->z_version = newvers;
 
 	zfs_set_fuid_feature(zfsvfs);
 
 	return (0);
 }
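The validation at the start of zfs_set_version() can be expressed as a pure function. In this hypothetical sketch, min_vers and max_vers stand in for ZPL_VERSION_INITIAL and ZPL_VERSION, and required_spa_vers stands in for zfs_spa_version_map(newvers). The version numbers in the test are illustrative, not the real ZPL-to-SPA mapping.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Returns 0 if the upgrade may proceed, else the errno zfs_set_version() uses. */
static int
zpl_version_check(uint64_t newvers, uint64_t curvers, uint64_t min_vers,
    uint64_t max_vers, uint64_t required_spa_vers, uint64_t pool_spa_vers)
{
	if (newvers < min_vers || newvers > max_vers)
		return (EINVAL);	/* outside the supported range */
	if (newvers < curvers)
		return (EINVAL);	/* downgrades are refused */
	if (required_spa_vers > pool_spa_vers)
		return (ENOTSUP);	/* the pool must be upgraded first */
	return (0);
}
```

Only after these checks pass does the real function open a transaction, update ZPL_VERSION_STR in the master node, and, when crossing ZPL_VERSION_SA, create the SA master node.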
 
 /*
  * Read a property stored within the master node.
  */
 int
 zfs_get_zplprop(objset_t *os, zfs_prop_t prop, uint64_t *value)
 {
 	const char *pname;
 	int error = ENOENT;
 
 	/*
 	 * Look up the file system's value for the property.  For the
 	 * version property, we look up a slightly different string.
 	 */
 	if (prop == ZFS_PROP_VERSION)
 		pname = ZPL_VERSION_STR;
 	else
 		pname = zfs_prop_to_name(prop);
 
 	if (os != NULL)
 		error = zap_lookup(os, MASTER_NODE_OBJ, pname, 8, 1, value);
 
 	if (error == ENOENT) {
 		/* No value set, use the default value */
 		switch (prop) {
 		case ZFS_PROP_VERSION:
 			*value = ZPL_VERSION;
 			break;
 		case ZFS_PROP_NORMALIZE:
 		case ZFS_PROP_UTF8ONLY:
 			*value = 0;
 			break;
 		case ZFS_PROP_CASE:
 			*value = ZFS_CASE_SENSITIVE;
 			break;
 		default:
 			return (error);
 		}
 		error = 0;
 	}
 	return (error);
 }
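zfs_get_zplprop() treats ENOENT from the ZAP lookup as "never set" and substitutes a built-in default. A property without a default keeps ENOENT. Below is a minimal user-space sketch of that fallback over a plain array rather than the master node. `struct prop_entry`, `table_lookup()`, `get_prop()`, the property name strings and the placeholder default version 5 are all hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the name/value pairs kept in the master node. */
struct prop_entry {
	const char *name;
	uint64_t value;
};

static int
table_lookup(const struct prop_entry *tbl, size_t n, const char *name,
    uint64_t *value)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) == 0) {
			*value = tbl[i].value;
			return (0);
		}
	}
	return (ENOENT);
}

static int
get_prop(const struct prop_entry *tbl, size_t n, const char *name,
    uint64_t *value)
{
	int error = ENOENT;

	/* Like the os != NULL check: no table means "nothing stored". */
	if (tbl != NULL)
		error = table_lookup(tbl, n, name, value);

	if (error == ENOENT) {
		/* Never stored: substitute a default where one exists. */
		if (strcmp(name, "version") == 0)
			*value = 5;		/* placeholder current version */
		else if (strcmp(name, "normalization") == 0 ||
		    strcmp(name, "utf8only") == 0)
			*value = 0;
		else
			return (error);		/* no default: keep ENOENT */
		error = 0;
	}
	return (error);
}
```

A stored value always wins. The default only applies when the lookup reports ENOENT.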
 
 #ifdef _KERNEL
 void
 zfsvfs_update_fromname(const char *oldname, const char *newname)
 {
 	char tmpbuf[MAXPATHLEN];
 	struct mount *mp;
 	char *fromname;
 	size_t oldlen;
 
 	oldlen = strlen(oldname);
 
 	mtx_lock(&mountlist_mtx);
 	TAILQ_FOREACH(mp, &mountlist, mnt_list) {
 		fromname = mp->mnt_stat.f_mntfromname;
 		if (strcmp(fromname, oldname) == 0) {
 			(void)strlcpy(fromname, newname,
 			    sizeof(mp->mnt_stat.f_mntfromname));
 			continue;
 		}
 		if (strncmp(fromname, oldname, oldlen) == 0 &&
 		    (fromname[oldlen] == '/' || fromname[oldlen] == '@')) {
 			(void)snprintf(tmpbuf, sizeof(tmpbuf), "%s%s",
 			    newname, fromname + oldlen);
 			(void)strlcpy(fromname, tmpbuf,
 			    sizeof(mp->mnt_stat.f_mntfromname));
 			continue;
 		}
 	}
 	mtx_unlock(&mountlist_mtx);
 }
 #endif
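zfsvfs_update_fromname() replaces an exact match outright. For a child dataset ('/') or a snapshot ('@'), it keeps the suffix after the old prefix. The separator check is what stops "tank/home" from matching "tank/homework". Below is a user-space sketch of a single rewrite. It uses snprintf() instead of strlcpy() for portability. The function name `rename_fromname()` and the 256-byte scratch buffer, which stands in for MAXPATHLEN, are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Rewrite one mount "from" name after a rename of oldname to newname.
 * Returns 1 if the name was rewritten, 0 if it was left alone.
 */
static int
rename_fromname(char *fromname, size_t size, const char *oldname,
    const char *newname)
{
	char tmpbuf[256];	/* hypothetical bound; the kernel uses MAXPATHLEN */
	size_t oldlen = strlen(oldname);

	/* The renamed dataset itself. */
	if (strcmp(fromname, oldname) == 0) {
		(void) snprintf(fromname, size, "%s", newname);
		return (1);
	}
	/* A child dataset or snapshot: keep everything after the prefix. */
	if (strncmp(fromname, oldname, oldlen) == 0 &&
	    (fromname[oldlen] == '/' || fromname[oldlen] == '@')) {
		(void) snprintf(tmpbuf, sizeof (tmpbuf), "%s%s", newname,
		    fromname + oldlen);
		(void) snprintf(fromname, size, "%s", tmpbuf);
		return (1);
	}
	return (0);
}
```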
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c	(revision 303775)
@@ -1,7284 +1,6025 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
  * Copyright 2014 Nexenta Systems, Inc.  All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
 /* Portions Copyright 2007 Jeremy Teo */
 /* Portions Copyright 2010 Robert Milkowski */
 
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
 #include <sys/systm.h>
 #include <sys/sysmacros.h>
 #include <sys/resource.h>
 #include <sys/vfs.h>
 #include <sys/vm.h>
 #include <sys/vnode.h>
 #include <sys/file.h>
 #include <sys/stat.h>
 #include <sys/kmem.h>
 #include <sys/taskq.h>
 #include <sys/uio.h>
 #include <sys/atomic.h>
 #include <sys/namei.h>
 #include <sys/mman.h>
 #include <sys/cmn_err.h>
 #include <sys/errno.h>
 #include <sys/unistd.h>
 #include <sys/zfs_dir.h>
 #include <sys/zfs_ioctl.h>
 #include <sys/fs/zfs.h>
 #include <sys/dmu.h>
 #include <sys/dmu_objset.h>
 #include <sys/spa.h>
 #include <sys/txg.h>
 #include <sys/dbuf.h>
 #include <sys/zap.h>
 #include <sys/sa.h>
 #include <sys/dirent.h>
 #include <sys/policy.h>
 #include <sys/sunddi.h>
 #include <sys/filio.h>
 #include <sys/sid.h>
 #include <sys/zfs_ctldir.h>
 #include <sys/zfs_fuid.h>
 #include <sys/zfs_sa.h>
-#include <sys/dnlc.h>
 #include <sys/zfs_rlock.h>
 #include <sys/extdirent.h>
 #include <sys/kidmap.h>
 #include <sys/bio.h>
 #include <sys/buf.h>
 #include <sys/sched.h>
 #include <sys/acl.h>
 #include <vm/vm_param.h>
 
 /*
  * Programming rules.
  *
  * Each vnode op performs some logical unit of work.  To do this, the ZPL must
  * properly lock its in-core state, create a DMU transaction, do the work,
  * record this work in the intent log (ZIL), commit the DMU transaction,
  * and wait for the intent log to commit if it is a synchronous operation.
  * Moreover, the vnode ops must work in both normal and log replay context.
  * The ordering of events is important to avoid deadlocks and references
  * to freed memory.  The example below illustrates the following Big Rules:
  *
  *  (1)	A check must be made in each zfs thread for a mounted file system.
  *	This is done avoiding races using ZFS_ENTER(zfsvfs).
  *	A ZFS_EXIT(zfsvfs) is needed before all returns.  Any znodes
  *	must be checked with ZFS_VERIFY_ZP(zp).  Both of these macros
  *	can return EIO from the calling function.
  *
  *  (2)	VN_RELE() should always be the last thing except for zil_commit()
  *	(if necessary) and ZFS_EXIT(). This is for 3 reasons:
  *	First, if it's the last reference, the vnode/znode
  *	can be freed, so the zp may point to freed memory.  Second, the last
  *	reference will call zfs_zinactive(), which may induce a lot of work --
  *	pushing cached pages (which acquires range locks) and syncing out
  *	cached atime changes.  Third, zfs_zinactive() may require a new tx,
  *	which could deadlock the system if you were already holding one.
  *	If you must call VN_RELE() within a tx then use VN_RELE_ASYNC().
  *
  *  (3)	All range locks must be grabbed before calling dmu_tx_assign(),
  *	as they can span dmu_tx_assign() calls.
  *
  *  (4) If ZPL locks are held, pass TXG_NOWAIT as the second argument to
  *      dmu_tx_assign().  This is critical because we don't want to block
  *      while holding locks.
  *
  *	If no ZPL locks are held (aside from ZFS_ENTER()), use TXG_WAIT.  This
  *	reduces lock contention and CPU usage when we must wait (note that if
  *	throughput is constrained by the storage, nearly every transaction
  *	must wait).
  *
  *      Note, in particular, that if a lock is sometimes acquired before
  *      the tx assigns, and sometimes after (e.g. z_lock), then failing
  *      to use a non-blocking assign can deadlock the system.  The scenario:
  *
  *	Thread A has grabbed a lock before calling dmu_tx_assign().
  *	Thread B is in an already-assigned tx, and blocks for this lock.
  *	Thread A calls dmu_tx_assign(TXG_WAIT) and blocks in txg_wait_open()
  *	forever, because the previous txg can't quiesce until B's tx commits.
  *
  *	If dmu_tx_assign() returns ERESTART and zfsvfs->z_assign is TXG_NOWAIT,
  *	then drop all locks, call dmu_tx_wait(), and try again.  On subsequent
  *	calls to dmu_tx_assign(), pass TXG_WAITED rather than TXG_NOWAIT,
  *	to indicate that this operation has already called dmu_tx_wait().
  *	This will ensure that we don't retry forever, waiting a short bit
  *	each time.
  *
  *  (5)	If the operation succeeded, generate the intent log entry for it
  *	before dropping locks.  This ensures that the ordering of events
  *	in the intent log matches the order in which they actually occurred.
  *	During ZIL replay the zfs_log_* functions will update the sequence
  *	number to indicate the zil transaction has replayed.
  *
  *  (6)	At the end of each vnode op, the DMU tx must always commit,
  *	regardless of whether there were any errors.
  *
  *  (7)	After dropping all locks, invoke zil_commit(zilog, foid)
  *	to ensure that synchronous semantics are provided when necessary.
  *
  * In general, this is how things should be ordered in each vnode op:
  *
  *	ZFS_ENTER(zfsvfs);		// exit if unmounted
  * top:
- *	zfs_dirent_lock(&dl, ...)	// lock directory entry (may VN_HOLD())
+ *	zfs_dirent_lookup(&dl, ...)	// lock directory entry (may VN_HOLD())
  *	rw_enter(...);			// grab any other locks you need
  *	tx = dmu_tx_create(...);	// get DMU tx
  *	dmu_tx_hold_*();		// hold each object you might modify
  *	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
  *	if (error) {
  *		rw_exit(...);		// drop locks
  *		zfs_dirent_unlock(dl);	// unlock directory entry
  *		VN_RELE(...);		// release held vnodes
  *		if (error == ERESTART) {
  *			waited = B_TRUE;
  *			dmu_tx_wait(tx);
  *			dmu_tx_abort(tx);
  *			goto top;
  *		}
  *		dmu_tx_abort(tx);	// abort DMU tx
  *		ZFS_EXIT(zfsvfs);	// finished in zfs
  *		return (error);		// really out of space
  *	}
  *	error = do_real_work();		// do whatever this VOP does
  *	if (error == 0)
  *		zfs_log_*(...);		// on success, make ZIL entry
  *	dmu_tx_commit(tx);		// commit DMU tx -- error or not
  *	rw_exit(...);			// drop locks
  *	zfs_dirent_unlock(dl);		// unlock directory entry
  *	VN_RELE(...);			// release held vnodes
  *	zil_commit(zilog, foid);	// synchronous when necessary
  *	ZFS_EXIT(zfsvfs);		// finished in zfs
  *	return (error);			// done, report error
  */
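The ERESTART retry loop of rule (4) can be exercised in user space with a mock transaction. Locks are dropped before waiting, and every retry passes the "waited" mode. This is a minimal sketch only. `mock_tx_assign()`, `mock_tx_wait()`, `MOCK_TXG_NOWAIT`/`MOCK_TXG_WAITED` and `MOCK_ERESTART` are hypothetical stand-ins for dmu_tx_assign(), dmu_tx_wait(), TXG_NOWAIT/TXG_WAITED and ERESTART. The mock also assumes a "waited" assign always succeeds.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real TXG_* modes and ERESTART are kernel-only. */
enum mock_assign { MOCK_TXG_NOWAIT, MOCK_TXG_WAITED };
#define MOCK_ERESTART 1001

struct mock_tx {
	int busy;		/* NOWAIT assigns that will still fail */
	int waits;		/* calls to mock_tx_wait() */
	bool lock_held;		/* models a ZPL lock, per rule (4) */
	int waited_with_lock;	/* violations: blocked while holding a lock */
};

static int
mock_tx_assign(struct mock_tx *tx, enum mock_assign how)
{
	/* NOWAIT refuses to block; this mock lets a WAITED assign succeed. */
	if (how == MOCK_TXG_NOWAIT && tx->busy > 0) {
		tx->busy--;
		return (MOCK_ERESTART);
	}
	return (0);
}

static void
mock_tx_wait(struct mock_tx *tx)
{
	if (tx->lock_held)
		tx->waited_with_lock++;
	tx->waits++;
}

/* The top:/goto top skeleton from the comment above. */
static int
mock_vnode_op(struct mock_tx *tx)
{
	bool waited = false;
	int error;

top:
	tx->lock_held = true;			/* grab ZPL locks */
	error = mock_tx_assign(tx, waited ? MOCK_TXG_WAITED : MOCK_TXG_NOWAIT);
	if (error != 0) {
		tx->lock_held = false;		/* drop locks first */
		if (error == MOCK_ERESTART) {
			waited = true;
			mock_tx_wait(tx);	/* block only when unlocked */
			goto top;
		}
		return (error);
	}
	/* ... do the work, log it to the ZIL, commit the tx ... */
	tx->lock_held = false;
	return (0);
}
```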
 
 /* ARGSUSED */
 static int
 zfs_open(vnode_t **vpp, int flag, cred_t *cr, caller_context_t *ct)
 {
 	znode_t	*zp = VTOZ(*vpp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	if ((flag & FWRITE) && (zp->z_pflags & ZFS_APPENDONLY) &&
 	    ((flag & FAPPEND) == 0)) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	if (!zfs_has_ctldir(zp) && zp->z_zfsvfs->z_vscan &&
 	    ZTOV(zp)->v_type == VREG &&
 	    !(zp->z_pflags & ZFS_AV_QUARANTINED) && zp->z_size > 0) {
 		if (fs_vscan(*vpp, cr, 0) != 0) {
 			ZFS_EXIT(zfsvfs);
 			return (SET_ERROR(EACCES));
 		}
 	}
 
 	/* Keep a count of the synchronous opens in the znode */
 	if (flag & (FSYNC | FDSYNC))
 		atomic_inc_32(&zp->z_sync_cnt);
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
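ZFS_APPENDONLY is enforced in two places:

- zfs_open() rejects a writable open that lacks FAPPEND.
- zfs_write() still refuses a non-append write that starts before EOF.

Below is a sketch of both checks. Plain booleans stand in for the FWRITE/FAPPEND flag bits and the znode's ZFS_APPENDONLY flag, and the function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Open time: a writable open of an append-only file must be FAPPEND. */
static int
appendonly_open_check(bool appendonly, bool fwrite, bool fappend)
{
	if (fwrite && appendonly && !fappend)
		return (EPERM);
	return (0);
}

/* Write time: without FAPPEND, only writes at or past EOF are allowed. */
static int
appendonly_write_check(bool appendonly, bool fappend, long long off,
    long long size)
{
	if (appendonly && !fappend && off < size)
		return (EPERM);
	return (0);
}
```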
 
 /* ARGSUSED */
 static int
 zfs_close(vnode_t *vp, int flag, int count, offset_t offset, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t	*zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	/*
 	 * Clean up any locks held by this process on the vp.
 	 */
 	cleanlocks(vp, ddi_get_pid(), 0);
 	cleanshares(vp, ddi_get_pid());
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	/* Decrement the synchronous opens in the znode */
 	if ((flag & (FSYNC | FDSYNC)) && (count == 1))
 		atomic_dec_32(&zp->z_sync_cnt);
 
 	if (!zfs_has_ctldir(zp) && zp->z_zfsvfs->z_vscan &&
 	    ZTOV(zp)->v_type == VREG &&
 	    !(zp->z_pflags & ZFS_AV_QUARANTINED) && zp->z_size > 0)
 		VERIFY(fs_vscan(vp, cr, 1) == 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
 
 /*
  * Lseek support for finding holes (cmd == _FIO_SEEK_HOLE) and
  * data (cmd == _FIO_SEEK_DATA). "off" is an in/out parameter.
  */
 static int
 zfs_holey(vnode_t *vp, u_long cmd, offset_t *off)
 {
 	znode_t	*zp = VTOZ(vp);
 	uint64_t noff = (uint64_t)*off; /* new offset */
 	uint64_t file_sz;
 	int error;
 	boolean_t hole;
 
 	file_sz = zp->z_size;
 	if (noff >= file_sz)  {
 		return (SET_ERROR(ENXIO));
 	}
 
 	if (cmd == _FIO_SEEK_HOLE)
 		hole = B_TRUE;
 	else
 		hole = B_FALSE;
 
 	error = dmu_offset_next(zp->z_zfsvfs->z_os, zp->z_id, hole, &noff);
 
 	if (error == ESRCH)
 		return (SET_ERROR(ENXIO));
 
 	/*
 	 * We could find a hole that begins after the logical end-of-file,
 	 * because dmu_offset_next() only works on whole blocks.  If the
 	 * EOF falls mid-block, then indicate that the "virtual hole"
 	 * at the end of the file begins at the logical EOF, rather than
 	 * at the end of the last block.
 	 */
 	if (noff > file_sz) {
 		ASSERT(hole);
 		noff = file_sz;
 	}
 
 	if (noff < *off)
 		return (error);
 	*off = noff;
 	return (error);
 }
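dmu_offset_next() works on whole blocks, so zfs_holey() clamps a hole found past EOF back to the logical end of file. The sketch below models the same search over a hypothetical per-block map (`has_data[]`) instead of the DMU. The map, the function name, and treating everything after the last block as an implicit hole are assumptions made for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Find the next hole (hole == true) or data region at or after *off. */
static int
seek_hole_data(const bool *has_data, uint64_t nblocks, uint64_t blksz,
    uint64_t file_sz, bool hole, uint64_t *off)
{
	uint64_t noff = *off;
	uint64_t blk;

	if (noff >= file_sz)
		return (ENXIO);

	/* Search whole blocks, starting with the block containing noff. */
	for (blk = noff / blksz; blk < nblocks; blk++) {
		if (has_data[blk] != hole)
			break;
	}
	if (blk == nblocks && !hole)
		return (ENXIO);		/* no data at or after *off */

	/* A match in a later block moves the offset to that block's start. */
	if (blk * blksz > noff)
		noff = blk * blksz;

	/* A whole-block search can overshoot EOF: the hole starts at EOF. */
	if (noff > file_sz)
		noff = file_sz;
	*off = noff;
	return (0);
}
```

For example, a 10000-byte file with 4096-byte blocks that all hold data reports its first hole at 10000, not at 12288.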
 
 /* ARGSUSED */
 static int
 zfs_ioctl(vnode_t *vp, u_long com, intptr_t data, int flag, cred_t *cred,
     int *rvalp, caller_context_t *ct)
 {
 	offset_t off;
 	offset_t ndata;
 	dmu_object_info_t doi;
 	int error;
 	zfsvfs_t *zfsvfs;
 	znode_t *zp;
 
 	switch (com) {
 	case _FIOFFS:
 	{
 		return (0);
 
 		/*
 		 * The following two ioctls are used by bfu.  Faking out,
 		 * necessary to avoid bfu errors.
 		 */
 	}
 	case _FIOGDIO:
 	case _FIOSDIO:
 	{
 		return (0);
 	}
 
 	case _FIO_SEEK_DATA:
 	case _FIO_SEEK_HOLE:
 	{
 #ifdef illumos
 		if (ddi_copyin((void *)data, &off, sizeof (off), flag))
 			return (SET_ERROR(EFAULT));
 #else
 		off = *(offset_t *)data;
 #endif
 		zp = VTOZ(vp);
 		zfsvfs = zp->z_zfsvfs;
 		ZFS_ENTER(zfsvfs);
 		ZFS_VERIFY_ZP(zp);
 
 		/* offset parameter is in/out */
 		error = zfs_holey(vp, com, &off);
 		ZFS_EXIT(zfsvfs);
 		if (error)
 			return (error);
 #ifdef illumos
 		if (ddi_copyout(&off, (void *)data, sizeof (off), flag))
 			return (SET_ERROR(EFAULT));
 #else
 		*(offset_t *)data = off;
 #endif
 		return (0);
 	}
 #ifdef illumos
 	case _FIO_COUNT_FILLED:
 	{
 		/*
 		 * _FIO_COUNT_FILLED adds a new ioctl command which
 		 * exposes the number of filled blocks in a
 		 * ZFS object.
 		 */
 		zp = VTOZ(vp);
 		zfsvfs = zp->z_zfsvfs;
 		ZFS_ENTER(zfsvfs);
 		ZFS_VERIFY_ZP(zp);
 
 		/*
 		 * Wait for all dirty blocks for this object
 		 * to get synced out to disk, and the DMU info
 		 * updated.
 		 */
 		error = dmu_object_wait_synced(zfsvfs->z_os, zp->z_id);
 		if (error) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 
 		/*
 		 * Retrieve fill count from DMU object.
 		 */
 		error = dmu_object_info(zfsvfs->z_os, zp->z_id, &doi);
 		if (error) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 
 		ndata = doi.doi_fill_count;
 
 		ZFS_EXIT(zfsvfs);
 		if (ddi_copyout(&ndata, (void *)data, sizeof (ndata), flag))
 			return (SET_ERROR(EFAULT));
 		return (0);
 	}
 #endif
 	}
 	return (SET_ERROR(ENOTTY));
 }
 
 static vm_page_t
 page_busy(vnode_t *vp, int64_t start, int64_t off, int64_t nbytes)
 {
 	vm_object_t obj;
 	vm_page_t pp;
 	int64_t end;
 
 	/*
 	 * At present vm_page_clear_dirty extends the cleared range to DEV_BSIZE
 	 * aligned boundaries, if the range is not aligned.  As a result a
 	 * DEV_BSIZE subrange with partially dirty data may get marked as clean.
 	 * It may happen that all DEV_BSIZE subranges are marked clean and thus
 	 * the whole page would be considered clean despite holding dirty data.
 	 * For this reason we should shrink the range to DEV_BSIZE aligned
 	 * boundaries before calling vm_page_clear_dirty.
 	 */
 	end = rounddown2(off + nbytes, DEV_BSIZE);
 	off = roundup2(off, DEV_BSIZE);
 	nbytes = end - off;
 
 	obj = vp->v_object;
 	zfs_vmobject_assert_wlocked(obj);
 
 	for (;;) {
 		if ((pp = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL &&
 		    pp->valid) {
 			if (vm_page_xbusied(pp)) {
 				/*
 				 * Reference the page before unlocking and
 				 * sleeping so that the page daemon is less
 				 * likely to reclaim it.
 				 */
 				vm_page_reference(pp);
 				vm_page_lock(pp);
 				zfs_vmobject_wunlock(obj);
 				vm_page_busy_sleep(pp, "zfsmwb");
 				zfs_vmobject_wlock(obj);
 				continue;
 			}
 			vm_page_sbusy(pp);
 		} else if (pp == NULL) {
 			pp = vm_page_alloc(obj, OFF_TO_IDX(start),
 			    VM_ALLOC_SYSTEM | VM_ALLOC_IFCACHED |
 			    VM_ALLOC_SBUSY);
 		} else {
 			ASSERT(pp != NULL && !pp->valid);
 			pp = NULL;
 		}
 
 		if (pp != NULL) {
 			ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL);
 			vm_object_pip_add(obj, 1);
 			pmap_remove_write(pp);
 			if (nbytes != 0)
 				vm_page_clear_dirty(pp, off, nbytes);
 		}
 		break;
 	}
 	return (pp);
 }
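page_busy() shrinks the byte range to whole DEV_BSIZE sectors before calling vm_page_clear_dirty(). As a result, a sector that is only partly overwritten keeps its dirty bit. Below is a sketch of that rounding. It makes these assumptions:

- DEV_BSIZE is taken as 512, its usual FreeBSD value.
- ROUNDDOWN2/ROUNDUP2 are local macros in the style of rounddown2()/roundup2().
- `shrink_to_sectors()` is a hypothetical name.

Unlike the code above, which computes `end - off` directly, this sketch reports a range that contains no whole sector as length 0.

```c
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512	/* assumed sector size */

/* Power-of-two rounding in the style of rounddown2()/roundup2(). */
#define ROUNDDOWN2(x, y) ((x) & ~((int64_t)(y) - 1))
#define ROUNDUP2(x, y)   (((x) + ((int64_t)(y) - 1)) & ~((int64_t)(y) - 1))

/* Shrink [*off, *off + *nbytes) to the whole sectors it fully covers. */
static void
shrink_to_sectors(int64_t *off, int64_t *nbytes)
{
	int64_t end = ROUNDDOWN2(*off + *nbytes, SKETCH_DEV_BSIZE);
	int64_t start = ROUNDUP2(*off, SKETCH_DEV_BSIZE);

	*off = start;
	*nbytes = end > start ? end - start : 0;
}
```

For example, bytes 100 through 1099 shrink to the single sector at 512.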
 
 static void
 page_unbusy(vm_page_t pp)
 {
 
 	vm_page_sunbusy(pp);
 	vm_object_pip_subtract(pp->object, 1);
 }
 
 static vm_page_t
 page_hold(vnode_t *vp, int64_t start)
 {
 	vm_object_t obj;
 	vm_page_t pp;
 
 	obj = vp->v_object;
 	zfs_vmobject_assert_wlocked(obj);
 
 	for (;;) {
 		if ((pp = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL &&
 		    pp->valid) {
 			if (vm_page_xbusied(pp)) {
 				/*
 				 * Reference the page before unlocking and
 				 * sleeping so that the page daemon is less
 				 * likely to reclaim it.
 				 */
 				vm_page_reference(pp);
 				vm_page_lock(pp);
 				zfs_vmobject_wunlock(obj);
 				vm_page_busy_sleep(pp, "zfsmwb");
 				zfs_vmobject_wlock(obj);
 				continue;
 			}
 
 			ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL);
 			vm_page_lock(pp);
 			vm_page_hold(pp);
 			vm_page_unlock(pp);
 
 		} else
 			pp = NULL;
 		break;
 	}
 	return (pp);
 }
 
 static void
 page_unhold(vm_page_t pp)
 {
 
 	vm_page_lock(pp);
 	vm_page_unhold(pp);
 	vm_page_unlock(pp);
 }
 
 /*
  * When a file is memory mapped, we must keep the IO data synchronized
  * between the DMU cache and the memory mapped pages.  What this means:
  *
  * On Write:	If we find a memory mapped page, we write to *both*
  *		the page and the dmu buffer.
  */
 static void
 update_pages(vnode_t *vp, int64_t start, int len, objset_t *os, uint64_t oid,
     int segflg, dmu_tx_t *tx)
 {
 	vm_object_t obj;
 	struct sf_buf *sf;
 	caddr_t va;
 	int off;
 
 	ASSERT(segflg != UIO_NOCOPY);
 	ASSERT(vp->v_mount != NULL);
 	obj = vp->v_object;
 	ASSERT(obj != NULL);
 
 	off = start & PAGEOFFSET;
 	zfs_vmobject_wlock(obj);
 	for (start &= PAGEMASK; len > 0; start += PAGESIZE) {
 		vm_page_t pp;
 		int nbytes = imin(PAGESIZE - off, len);
 
 		if ((pp = page_busy(vp, start, off, nbytes)) != NULL) {
 			zfs_vmobject_wunlock(obj);
 
 			va = zfs_map_page(pp, &sf);
 			(void) dmu_read(os, oid, start+off, nbytes,
 			    va+off, DMU_READ_PREFETCH);
 			zfs_unmap_page(sf);
 
 			zfs_vmobject_wlock(obj);
 			page_unbusy(pp);
 		}
 		len -= nbytes;
 		off = 0;
 	}
 	vm_object_pip_wakeupn(obj, 0);
 	zfs_vmobject_wunlock(obj);
 }
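update_pages() and mappedread() both walk a byte range one page at a time. Only the first page may begin at a non-zero offset, and each step moves at most `PAGESIZE - off` bytes. Below is a sketch of that split. It assumes a 4 KiB page and records each step in a hypothetical `struct chunk`.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGESIZE 4096	/* assumed page size */

struct chunk {
	int64_t page;	/* page-aligned start */
	int off;	/* offset within the page */
	int nbytes;	/* bytes covered in this page */
};

/* Split [start, start + len) into per-page pieces; returns the count. */
static size_t
split_by_page(int64_t start, int len, struct chunk *out, size_t max)
{
	size_t n = 0;
	int off = (int)(start & (SKETCH_PAGESIZE - 1));

	for (start &= ~(int64_t)(SKETCH_PAGESIZE - 1); len > 0 && n < max;
	    start += SKETCH_PAGESIZE) {
		int nbytes = SKETCH_PAGESIZE - off < len ?
		    SKETCH_PAGESIZE - off : len;

		out[n].page = start;
		out[n].off = off;
		out[n].nbytes = nbytes;
		n++;
		len -= nbytes;
		off = 0;	/* every later page starts at offset 0 */
	}
	return (n);
}
```

For example, 5000 bytes starting at offset 4000 become three pieces:

- 96 bytes in page 0.
- A full page at 4096.
- 808 bytes at 8192.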
 
 /*
  * Read with UIO_NOCOPY flag means that sendfile(2) requests
  * ZFS to populate a range of page cache pages with data.
  *
  * NOTE: this function could be optimized to pre-allocate
  * all pages in advance, drain exclusive busy on all of them,
  * map them into contiguous KVA region and populate them
  * in one single dmu_read() call.
  */
 static int
 mappedread_sf(vnode_t *vp, int nbytes, uio_t *uio)
 {
 	znode_t *zp = VTOZ(vp);
 	objset_t *os = zp->z_zfsvfs->z_os;
 	struct sf_buf *sf;
 	vm_object_t obj;
 	vm_page_t pp;
 	int64_t start;
 	caddr_t va;
 	int len = nbytes;
 	int off;
 	int error = 0;
 
 	ASSERT(uio->uio_segflg == UIO_NOCOPY);
 	ASSERT(vp->v_mount != NULL);
 	obj = vp->v_object;
 	ASSERT(obj != NULL);
 	ASSERT((uio->uio_loffset & PAGEOFFSET) == 0);
 
 	zfs_vmobject_wlock(obj);
 	for (start = uio->uio_loffset; len > 0; start += PAGESIZE) {
 		int bytes = MIN(PAGESIZE, len);
 
 		pp = vm_page_grab(obj, OFF_TO_IDX(start), VM_ALLOC_SBUSY |
 		    VM_ALLOC_NORMAL | VM_ALLOC_IGN_SBUSY);
 		if (pp->valid == 0) {
 			zfs_vmobject_wunlock(obj);
 			va = zfs_map_page(pp, &sf);
 			error = dmu_read(os, zp->z_id, start, bytes, va,
 			    DMU_READ_PREFETCH);
 			if (bytes != PAGESIZE && error == 0)
 				bzero(va + bytes, PAGESIZE - bytes);
 			zfs_unmap_page(sf);
 			zfs_vmobject_wlock(obj);
 			vm_page_sunbusy(pp);
 			vm_page_lock(pp);
 			if (error) {
 				if (pp->wire_count == 0 && pp->valid == 0 &&
 				    !vm_page_busied(pp))
 					vm_page_free(pp);
 			} else {
 				pp->valid = VM_PAGE_BITS_ALL;
 				vm_page_activate(pp);
 			}
 			vm_page_unlock(pp);
 		} else {
 			ASSERT3U(pp->valid, ==, VM_PAGE_BITS_ALL);
 			vm_page_sunbusy(pp);
 		}
 		if (error)
 			break;
 		uio->uio_resid -= bytes;
 		uio->uio_offset += bytes;
 		len -= bytes;
 	}
 	zfs_vmobject_wunlock(obj);
 	return (error);
 }
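In mappedread_sf(), a short final page is read with dmu_read() and the remainder is cleared with bzero() before the page is marked fully valid. Below is a user-space sketch of that fill step. memcpy() stands in for dmu_read(), the page size is assumed to be 4 KiB, and `fill_page()` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGESIZE 4096	/* assumed page size */

/* Copy `bytes` bytes into the page and zero whatever is left over. */
static void
fill_page(unsigned char *page, const unsigned char *src, size_t bytes)
{
	if (bytes > SKETCH_PAGESIZE)
		bytes = SKETCH_PAGESIZE;
	memcpy(page, src, bytes);
	/* A short read: the bytes past EOF read back as zeros. */
	if (bytes != SKETCH_PAGESIZE)
		memset(page + bytes, 0, SKETCH_PAGESIZE - bytes);
}
```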
 
 /*
  * When a file is memory mapped, we must keep the IO data synchronized
  * between the DMU cache and the memory mapped pages.  What this means:
  *
  * On Read:	We "read" preferentially from memory mapped pages,
  *		else we default from the dmu buffer.
  *
  * NOTE: We will always "break up" the IO into PAGESIZE uiomoves when
  *	 the file is memory mapped.
  */
 static int
 mappedread(vnode_t *vp, int nbytes, uio_t *uio)
 {
 	znode_t *zp = VTOZ(vp);
 	vm_object_t obj;
 	int64_t start;
 	caddr_t va;
 	int len = nbytes;
 	int off;
 	int error = 0;
 
 	ASSERT(vp->v_mount != NULL);
 	obj = vp->v_object;
 	ASSERT(obj != NULL);
 
 	start = uio->uio_loffset;
 	off = start & PAGEOFFSET;
 	zfs_vmobject_wlock(obj);
 	for (start &= PAGEMASK; len > 0; start += PAGESIZE) {
 		vm_page_t pp;
 		uint64_t bytes = MIN(PAGESIZE - off, len);
 
 		if ((pp = page_hold(vp, start)) != NULL) {
 			struct sf_buf *sf;
 			caddr_t va;
 
 			zfs_vmobject_wunlock(obj);
 			va = zfs_map_page(pp, &sf);
 #ifdef illumos
 			error = uiomove(va + off, bytes, UIO_READ, uio);
 #else
 			error = vn_io_fault_uiomove(va + off, bytes, uio);
 #endif
 			zfs_unmap_page(sf);
 			zfs_vmobject_wlock(obj);
 			page_unhold(pp);
 		} else {
 			zfs_vmobject_wunlock(obj);
 			error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl),
 			    uio, bytes);
 			zfs_vmobject_wlock(obj);
 		}
 		len -= bytes;
 		off = 0;
 		if (error)
 			break;
 	}
 	zfs_vmobject_wunlock(obj);
 	return (error);
 }
 
 offset_t zfs_read_chunk_size = 1024 * 1024; /* Tunable */
 
 /*
  * Read bytes from specified file into supplied buffer.
  *
  *	IN:	vp	- vnode of file to be read from.
  *		uio	- structure supplying read location, range info,
  *			  and return buffer.
  *		ioflag	- SYNC flags; used to provide FRSYNC semantics.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *
  *	OUT:	uio	- updated offset and range, buffer filled.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Side Effects:
  *	vp - atime updated if byte count > 0
  */
 /* ARGSUSED */
 static int
 zfs_read(vnode_t *vp, uio_t *uio, int ioflag, cred_t *cr, caller_context_t *ct)
 {
 	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	ssize_t		n, nbytes;
 	int		error = 0;
 	rl_t		*rl;
 	xuio_t		*xuio = NULL;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	if (zp->z_pflags & ZFS_AV_QUARANTINED) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EACCES));
 	}
 
 	/*
 	 * Validate file offset
 	 */
 	if (uio->uio_loffset < (offset_t)0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	/*
 	 * Fasttrack empty reads
 	 */
 	if (uio->uio_resid == 0) {
 		ZFS_EXIT(zfsvfs);
 		return (0);
 	}
 
 	/*
 	 * Check for mandatory locks
 	 */
 	if (MANDMODE(zp->z_mode)) {
 		if ((error = chklock(vp, FREAD, uio->uio_loffset,
 		    uio->uio_resid, uio->uio_fmode, ct)) != 0) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 	}
 
 	/*
 	 * If we're in FRSYNC mode, sync out this znode before reading it.
 	 */
 	if (zfsvfs->z_log &&
 	    (ioflag & FRSYNC || zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS))
 		zil_commit(zfsvfs->z_log, zp->z_id);
 
 	/*
 	 * Lock the range against changes.
 	 */
 	rl = zfs_range_lock(zp, uio->uio_loffset, uio->uio_resid, RL_READER);
 
 	/*
 	 * If we are reading past end-of-file we can skip
 	 * to the end; but we might still need to set atime.
 	 */
 	if (uio->uio_loffset >= zp->z_size) {
 		error = 0;
 		goto out;
 	}
 
 	ASSERT(uio->uio_loffset < zp->z_size);
 	n = MIN(uio->uio_resid, zp->z_size - uio->uio_loffset);
 
 #ifdef illumos
 	if ((uio->uio_extflg == UIO_XUIO) &&
 	    (((xuio_t *)uio)->xu_type == UIOTYPE_ZEROCOPY)) {
 		int nblk;
 		int blksz = zp->z_blksz;
 		uint64_t offset = uio->uio_loffset;
 
 		xuio = (xuio_t *)uio;
 		if ((ISP2(blksz))) {
 			nblk = (P2ROUNDUP(offset + n, blksz) - P2ALIGN(offset,
 			    blksz)) / blksz;
 		} else {
 			ASSERT(offset + n <= blksz);
 			nblk = 1;
 		}
 		(void) dmu_xuio_init(xuio, nblk);
 
 		if (vn_has_cached_data(vp)) {
 			/*
 			 * For simplicity, we always allocate a full buffer
 			 * even if we only expect to read a portion of a block.
 			 */
 			while (--nblk >= 0) {
 				(void) dmu_xuio_add(xuio,
 				    dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl),
 				    blksz), 0, blksz);
 			}
 		}
 	}
 #endif	/* illumos */
 
 	while (n > 0) {
 		nbytes = MIN(n, zfs_read_chunk_size -
 		    P2PHASE(uio->uio_loffset, zfs_read_chunk_size));
 
 #ifdef __FreeBSD__
 		if (uio->uio_segflg == UIO_NOCOPY)
 			error = mappedread_sf(vp, nbytes, uio);
 		else
 #endif /* __FreeBSD__ */
 		if (vn_has_cached_data(vp)) {
 			error = mappedread(vp, nbytes, uio);
 		} else {
 			error = dmu_read_uio_dbuf(sa_get_db(zp->z_sa_hdl),
 			    uio, nbytes);
 		}
 		if (error) {
 			/* convert checksum errors into IO errors */
 			if (error == ECKSUM)
 				error = SET_ERROR(EIO);
 			break;
 		}
 
 		n -= nbytes;
 	}
 out:
 	zfs_range_unlock(rl);
 
 	ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
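zfs_read() sizes each pass as `zfs_read_chunk_size` (1 MiB by default) minus the current offset's phase within a chunk. Every pass after the first therefore starts on a chunk boundary. Below is a sketch of that size calculation. `SKETCH_P2PHASE` mirrors P2PHASE() for power-of-two alignments, and `next_read_chunk()` is a hypothetical name.

```c
#include <stdint.h>

/* x modulo a power-of-two alignment, as P2PHASE() computes it. */
#define SKETCH_P2PHASE(x, align) ((x) & ((align) - 1))

/* Bytes to read in this pass without crossing a chunk_size boundary. */
static int64_t
next_read_chunk(int64_t off, int64_t n, int64_t chunk_size)
{
	int64_t room = chunk_size - SKETCH_P2PHASE(off, chunk_size);

	return (n < room ? n : room);
}
```

For example, a read at offset 1048000 first moves 576 bytes, which brings it to the 1 MiB boundary.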
 
 /*
  * Write the bytes to a file.
  *
  *	IN:	vp	- vnode of file to be written to.
  *		uio	- structure supplying write location, range info,
  *			  and data buffer.
  *		ioflag	- FAPPEND, FSYNC, and/or FDSYNC.  FAPPEND is
  *			  set if in append mode.
  *		cr	- credentials of caller.
  *		ct	- caller context (NFS/CIFS fem monitor only)
  *
  *	OUT:	uio	- updated offset and range.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	vp - ctime|mtime updated if byte count > 0
  */
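Two calculations in the zfs_write() loop are easy to check in isolation:

- The clamp that keeps a write inside `limit`.
- The block size chosen when the first write over-locks the range. The block grows to recordsize, or, if a single block is already larger than recordsize, only to the next power of two. It never grows past the new end of file.

Below is a sketch with hypothetical names. `sketch_highbit64()` reimplements highbit64() as the 1-based index of the highest set bit. `clamp_write()` uses the single overflow-safe comparison `n > limit - woff`. For 0 <= woff < limit and n >= 0, this is equivalent to the two comparisons in the code.

```c
#include <stdint.h>

/* 1-based index of the highest set bit; 0 for 0. */
static int
sketch_highbit64(uint64_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/* Trim n so that the write ends at or before limit (woff < limit). */
static int64_t
clamp_write(int64_t woff, int64_t n, int64_t limit)
{
	if (n > limit - woff)
		n = limit - woff;
	return (n);
}

/* Block size for a file that still has a single, growable block. */
static uint64_t
grow_blksz(uint64_t cur_blksz, uint64_t max_blksz, uint64_t end_size)
{
	uint64_t want;

	if (cur_blksz > max_blksz)
		want = (uint64_t)1 << sketch_highbit64(cur_blksz);
	else
		want = max_blksz;
	return (end_size < want ? end_size : want);
}
```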
 
 /* ARGSUSED */
 static int
 zfs_write(vnode_t *vp, uio_t *uio, int ioflag, cred_t *cr, caller_context_t *ct)
 {
 	znode_t		*zp = VTOZ(vp);
 	rlim64_t	limit = MAXOFFSET_T;
 	ssize_t		start_resid = uio->uio_resid;
 	ssize_t		tx_bytes;
 	uint64_t	end_size;
 	dmu_tx_t	*tx;
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	zilog_t		*zilog;
 	offset_t	woff;
 	ssize_t		n, nbytes;
 	rl_t		*rl;
 	int		max_blksz = zfsvfs->z_max_blksz;
 	int		error = 0;
 	arc_buf_t	*abuf;
 	iovec_t		*aiov = NULL;
 	xuio_t		*xuio = NULL;
 	int		i_iov = 0;
 	int		iovcnt = uio->uio_iovcnt;
 	iovec_t		*iovp = uio->uio_iov;
 	int		write_eof;
 	int		count = 0;
 	sa_bulk_attr_t	bulk[4];
 	uint64_t	mtime[2], ctime[2];
 
 	/*
 	 * Fasttrack empty write
 	 */
 	n = start_resid;
 	if (n == 0)
 		return (0);
 
 	if (limit == RLIM64_INFINITY || limit > MAXOFFSET_T)
 		limit = MAXOFFSET_T;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL,
 	    &zp->z_size, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, 8);
 
 	/*
 	 * When vp->v_vfsp != zp->z_zfsvfs->z_vfs (e.g. snapshots), our callers
 	 * might not be able to detect properly that we are read-only, so check
 	 * it explicitly here.
 	 */
 	if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EROFS));
 	}
 
 	/*
 	 * If immutable or not appending then return EPERM
 	 */
 	if ((zp->z_pflags & (ZFS_IMMUTABLE | ZFS_READONLY)) ||
 	    ((zp->z_pflags & ZFS_APPENDONLY) && !(ioflag & FAPPEND) &&
 	    (uio->uio_loffset < zp->z_size))) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	zilog = zfsvfs->z_log;
 
 	/*
 	 * Validate file offset
 	 */
 	woff = ioflag & FAPPEND ? zp->z_size : uio->uio_loffset;
 	if (woff < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	/*
 	 * Check for mandatory locks before calling zfs_range_lock()
 	 * in order to prevent a deadlock with locks set via fcntl().
 	 */
 	if (MANDMODE((mode_t)zp->z_mode) &&
 	    (error = chklock(vp, FWRITE, woff, n, uio->uio_fmode, ct)) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 #ifdef illumos
 	/*
 	 * Pre-fault the pages to ensure slow (e.g., NFS) pages
 	 * don't hold up txg.
 	 * Skip this if uio contains loaned arc_buf.
 	 */
 	if ((uio->uio_extflg == UIO_XUIO) &&
 	    (((xuio_t *)uio)->xu_type == UIOTYPE_ZEROCOPY))
 		xuio = (xuio_t *)uio;
 	else
 		uio_prefaultpages(MIN(n, max_blksz), uio);
 #endif
 
 	/*
 	 * If in append mode, set the io offset pointer to eof.
 	 */
 	if (ioflag & FAPPEND) {
 		/*
 		 * Obtain an appending range lock to guarantee file append
 		 * semantics.  We reset the write offset once we have the lock.
 		 */
 		rl = zfs_range_lock(zp, 0, n, RL_APPEND);
 		woff = rl->r_off;
 		if (rl->r_len == UINT64_MAX) {
 			/*
 			 * We overlocked the file because this write will cause
 			 * the file block size to increase.
 			 * Note that z_size cannot change with this lock held.
 			 */
 			woff = zp->z_size;
 		}
 		uio->uio_loffset = woff;
 	} else {
 		/*
 		 * Note that if the file block size will change as a result of
 		 * this write, then this range lock will lock the entire file
 		 * so that we can re-write the block safely.
 		 */
 		rl = zfs_range_lock(zp, woff, n, RL_WRITER);
 	}
 
 	if (vn_rlimit_fsize(vp, uio, uio->uio_td)) {
 		zfs_range_unlock(rl);
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EFBIG));
 	}
 
 	if (woff >= limit) {
 		zfs_range_unlock(rl);
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EFBIG));
 	}
 
 	if ((woff + n) > limit || woff > (limit - n))
 		n = limit - woff;
 
 	/* Will this write extend the file length? */
 	write_eof = (woff + n > zp->z_size);
 
 	end_size = MAX(zp->z_size, woff + n);
 
 	/*
 	 * Write the file in reasonable size chunks.  Each chunk is written
 	 * in a separate transaction; this keeps the intent log records small
 	 * and allows us to do more fine-grained space accounting.
 	 */
 	while (n > 0) {
 		abuf = NULL;
 		woff = uio->uio_loffset;
 		if (zfs_owner_overquota(zfsvfs, zp, B_FALSE) ||
 		    zfs_owner_overquota(zfsvfs, zp, B_TRUE)) {
 			if (abuf != NULL)
 				dmu_return_arcbuf(abuf);
 			error = SET_ERROR(EDQUOT);
 			break;
 		}
 
 		if (xuio && abuf == NULL) {
 			ASSERT(i_iov < iovcnt);
 			aiov = &iovp[i_iov];
 			abuf = dmu_xuio_arcbuf(xuio, i_iov);
 			dmu_xuio_clear(xuio, i_iov);
 			DTRACE_PROBE3(zfs_cp_write, int, i_iov,
 			    iovec_t *, aiov, arc_buf_t *, abuf);
 			ASSERT((aiov->iov_base == abuf->b_data) ||
 			    ((char *)aiov->iov_base - (char *)abuf->b_data +
 			    aiov->iov_len == arc_buf_size(abuf)));
 			i_iov++;
 		} else if (abuf == NULL && n >= max_blksz &&
 		    woff >= zp->z_size &&
 		    P2PHASE(woff, max_blksz) == 0 &&
 		    zp->z_blksz == max_blksz) {
 			/*
 			 * This write covers a full block.  "Borrow" a buffer
 			 * from the dmu so that we can fill it before we enter
 			 * a transaction.  This avoids the possibility of
 			 * holding up the transaction if the data copy hangs
 			 * up on a pagefault (e.g., from an NFS server mapping).
 			 */
 #ifdef illumos
 			size_t cbytes;
 #endif
 
 			abuf = dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl),
 			    max_blksz);
 			ASSERT(abuf != NULL);
 			ASSERT(arc_buf_size(abuf) == max_blksz);
 #ifdef illumos
 			if (error = uiocopy(abuf->b_data, max_blksz,
 			    UIO_WRITE, uio, &cbytes)) {
 				dmu_return_arcbuf(abuf);
 				break;
 			}
 			ASSERT(cbytes == max_blksz);
 #else
 			ssize_t resid = uio->uio_resid;
 			error = vn_io_fault_uiomove(abuf->b_data, max_blksz, uio);
 			if (error != 0) {
 				uio->uio_offset -= resid - uio->uio_resid;
 				uio->uio_resid = resid;
 				dmu_return_arcbuf(abuf);
 				break;
 			}
 #endif
 		}
 
 		/*
 		 * Start a transaction.
 		 */
 		tx = dmu_tx_create(zfsvfs->z_os);
 		dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 		dmu_tx_hold_write(tx, zp->z_id, woff, MIN(n, max_blksz));
 		zfs_sa_upgrade_txholds(tx, zp);
 		error = dmu_tx_assign(tx, TXG_WAIT);
 		if (error) {
 			dmu_tx_abort(tx);
 			if (abuf != NULL)
 				dmu_return_arcbuf(abuf);
 			break;
 		}
 
 		/*
 		 * If zfs_range_lock() over-locked we grow the blocksize
 		 * and then reduce the lock range.  This will only happen
 		 * on the first iteration since zfs_range_reduce() will
 		 * shrink down r_len to the appropriate size.
 		 */
 		if (rl->r_len == UINT64_MAX) {
 			uint64_t new_blksz;
 
 			if (zp->z_blksz > max_blksz) {
 				/*
 				 * File's blocksize is already larger than the
 				 * "recordsize" property.  Only let it grow to
 				 * the next power of 2.
 				 */
 				ASSERT(!ISP2(zp->z_blksz));
 				new_blksz = MIN(end_size,
 				    1 << highbit64(zp->z_blksz));
 			} else {
 				new_blksz = MIN(end_size, max_blksz);
 			}
 			zfs_grow_blocksize(zp, new_blksz, tx);
 			zfs_range_reduce(rl, woff, n);
 		}
 
 		/*
 		 * XXX - should we really limit each write to z_max_blksz?
 		 * Perhaps we should use SPA_MAXBLOCKSIZE chunks?
 		 */
 		nbytes = MIN(n, max_blksz - P2PHASE(woff, max_blksz));
 
 		if (woff + nbytes > zp->z_size)
 			vnode_pager_setsize(vp, woff + nbytes);
 
 		if (abuf == NULL) {
 			tx_bytes = uio->uio_resid;
 			error = dmu_write_uio_dbuf(sa_get_db(zp->z_sa_hdl),
 			    uio, nbytes, tx);
 			tx_bytes -= uio->uio_resid;
 		} else {
 			tx_bytes = nbytes;
 			ASSERT(xuio == NULL || tx_bytes == aiov->iov_len);
 			/*
 			 * If this is not a full block write, but we are
 			 * extending the file past EOF and this data starts
 			 * block-aligned, use assign_arcbuf().  Otherwise,
 			 * write via dmu_write().
 			 */
 			if (tx_bytes < max_blksz && (!write_eof ||
 			    aiov->iov_base != abuf->b_data)) {
 				ASSERT(xuio);
 				dmu_write(zfsvfs->z_os, zp->z_id, woff,
 				    aiov->iov_len, aiov->iov_base, tx);
 				dmu_return_arcbuf(abuf);
 				xuio_stat_wbuf_copied();
 			} else {
 				ASSERT(xuio || tx_bytes == max_blksz);
 				dmu_assign_arcbuf(sa_get_db(zp->z_sa_hdl),
 				    woff, abuf, tx);
 			}
 #ifdef illumos
 			ASSERT(tx_bytes <= uio->uio_resid);
 			uioskip(uio, tx_bytes);
 #endif
 		}
 		if (tx_bytes && vn_has_cached_data(vp)) {
 			update_pages(vp, woff, tx_bytes, zfsvfs->z_os,
 			    zp->z_id, uio->uio_segflg, tx);
 		}
 
 		/*
 		 * If we made no progress, we're done.  If we made even
 		 * partial progress, update the znode and ZIL accordingly.
 		 */
 		if (tx_bytes == 0) {
 			(void) sa_update(zp->z_sa_hdl, SA_ZPL_SIZE(zfsvfs),
 			    (void *)&zp->z_size, sizeof (uint64_t), tx);
 			dmu_tx_commit(tx);
 			ASSERT(error != 0);
 			break;
 		}
 
 		/*
 		 * Clear Set-UID/Set-GID bits on successful write if not
 		 * privileged and at least one of the execute bits is set.
 		 *
 		 * It would be nice to do this after all writes have
 		 * been done, but that would still expose the ISUID/ISGID
 		 * to another app after the partial write is committed.
 		 *
 		 * Note: we don't call zfs_fuid_map_id() here because
 		 * user 0 is not an ephemeral uid.
 		 */
 		mutex_enter(&zp->z_acl_lock);
 		if ((zp->z_mode & (S_IXUSR | (S_IXUSR >> 3) |
 		    (S_IXUSR >> 6))) != 0 &&
 		    (zp->z_mode & (S_ISUID | S_ISGID)) != 0 &&
 		    secpolicy_vnode_setid_retain(vp, cr,
 		    (zp->z_mode & S_ISUID) != 0 && zp->z_uid == 0) != 0) {
 			uint64_t newmode;
 			zp->z_mode &= ~(S_ISUID | S_ISGID);
 			newmode = zp->z_mode;
 			(void) sa_update(zp->z_sa_hdl, SA_ZPL_MODE(zfsvfs),
 			    (void *)&newmode, sizeof (uint64_t), tx);
 		}
 		mutex_exit(&zp->z_acl_lock);
 
 		zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
 		    B_TRUE);
 
 		/*
 		 * Update the file size (z_size) if it has changed;
 		 * account for possible concurrent updates.
 		 */
 		while ((end_size = zp->z_size) < uio->uio_loffset) {
 			(void) atomic_cas_64(&zp->z_size, end_size,
 			    uio->uio_loffset);
 #ifdef illumos
 			ASSERT(error == 0);
 #else
 			ASSERT(error == 0 || error == EFAULT);
 #endif
 		}
 		/*
 		 * If we are replaying and eof is non zero then force
 		 * the file size to the specified eof. Note, there's no
 		 * concurrency during replay.
 		 */
 		if (zfsvfs->z_replay && zfsvfs->z_replay_eof != 0)
 			zp->z_size = zfsvfs->z_replay_eof;
 
 		if (error == 0)
 			error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 		else
 			(void) sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 
 		zfs_log_write(zilog, tx, TX_WRITE, zp, woff, tx_bytes, ioflag);
 		dmu_tx_commit(tx);
 
 		if (error != 0)
 			break;
 		ASSERT(tx_bytes == nbytes);
 		n -= nbytes;
 
 #ifdef illumos
 		if (!xuio && n > 0)
 			uio_prefaultpages(MIN(n, max_blksz), uio);
 #endif
 	}
 
 	zfs_range_unlock(rl);
 
 	/*
 	 * If we're in replay mode, or we made no progress, return error.
 	 * Otherwise, it's at least a partial write, so it's successful.
 	 */
 	if (zfsvfs->z_replay || uio->uio_resid == start_resid) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 #ifdef __FreeBSD__
 	/*
 	 * EFAULT means that at least one page of the source buffer was not
 	 * available.  VFS will re-try remaining I/O upon this error.
 	 */
 	if (error == EFAULT) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 #endif
 
 	if (ioflag & (FSYNC | FDSYNC) ||
 	    zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, zp->z_id);
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
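The per-iteration arithmetic in zfs_write() above can be looked at in isolation. This is a minimal user-space sketch, not the ZFS code. It makes three assumptions. P2PHASE(), MIN() and highbit64() are written as assumed equivalents of the illumos helpers. max_blksz is a power of two, as a recordsize is. C11 atomics stand in for atomic_cas_64(). The sketch shows three things: how the one-time block-size growth is chosen, how each chunk is clipped at the next max_blksz boundary, and how the file size is raised without ever shrinking under concurrent writers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed equivalents of the illumos helpers (sys/sysmacros.h). */
#define P2PHASE(x, align)	((x) & ((align) - 1))
#ifndef MIN
#define MIN(a, b)		((a) < (b) ? (a) : (b))
#endif

/* 1-based index of the highest set bit; 0 when x == 0. */
int
highbit64(uint64_t x)
{
	int h = 0;

	while (x != 0) {
		h++;
		x >>= 1;
	}
	return (h);
}

/*
 * Block size picked on the first (over-locked) iteration.  A file whose
 * block is already larger than the recordsize (and, per the ASSERT in
 * zfs_write(), not a power of two) only grows to the next power of two;
 * otherwise it grows up to the recordsize.  Neither choice exceeds
 * end_size.
 */
uint64_t
grow_blksz(uint64_t cur_blksz, uint64_t max_blksz, uint64_t end_size)
{
	if (cur_blksz > max_blksz)
		return (MIN(end_size, (uint64_t)1 << highbit64(cur_blksz)));
	return (MIN(end_size, max_blksz));
}

/* Bytes written in one iteration: stop at the next max_blksz boundary. */
uint64_t
write_chunk_len(uint64_t woff, uint64_t n, uint64_t max_blksz)
{
	return (MIN(n, max_blksz - P2PHASE(woff, max_blksz)));
}

/*
 * Raise *size to new_end unless another writer already moved it past
 * new_end.  A failed compare-exchange reloads cur with the current
 * value, so the loop re-checks against the latest size, like the
 * atomic_cas_64() loop in zfs_write().
 */
void
size_raise(_Atomic uint64_t *size, uint64_t new_end)
{
	uint64_t cur = atomic_load(size);

	while (cur < new_end &&
	    !atomic_compare_exchange_weak(size, &cur, new_end))
		;
}
```

For example, a 10000-byte write at offset 3000 with a 4096-byte recordsize is split into chunks of 1096, 4096, 4096 and 712 bytes. Every chunk after the first therefore starts block-aligned.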
 
 void
 zfs_get_done(zgd_t *zgd, int error)
 {
 	znode_t *zp = zgd->zgd_private;
 	objset_t *os = zp->z_zfsvfs->z_os;
 
 	if (zgd->zgd_db)
 		dmu_buf_rele(zgd->zgd_db, zgd);
 
 	zfs_range_unlock(zgd->zgd_rl);
 
 	/*
 	 * Release the vnode asynchronously as we currently have the
 	 * txg stopped from syncing.
 	 */
 	VN_RELE_ASYNC(ZTOV(zp), dsl_pool_vnrele_taskq(dmu_objset_pool(os)));
 
 	if (error == 0 && zgd->zgd_bp)
 		zil_add_block(zgd->zgd_zilog, zgd->zgd_bp);
 
 	kmem_free(zgd, sizeof (zgd_t));
 }
 
 #ifdef DEBUG
 static int zil_fault_io = 0;
 #endif
 
 /*
  * Get data to generate a TX_WRITE intent log record.
  */
 int
 zfs_get_data(void *arg, lr_write_t *lr, char *buf, zio_t *zio)
 {
 	zfsvfs_t *zfsvfs = arg;
 	objset_t *os = zfsvfs->z_os;
 	znode_t *zp;
 	uint64_t object = lr->lr_foid;
 	uint64_t offset = lr->lr_offset;
 	uint64_t size = lr->lr_length;
 	blkptr_t *bp = &lr->lr_blkptr;
 	dmu_buf_t *db;
 	zgd_t *zgd;
 	int error = 0;
 
 	ASSERT(zio != NULL);
 	ASSERT(size != 0);
 
 	/*
 	 * Nothing to do if the file has been removed
 	 */
 	if (zfs_zget(zfsvfs, object, &zp) != 0)
 		return (SET_ERROR(ENOENT));
 	if (zp->z_unlinked) {
 		/*
 		 * Release the vnode asynchronously as we currently have the
 		 * txg stopped from syncing.
 		 */
 		VN_RELE_ASYNC(ZTOV(zp),
 		    dsl_pool_vnrele_taskq(dmu_objset_pool(os)));
 		return (SET_ERROR(ENOENT));
 	}
 
 	zgd = (zgd_t *)kmem_zalloc(sizeof (zgd_t), KM_SLEEP);
 	zgd->zgd_zilog = zfsvfs->z_log;
 	zgd->zgd_private = zp;
 
 	/*
 	 * Write records come in two flavors: immediate and indirect.
 	 * For small writes it's cheaper to store the data with the
 	 * log record (immediate); for large writes it's cheaper to
 	 * sync the data and get a pointer to it (indirect) so that
 	 * we don't have to write the data twice.
 	 */
 	if (buf != NULL) { /* immediate write */
 		zgd->zgd_rl = zfs_range_lock(zp, offset, size, RL_READER);
 		/* test for truncation needs to be done while range locked */
 		if (offset >= zp->z_size) {
 			error = SET_ERROR(ENOENT);
 		} else {
 			error = dmu_read(os, object, offset, size, buf,
 			    DMU_READ_NO_PREFETCH);
 		}
 		ASSERT(error == 0 || error == ENOENT);
 	} else { /* indirect write */
 		/*
 		 * Have to lock the whole block to ensure when it's
 		 * written out and its checksum is being calculated
 		 * that no one can change the data. We need to re-check
 		 * blocksize after we get the lock in case it's changed!
 		 */
 		for (;;) {
 			uint64_t blkoff;
 			size = zp->z_blksz;
 			blkoff = ISP2(size) ? P2PHASE(offset, size) : offset;
 			offset -= blkoff;
 			zgd->zgd_rl = zfs_range_lock(zp, offset, size,
 			    RL_READER);
 			if (zp->z_blksz == size)
 				break;
 			offset += blkoff;
 			zfs_range_unlock(zgd->zgd_rl);
 		}
 		/* test for truncation needs to be done while range locked */
 		if (lr->lr_offset >= zp->z_size)
 			error = SET_ERROR(ENOENT);
 #ifdef DEBUG
 		if (zil_fault_io) {
 			error = SET_ERROR(EIO);
 			zil_fault_io = 0;
 		}
 #endif
 		if (error == 0)
 			error = dmu_buf_hold(os, object, offset, zgd, &db,
 			    DMU_READ_NO_PREFETCH);
 
 		if (error == 0) {
 			blkptr_t *obp = dmu_buf_get_blkptr(db);
 			if (obp) {
 				ASSERT(BP_IS_HOLE(bp));
 				*bp = *obp;
 			}
 
 			zgd->zgd_db = db;
 			zgd->zgd_bp = bp;
 
 			ASSERT(db->db_offset == offset);
 			ASSERT(db->db_size == size);
 
 			error = dmu_sync(zio, lr->lr_common.lrc_txg,
 			    zfs_get_done, zgd);
 			ASSERT(error || lr->lr_length <= zp->z_blksz);
 
 			/*
 			 * On success, we need to wait for the write I/O
 			 * initiated by dmu_sync() to complete before we can
 			 * release this dbuf.  We will finish everything up
 			 * in the zfs_get_done() callback.
 			 */
 			if (error == 0)
 				return (0);
 
 			if (error == EALREADY) {
 				lr->lr_common.lrc_txtype = TX_WRITE2;
 				error = 0;
 			}
 		}
 	}
 
 	zfs_get_done(zgd, error);
 
 	return (error);
 }
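For indirect writes, zfs_get_data() widens the requested range to the whole block that contains it before taking the range lock. The sketch below isolates only that offset arithmetic. ISP2() and P2PHASE() are assumed copies of the illumos macros. When the block size is not a power of two, the code treats the file as having a single block, so the locked range starts at offset 0.

```c
#include <stdint.h>

/* Assumed equivalents of the illumos macros. */
#define ISP2(x)			(((x) & ((x) - 1)) == 0)
#define P2PHASE(x, align)	((x) & ((align) - 1))

/*
 * Start of the block holding offset, mirroring
 *	blkoff = ISP2(size) ? P2PHASE(offset, size) : offset;
 *	offset -= blkoff;
 */
uint64_t
indirect_block_start(uint64_t offset, uint64_t blksz)
{
	uint64_t blkoff = ISP2(blksz) ? P2PHASE(offset, blksz) : offset;

	return (offset - blkoff);
}
```

z_blksz can change before the range lock is granted. For that reason, the real loop re-reads it after locking and retries with the original offset if it moved.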
 
 /*ARGSUSED*/
 static int
 zfs_access(vnode_t *vp, int mode, int flag, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t *zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int error;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	if (flag & V_ACE_MASK)
 		error = zfs_zaccess(zp, mode, flag, B_FALSE, cr);
 	else
 		error = zfs_zaccess_rwx(zp, mode, flag, cr);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
-/*
- * If vnode is for a device return a specfs vnode instead.
- */
 static int
-specvp_check(vnode_t **vpp, cred_t *cr)
+zfs_dd_callback(struct mount *mp, void *arg, int lkflags, struct vnode **vpp)
 {
-	int error = 0;
+	int error;
 
-	if (IS_DEVVP(*vpp)) {
-		struct vnode *svp;
-
-		svp = specvp(*vpp, (*vpp)->v_rdev, (*vpp)->v_type, cr);
-		VN_RELE(*vpp);
-		if (svp == NULL)
-			error = SET_ERROR(ENOSYS);
-		*vpp = svp;
-	}
+	*vpp = arg;
+	error = vn_lock(*vpp, lkflags);
+	if (error != 0)
+		vrele(*vpp);
 	return (error);
 }
 
+static int
+zfs_lookup_lock(vnode_t *dvp, vnode_t *vp, const char *name, int lkflags)
+{
+	znode_t *zdp = VTOZ(dvp);
+	zfsvfs_t *zfsvfs = zdp->z_zfsvfs;
+	int error;
+	int ltype;
 
+	ASSERT_VOP_LOCKED(dvp, __func__);
+#ifdef DIAGNOSTIC
+	ASSERT(!RRM_LOCK_HELD(&zfsvfs->z_teardown_lock));
+#endif
+
+	if (name[0] == 0 || (name[0] == '.' && name[1] == 0)) {
+		ASSERT3P(dvp, ==, vp);
+		vref(dvp);
+		ltype = lkflags & LK_TYPE_MASK;
+		if (ltype != VOP_ISLOCKED(dvp)) {
+			if (ltype == LK_EXCLUSIVE)
+				vn_lock(dvp, LK_UPGRADE | LK_RETRY);
+			else /* if (ltype == LK_SHARED) */
+				vn_lock(dvp, LK_DOWNGRADE | LK_RETRY);
+
+			/*
+			 * Relock for the "." case could leave us with
+			 * reclaimed vnode.
+			 */
+			if (dvp->v_iflag & VI_DOOMED) {
+				vrele(dvp);
+				return (SET_ERROR(ENOENT));
+			}
+		}
+		return (0);
+	} else if (name[0] == '.' && name[1] == '.' && name[2] == 0) {
+		/*
+		 * Note that in this case, dvp is the child vnode, and we
+		 * are looking up the parent vnode - exactly reverse from
+		 * normal operation.  Unlocking dvp requires some rather
+		 * tricky unlock/relock dance to prevent mp from being freed;
+		 * use vn_vget_ino_gen() which takes care of all that.
+		 *
+		 * XXX Note that there is a time window when both vnodes are
+		 * unlocked.  It is possible, although highly unlikely, that
+		 * during that window the parent-child relationship between
+		 * the vnodes may change, for example, get reversed.
+		 * In that case we would have a wrong lock order for the vnodes.
+		 * All other filesystems seem to ignore this problem, so we
+		 * do the same here.
+		 * A potential solution could be implemented as follows:
+		 * - using LK_NOWAIT when locking the second vnode and retrying
+		 *   if necessary
+		 * - checking that the parent-child relationship still holds
+		 *   after locking both vnodes and retrying if it doesn't
+		 */
+		error = vn_vget_ino_gen(dvp, zfs_dd_callback, vp, lkflags, &vp);
+		return (error);
+	} else {
+		error = vn_lock(vp, lkflags);
+		if (error != 0)
+			vrele(vp);
+		return (error);
+	}
+}
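zfs_lookup_lock() chooses one of three locking strategies from the component name alone. The sketch below restates just that classification; the enum and function names are hypothetical.
- The empty name and "." return the directory itself, re-locked to the requested lock type.
- ".." goes through vn_vget_ino_gen(), which performs the unlock/relock dance needed because the parent is locked while the child is held.
- Any other name is locked directly with vn_lock().

```c
/* Hypothetical names for the three cases handled by zfs_lookup_lock(). */
enum lookup_kind {
	LOOKUP_NAME_SELF,	/* "" or ".": dvp itself, relocked as requested */
	LOOKUP_NAME_DOTDOT,	/* "..": parent, locked via vn_vget_ino_gen() */
	LOOKUP_NAME_OTHER	/* anything else: plain vn_lock() on the child */
};

enum lookup_kind
classify_lookup_name(const char *name)
{
	if (name[0] == '\0' || (name[0] == '.' && name[1] == '\0'))
		return (LOOKUP_NAME_SELF);
	if (name[0] == '.' && name[1] == '.' && name[2] == '\0')
		return (LOOKUP_NAME_DOTDOT);
	return (LOOKUP_NAME_OTHER);
}
```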
+
 /*
  * Lookup an entry in a directory, or an extended attribute directory.
  * If it exists, return a held vnode reference for it.
  *
  *	IN:	dvp	- vnode of directory to search.
  *		nm	- name of entry to lookup.
  *		pnp	- full pathname to lookup [UNUSED].
  *		flags	- LOOKUP_XATTR set if looking for an attribute.
  *		rdir	- root directory vnode [UNUSED].
  *		cr	- credentials of caller.
  *		ct	- caller context
- *		direntflags - directory lookup flags
- *		realpnp - returned pathname.
  *
  *	OUT:	vpp	- vnode of located entry, NULL if not found.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	NA
  */
 /* ARGSUSED */
 static int
 zfs_lookup(vnode_t *dvp, char *nm, vnode_t **vpp, struct componentname *cnp,
     int nameiop, cred_t *cr, kthread_t *td, int flags)
 {
 	znode_t *zdp = VTOZ(dvp);
+	znode_t *zp;
 	zfsvfs_t *zfsvfs = zdp->z_zfsvfs;
 	int	error = 0;
-	int *direntflags = NULL;
-	void *realpnp = NULL;
 
-	/* fast path */
-	if (!(flags & (LOOKUP_XATTR | FIGNORECASE))) {
-
+	/* fast path (should be redundant with vfs namecache) */
+	if (!(flags & LOOKUP_XATTR)) {
 		if (dvp->v_type != VDIR) {
 			return (SET_ERROR(ENOTDIR));
 		} else if (zdp->z_sa_hdl == NULL) {
 			return (SET_ERROR(EIO));
 		}
-
-		if (nm[0] == 0 || (nm[0] == '.' && nm[1] == '\0')) {
-			error = zfs_fastaccesschk_execute(zdp, cr);
-			if (!error) {
-				*vpp = dvp;
-				VN_HOLD(*vpp);
-				return (0);
-			}
-			return (error);
-		} else {
-			vnode_t *tvp = dnlc_lookup(dvp, nm);
-
-			if (tvp) {
-				error = zfs_fastaccesschk_execute(zdp, cr);
-				if (error) {
-					VN_RELE(tvp);
-					return (error);
-				}
-				if (tvp == DNLC_NO_VNODE) {
-					VN_RELE(tvp);
-					return (SET_ERROR(ENOENT));
-				} else {
-					*vpp = tvp;
-					return (specvp_check(vpp, cr));
-				}
-			}
-		}
 	}
 
 	DTRACE_PROBE2(zfs__fastpath__lookup__miss, vnode_t *, dvp, char *, nm);
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zdp);
 
 	*vpp = NULL;
 
 	if (flags & LOOKUP_XATTR) {
 #ifdef TODO
 		/*
 		 * If the xattr property is off, refuse the lookup request.
 		 */
 		if (!(zfsvfs->z_vfs->vfs_flag & VFS_XATTR)) {
 			ZFS_EXIT(zfsvfs);
 			return (SET_ERROR(EINVAL));
 		}
 #endif
 
 		/*
 		 * We don't allow recursive attributes..
 		 * Maybe someday we will.
 		 */
 		if (zdp->z_pflags & ZFS_XATTR) {
 			ZFS_EXIT(zfsvfs);
 			return (SET_ERROR(EINVAL));
 		}
 
 		if (error = zfs_get_xattrdir(VTOZ(dvp), vpp, cr, flags)) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 
 		/*
 		 * Do we have permission to get into attribute directory?
 		 */
-
 		if (error = zfs_zaccess(VTOZ(*vpp), ACE_EXECUTE, 0,
 		    B_FALSE, cr)) {
-			VN_RELE(*vpp);
+			vrele(*vpp);
 			*vpp = NULL;
 		}
 
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-	if (dvp->v_type != VDIR) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(ENOTDIR));
-	}
-
 	/*
 	 * Check accessibility of directory.
 	 */
-
 	if (error = zfs_zaccess(zdp, ACE_EXECUTE, 0, B_FALSE, cr)) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	if (zfsvfs->z_utf8 && u8_validate(nm, strlen(nm),
 	    NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EILSEQ));
 	}
 
-	error = zfs_dirlook(zdp, nm, vpp, flags, direntflags, realpnp);
-	if (error == 0)
-		error = specvp_check(vpp, cr);
+	/*
+	 * The loop is to retry the lookup if the parent-child relationship
+	 * changes during the dot-dot locking complexities.
+	 */
+	for (;;) {
+		uint64_t parent;
 
+		error = zfs_dirlook(zdp, nm, &zp);
+		if (error == 0)
+			*vpp = ZTOV(zp);
+
+		ZFS_EXIT(zfsvfs);
+		if (error != 0)
+			break;
+
+		error = zfs_lookup_lock(dvp, *vpp, nm, cnp->cn_lkflags);
+		if (error != 0) {
+			/*
+			 * If we've got a locking error, then the vnode
+			 * got reclaimed because of a force unmount.
+			 * We never enter doomed vnodes into the name cache.
+			 */
+			*vpp = NULL;
+			return (error);
+		}
+
+		if ((cnp->cn_flags & ISDOTDOT) == 0)
+			break;
+
+		ZFS_ENTER(zfsvfs);
+		if (zdp->z_sa_hdl == NULL) {
+			error = SET_ERROR(EIO);
+		} else {
+			error = sa_lookup(zdp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs),
+			    &parent, sizeof (parent));
+		}
+		if (error != 0) {
+			ZFS_EXIT(zfsvfs);
+			vput(ZTOV(zp));
+			break;
+		}
+		if (zp->z_id == parent) {
+			ZFS_EXIT(zfsvfs);
+			break;
+		}
+		vput(ZTOV(zp));
+	}
+
+	if (error != 0)
+		*vpp = NULL;
+
 	/* Translate errors and add SAVENAME when needed. */
 	if (cnp->cn_flags & ISLASTCN) {
 		switch (nameiop) {
 		case CREATE:
 		case RENAME:
 			if (error == ENOENT) {
 				error = EJUSTRETURN;
 				cnp->cn_flags |= SAVENAME;
 				break;
 			}
 			/* FALLTHROUGH */
 		case DELETE:
 			if (error == 0)
 				cnp->cn_flags |= SAVENAME;
 			break;
 		}
 	}
-	if (error == 0 && (nm[0] != '.' || nm[1] != '\0')) {
-		int ltype = 0;
 
-		if (cnp->cn_flags & ISDOTDOT) {
-			ltype = VOP_ISLOCKED(dvp);
-			VOP_UNLOCK(dvp, 0);
-		}
-		ZFS_EXIT(zfsvfs);
-		error = vn_lock(*vpp, cnp->cn_lkflags);
-		if (cnp->cn_flags & ISDOTDOT)
-			vn_lock(dvp, ltype | LK_RETRY);
-		if (error != 0) {
-			VN_RELE(*vpp);
-			*vpp = NULL;
-			return (error);
-		}
-	} else {
-		ZFS_EXIT(zfsvfs);
-	}
+	/* Insert name into cache (as non-existent) if appropriate. */
+	if (zfsvfs->z_use_namecache &&
+	    error == ENOENT && (cnp->cn_flags & MAKEENTRY) != 0)
+		cache_enter(dvp, NULL, cnp);
 
-#ifdef FREEBSD_NAMECACHE
-	/*
-	 * Insert name into cache (as non-existent) if appropriate.
-	 */
-	if (error == ENOENT && (cnp->cn_flags & MAKEENTRY) != 0)
-		cache_enter(dvp, *vpp, cnp);
-	/*
-	 * Insert name into cache if appropriate.
-	 */
-	if (error == 0 && (cnp->cn_flags & MAKEENTRY)) {
+	/* Insert name into cache if appropriate. */
+	if (zfsvfs->z_use_namecache &&
+	    error == 0 && (cnp->cn_flags & MAKEENTRY)) {
 		if (!(cnp->cn_flags & ISLASTCN) ||
 		    (nameiop != DELETE && nameiop != RENAME)) {
 			cache_enter(dvp, *vpp, cnp);
 		}
 	}
-#endif
 
 	return (error);
 }
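The retry loop added to zfs_lookup() covers the window, described in zfs_lookup_lock(), in which both vnodes are unlocked while ".." is being locked. The following single-threaded toy model shows only the control flow, and has no real locking:
- It looks up the parent id.
- It lets a simulated rename land in the unlocked window.
- It re-reads the child's parent link, standing in for sa_lookup() of SA_ZPL_PARENT.
- It retries when the link no longer matches.

All types and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy directory node: its own id and the id of its parent. */
struct toy_dir {
	uint64_t id;
	uint64_t parent;	/* plays the role of SA_ZPL_PARENT */
};

/* Simulated concurrent rename that lands in the unlocked window. */
struct toy_rename {
	int on_attempt;		/* attempt it hits; 0 means never */
	uint64_t new_parent;	/* child's parent id afterwards */
};

/*
 * Resolve ".." for child, retrying until the parent found before the
 * unlocked window is still the child's parent after it.  The number of
 * attempts taken is stored in *attempts.
 */
uint64_t
toy_lookup_dotdot(struct toy_dir *child, const struct toy_rename *ev,
    int *attempts)
{
	for (int attempt = 1;; attempt++) {
		uint64_t found = child->parent;	/* zfs_dirlook("..") */

		/* Both vnodes are unlocked here (vn_vget_ino_gen()). */
		if (ev != NULL && ev->on_attempt == attempt)
			child->parent = ev->new_parent;

		if (child->parent == found) {	/* zp->z_id == parent */
			*attempts = attempt;
			return (found);
		}
		/* Relationship changed: drop the vnode (vput()) and retry. */
	}
}
```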
 
 /*
  * Attempt to create a new entry in a directory.  If the entry
  * already exists, truncate the file if permissible, else return
  * an error.  Return the vp of the created or trunc'd file.
  *
  *	IN:	dvp	- vnode of directory to put new file entry in.
  *		name	- name of new file entry.
  *		vap	- attributes of new file.
  *		excl	- flag indicating exclusive or non-exclusive mode.
  *		mode	- mode to open file with.
  *		cr	- credentials of caller.
  *		flag	- large file flag [UNUSED].
  *		ct	- caller context
  *		vsecp	- ACL to be set
  *
  *	OUT:	vpp	- vnode of created or trunc'd entry.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	dvp - ctime|mtime updated if new entry created
  *	 vp - ctime|mtime always, atime if new
  */
 
 /* ARGSUSED */
 static int
 zfs_create(vnode_t *dvp, char *name, vattr_t *vap, int excl, int mode,
     vnode_t **vpp, cred_t *cr, kthread_t *td)
 {
 	znode_t		*zp, *dzp = VTOZ(dvp);
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
 	objset_t	*os;
-	zfs_dirlock_t	*dl;
 	dmu_tx_t	*tx;
 	int		error;
 	ksid_t		*ksid;
 	uid_t		uid;
 	gid_t		gid = crgetgid(cr);
 	zfs_acl_ids_t   acl_ids;
 	boolean_t	fuid_dirtied;
-	boolean_t	have_acl = B_FALSE;
-	boolean_t	waited = B_FALSE;
 	void		*vsecp = NULL;
 	int		flag = 0;
+	uint64_t	txtype;
 
 	/*
 	 * If we have an ephemeral id, ACL, or XVATTR then
 	 * make sure file system is at proper version
 	 */
 
 	ksid = crgetsid(cr, KSID_OWNER);
 	if (ksid)
 		uid = ksid_getid(ksid);
 	else
 		uid = crgetuid(cr);
 
 	if (zfsvfs->z_use_fuids == B_FALSE &&
 	    (vsecp || (vap->va_mask & AT_XVATTR) ||
 	    IS_EPHEMERAL(uid) || IS_EPHEMERAL(gid)))
 		return (SET_ERROR(EINVAL));
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
 	os = zfsvfs->z_os;
 	zilog = zfsvfs->z_log;
 
 	if (zfsvfs->z_utf8 && u8_validate(name, strlen(name),
 	    NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EILSEQ));
 	}
 
 	if (vap->va_mask & AT_XVATTR) {
 		if ((error = secpolicy_xvattr(dvp, (xvattr_t *)vap,
 		    crgetuid(cr), cr, vap->va_type)) != 0) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 	}
 
-	getnewvnode_reserve(1);
-
-top:
 	*vpp = NULL;
 
 	if ((vap->va_mode & S_ISVTX) && secpolicy_vnode_stky_modify(cr))
 		vap->va_mode &= ~S_ISVTX;
 
-	if (*name == '\0') {
-		/*
-		 * Null component name refers to the directory itself.
-		 */
-		VN_HOLD(dvp);
-		zp = dzp;
-		dl = NULL;
-		error = 0;
-	} else {
-		/* possible VN_HOLD(zp) */
-		int zflg = 0;
+	error = zfs_dirent_lookup(dzp, name, &zp, ZNEW);
+	if (error) {
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	ASSERT3P(zp, ==, NULL);
 
-		if (flag & FIGNORECASE)
-			zflg |= ZCILOOK;
-
-		error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg,
-		    NULL, NULL);
-		if (error) {
-			if (have_acl)
-				zfs_acl_ids_free(&acl_ids);
-			if (strcmp(name, "..") == 0)
-				error = SET_ERROR(EISDIR);
-			getnewvnode_drop_reserve();
-			ZFS_EXIT(zfsvfs);
-			return (error);
-		}
+	/*
+	 * Create a new file object and update the directory
+	 * to reference it.
+	 */
+	if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) {
+		goto out;
 	}
 
-	if (zp == NULL) {
-		uint64_t txtype;
+	/*
+	 * We only support the creation of regular files in
+	 * extended attribute directories.
+	 */
 
-		/*
-		 * Create a new file object and update the directory
-		 * to reference it.
-		 */
-		if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) {
-			if (have_acl)
-				zfs_acl_ids_free(&acl_ids);
-			goto out;
-		}
+	if ((dzp->z_pflags & ZFS_XATTR) &&
+	    (vap->va_type != VREG)) {
+		error = SET_ERROR(EINVAL);
+		goto out;
+	}
 
-		/*
-		 * We only support the creation of regular files in
-		 * extended attribute directories.
-		 */
+	if ((error = zfs_acl_ids_create(dzp, 0, vap,
+	    cr, vsecp, &acl_ids)) != 0)
+		goto out;
 
-		if ((dzp->z_pflags & ZFS_XATTR) &&
-		    (vap->va_type != VREG)) {
-			if (have_acl)
-				zfs_acl_ids_free(&acl_ids);
-			error = SET_ERROR(EINVAL);
-			goto out;
-		}
+	if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) {
+		zfs_acl_ids_free(&acl_ids);
+		error = SET_ERROR(EDQUOT);
+		goto out;
+	}
 
-		if (!have_acl && (error = zfs_acl_ids_create(dzp, 0, vap,
-		    cr, vsecp, &acl_ids)) != 0)
-			goto out;
-		have_acl = B_TRUE;
+	getnewvnode_reserve(1);
 
-		if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) {
-			zfs_acl_ids_free(&acl_ids);
-			error = SET_ERROR(EDQUOT);
-			goto out;
-		}
+	tx = dmu_tx_create(os);
 
-		tx = dmu_tx_create(os);
+	dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
+	    ZFS_SA_BASE_ATTR_SIZE);
 
-		dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
-		    ZFS_SA_BASE_ATTR_SIZE);
-
-		fuid_dirtied = zfsvfs->z_fuid_dirty;
-		if (fuid_dirtied)
-			zfs_fuid_txhold(zfsvfs, tx);
-		dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);
-		dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE);
-		if (!zfsvfs->z_use_sa &&
-		    acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) {
-			dmu_tx_hold_write(tx, DMU_NEW_OBJECT,
-			    0, acl_ids.z_aclp->z_acl_bytes);
-		}
-		error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
-		if (error) {
-			zfs_dirent_unlock(dl);
-			if (error == ERESTART) {
-				waited = B_TRUE;
-				dmu_tx_wait(tx);
-				dmu_tx_abort(tx);
-				goto top;
-			}
-			zfs_acl_ids_free(&acl_ids);
-			dmu_tx_abort(tx);
-			getnewvnode_drop_reserve();
-			ZFS_EXIT(zfsvfs);
-			return (error);
-		}
-		zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids);
-
-		if (fuid_dirtied)
-			zfs_fuid_sync(zfsvfs, tx);
-
-		(void) zfs_link_create(dl, zp, tx, ZNEW);
-		txtype = zfs_log_create_txtype(Z_FILE, vsecp, vap);
-		if (flag & FIGNORECASE)
-			txtype |= TX_CI;
-		zfs_log_create(zilog, tx, txtype, dzp, zp, name,
-		    vsecp, acl_ids.z_fuidp, vap);
+	fuid_dirtied = zfsvfs->z_fuid_dirty;
+	if (fuid_dirtied)
+		zfs_fuid_txhold(zfsvfs, tx);
+	dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);
+	dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE);
+	if (!zfsvfs->z_use_sa &&
+	    acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) {
+		dmu_tx_hold_write(tx, DMU_NEW_OBJECT,
+		    0, acl_ids.z_aclp->z_acl_bytes);
+	}
+	error = dmu_tx_assign(tx, TXG_WAIT);
+	if (error) {
 		zfs_acl_ids_free(&acl_ids);
-		dmu_tx_commit(tx);
-	} else {
-		int aflags = (flag & FAPPEND) ? V_APPEND : 0;
+		dmu_tx_abort(tx);
+		getnewvnode_drop_reserve();
+		ZFS_EXIT(zfsvfs);
+		return (error);
+	}
+	zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids);
 
-		if (have_acl)
-			zfs_acl_ids_free(&acl_ids);
-		have_acl = B_FALSE;
+	if (fuid_dirtied)
+		zfs_fuid_sync(zfsvfs, tx);
 
-		/*
-		 * A directory entry already exists for this name.
-		 */
-		/*
-		 * Can't truncate an existing file if in exclusive mode.
-		 */
-		if (excl == EXCL) {
-			error = SET_ERROR(EEXIST);
-			goto out;
-		}
-		/*
-		 * Can't open a directory for writing.
-		 */
-		if ((ZTOV(zp)->v_type == VDIR) && (mode & S_IWRITE)) {
-			error = SET_ERROR(EISDIR);
-			goto out;
-		}
-		/*
-		 * Verify requested access to file.
-		 */
-		if (mode && (error = zfs_zaccess_rwx(zp, mode, aflags, cr))) {
-			goto out;
-		}
+	(void) zfs_link_create(dzp, name, zp, tx, ZNEW);
+	txtype = zfs_log_create_txtype(Z_FILE, vsecp, vap);
+	zfs_log_create(zilog, tx, txtype, dzp, zp, name,
+	    vsecp, acl_ids.z_fuidp, vap);
+	zfs_acl_ids_free(&acl_ids);
+	dmu_tx_commit(tx);
 
-		mutex_enter(&dzp->z_lock);
-		dzp->z_seq++;
-		mutex_exit(&dzp->z_lock);
-
-		/*
-		 * Truncate regular files if requested.
-		 */
-		if ((ZTOV(zp)->v_type == VREG) &&
-		    (vap->va_mask & AT_SIZE) && (vap->va_size == 0)) {
-			/* we can't hold any locks when calling zfs_freesp() */
-			zfs_dirent_unlock(dl);
-			dl = NULL;
-			error = zfs_freesp(zp, 0, 0, mode, TRUE);
-			if (error == 0) {
-				vnevent_create(ZTOV(zp), ct);
-			}
-		}
-	}
-out:
 	getnewvnode_drop_reserve();
-	if (dl)
-		zfs_dirent_unlock(dl);
 
-	if (error) {
-		if (zp)
-			VN_RELE(ZTOV(zp));
-	} else {
+out:
+	if (error == 0) {
 		*vpp = ZTOV(zp);
-		error = specvp_check(vpp, cr);
 	}
 
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
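After this change, zfs_create() no longer handles an existing entry itself. zfs_dirent_lookup() with ZNEW is expected to fail when the name is present, and the remaining checks run in a fixed order before any transaction is assigned. The sketch below summarizes that order as a pure function, with three assumptions:
- The struct is hypothetical.
- EEXIST assumes zfs_dirent_lookup() reports an existing name that way; the zfs_mkdir() comment about returning EEXIST rather than EACCES suggests it does.
- EACCES stands in for whatever zfs_zaccess() actually returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of the state zfs_create() inspects. */
struct create_req {
	bool name_exists;	/* zfs_dirent_lookup(..., ZNEW) finds an entry */
	bool may_add_file;	/* zfs_zaccess(dzp, ACE_ADD_FILE, ...) succeeds */
	bool dir_is_xattr;	/* dzp->z_pflags & ZFS_XATTR */
	bool is_regular;	/* vap->va_type == VREG */
	bool over_quota;	/* zfs_acl_ids_overquota() */
};

/* Return the first error in zfs_create()'s check order, or 0. */
int
create_precheck(const struct create_req *r)
{
	if (r->name_exists)
		return (EEXIST);
	if (!r->may_add_file)
		return (EACCES);	/* assumed zfs_zaccess() result */
	if (r->dir_is_xattr && !r->is_regular)
		return (EINVAL);
	if (r->over_quota)
		return (EDQUOT);
	return (0);
}
```

Because existence is checked first, an existing name yields EEXIST even when the caller also lacks ACE_ADD_FILE permission. This is the same ordering that the zfs_mkdir() comment argues for.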
 
 /*
  * Remove an entry from a directory.
  *
  *	IN:	dvp	- vnode of directory to remove entry from.
  *		name	- name of entry to remove.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	dvp - ctime|mtime
  *	 vp - ctime (if nlink > 0)
  */
 
-uint64_t null_xattr = 0;
-
 /*ARGSUSED*/
 static int
-zfs_remove(vnode_t *dvp, char *name, cred_t *cr, caller_context_t *ct,
-    int flags)
+zfs_remove(vnode_t *dvp, vnode_t *vp, char *name, cred_t *cr)
 {
-	znode_t		*zp, *dzp = VTOZ(dvp);
+	znode_t		*dzp = VTOZ(dvp);
+	znode_t		*zp = VTOZ(vp);
 	znode_t		*xzp;
-	vnode_t		*vp;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
 	uint64_t	acl_obj, xattr_obj;
-	uint64_t	xattr_obj_unlinked = 0;
 	uint64_t	obj = 0;
-	zfs_dirlock_t	*dl;
 	dmu_tx_t	*tx;
-	boolean_t	may_delete_now, delete_now = FALSE;
 	boolean_t	unlinked, toobig = FALSE;
 	uint64_t	txtype;
-	pathname_t	*realnmp = NULL;
-	pathname_t	realnm;
 	int		error;
-	int		zflg = ZEXISTS;
-	boolean_t	waited = B_FALSE;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
+	ZFS_VERIFY_ZP(zp);
 	zilog = zfsvfs->z_log;
+	zp = VTOZ(vp);
 
-	if (flags & FIGNORECASE) {
-		zflg |= ZCILOOK;
-		pn_alloc(&realnm);
-		realnmp = &realnm;
-	}
-
-top:
 	xattr_obj = 0;
 	xzp = NULL;
-	/*
-	 * Attempt to lock directory; fail if entry doesn't exist.
-	 */
-	if (error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg,
-	    NULL, realnmp)) {
-		if (realnmp)
-			pn_free(realnmp);
-		ZFS_EXIT(zfsvfs);
-		return (error);
-	}
 
-	vp = ZTOV(zp);
-
 	if (error = zfs_zaccess_delete(dzp, zp, cr)) {
 		goto out;
 	}
 
 	/*
 	 * Need to use rmdir for removing directories.
 	 */
 	if (vp->v_type == VDIR) {
 		error = SET_ERROR(EPERM);
 		goto out;
 	}
 
 	vnevent_remove(vp, dvp, name, ct);
 
-	if (realnmp)
-		dnlc_remove(dvp, realnmp->pn_buf);
-	else
-		dnlc_remove(dvp, name);
+	obj = zp->z_id;
 
-	VI_LOCK(vp);
-	may_delete_now = vp->v_count == 1 && !vn_has_cached_data(vp);
-	VI_UNLOCK(vp);
+	/* are there any extended attributes? */
+	error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs),
+	    &xattr_obj, sizeof (xattr_obj));
+	if (error == 0 && xattr_obj) {
+		error = zfs_zget(zfsvfs, xattr_obj, &xzp);
+		ASSERT0(error);
+	}
 
 	/*
 	 * We may delete the znode now, or we may put it in the unlinked set;
 	 * it depends on whether we're the last link, and on whether there are
 	 * other holds on the vnode.  So we dmu_tx_hold() the right things to
 	 * allow for either case.
 	 */
-	obj = zp->z_id;
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_zap(tx, dzp->z_id, FALSE, name);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
 	zfs_sa_upgrade_txholds(tx, dzp);
-	if (may_delete_now) {
-		toobig =
-		    zp->z_size > zp->z_blksz * DMU_MAX_DELETEBLKCNT;
-		/* if the file is too big, only hold_free a token amount */
-		dmu_tx_hold_free(tx, zp->z_id, 0,
-		    (toobig ? DMU_MAX_ACCESS : DMU_OBJECT_END));
-	}
 
-	/* are there any extended attributes? */
-	error = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs),
-	    &xattr_obj, sizeof (xattr_obj));
-	if (error == 0 && xattr_obj) {
-		error = zfs_zget(zfsvfs, xattr_obj, &xzp);
-		ASSERT0(error);
+	if (xzp) {
 		dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 		dmu_tx_hold_sa(tx, xzp->z_sa_hdl, B_FALSE);
 	}
 
-	mutex_enter(&zp->z_lock);
-	if ((acl_obj = zfs_external_acl(zp)) != 0 && may_delete_now)
-		dmu_tx_hold_free(tx, acl_obj, 0, DMU_OBJECT_END);
-	mutex_exit(&zp->z_lock);
-
 	/* charge as an update -- would be nice not to charge at all */
 	dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
 
 	/*
 	 * Mark this transaction as typically resulting in a net free of space
 	 */
 	dmu_tx_mark_netfree(tx);
 
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		zfs_dirent_unlock(dl);
-		VN_RELE(vp);
-		if (xzp)
-			VN_RELE(ZTOV(xzp));
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
-		if (realnmp)
-			pn_free(realnmp);
 		dmu_tx_abort(tx);
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	/*
 	 * Remove the directory entry.
 	 */
-	error = zfs_link_destroy(dl, zp, tx, zflg, &unlinked);
+	error = zfs_link_destroy(dzp, name, zp, tx, ZEXISTS, &unlinked);
 
 	if (error) {
 		dmu_tx_commit(tx);
 		goto out;
 	}
 
 	if (unlinked) {
-		/*
-		 * Hold z_lock so that we can make sure that the ACL obj
-		 * hasn't changed.  Could have been deleted due to
-		 * zfs_sa_upgrade().
-		 */
-		mutex_enter(&zp->z_lock);
-		VI_LOCK(vp);
-		(void) sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs),
-		    &xattr_obj_unlinked, sizeof (xattr_obj_unlinked));
-		delete_now = may_delete_now && !toobig &&
-		    vp->v_count == 1 && !vn_has_cached_data(vp) &&
-		    xattr_obj == xattr_obj_unlinked && zfs_external_acl(zp) ==
-		    acl_obj;
-		VI_UNLOCK(vp);
-	}
-
-	if (delete_now) {
-#ifdef __FreeBSD__
-		panic("zfs_remove: delete_now branch taken");
-#endif
-		if (xattr_obj_unlinked) {
-			ASSERT3U(xzp->z_links, ==, 2);
-			mutex_enter(&xzp->z_lock);
-			xzp->z_unlinked = 1;
-			xzp->z_links = 0;
-			error = sa_update(xzp->z_sa_hdl, SA_ZPL_LINKS(zfsvfs),
-			    &xzp->z_links, sizeof (xzp->z_links), tx);
-			ASSERT3U(error,  ==,  0);
-			mutex_exit(&xzp->z_lock);
-			zfs_unlinked_add(xzp, tx);
-
-			if (zp->z_is_sa)
-				error = sa_remove(zp->z_sa_hdl,
-				    SA_ZPL_XATTR(zfsvfs), tx);
-			else
-				error = sa_update(zp->z_sa_hdl,
-				    SA_ZPL_XATTR(zfsvfs), &null_xattr,
-				    sizeof (uint64_t), tx);
-			ASSERT0(error);
-		}
-		VI_LOCK(vp);
-		vp->v_count--;
-		ASSERT0(vp->v_count);
-		VI_UNLOCK(vp);
-		mutex_exit(&zp->z_lock);
-		zfs_znode_delete(zp, tx);
-	} else if (unlinked) {
-		mutex_exit(&zp->z_lock);
 		zfs_unlinked_add(zp, tx);
-#ifdef __FreeBSD__
 		vp->v_vflag |= VV_NOSYNC;
-#endif
 	}
 
 	txtype = TX_REMOVE;
-	if (flags & FIGNORECASE)
-		txtype |= TX_CI;
 	zfs_log_remove(zilog, tx, txtype, dzp, name, obj);
 
 	dmu_tx_commit(tx);
 out:
-	if (realnmp)
-		pn_free(realnmp);
 
-	zfs_dirent_unlock(dl);
-
-	if (!delete_now)
-		VN_RELE(vp);
 	if (xzp)
-		VN_RELE(ZTOV(xzp));
+		vrele(ZTOV(xzp));
 
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
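With the delete_now path removed, zfs_remove() never frees a file inline. When the last link goes away, it only calls zfs_unlinked_add() and sets VV_NOSYNC. The toy model below uses hypothetical types to sketch the resulting deferred-deletion rule. It assumes that an unlinked znode is destroyed later, once its last vnode reference is released. It does not model the on-disk unlinked set or the transaction.

```c
#include <stdbool.h>

/* Toy model of one file: link count and outstanding vnode holds. */
struct toy_file {
	int links;		/* like z_links */
	int holds;		/* vnode references */
	bool unlinked;		/* placed on the unlinked set */
	bool freed;		/* storage actually released */
};

/* zfs_remove(): drop one link; the last one only marks the file. */
void
toy_remove(struct toy_file *f)
{
	if (--f->links == 0)
		f->unlinked = true;	/* zfs_unlinked_add() */
}

/* Assumed later step: the final release frees an unlinked file. */
void
toy_release(struct toy_file *f)
{
	if (--f->holds == 0 && f->unlinked) {
		f->unlinked = false;
		f->freed = true;
	}
}
```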
 
 /*
  * Create a new directory and insert it into dvp using the name
  * provided.  Return a pointer to the inserted directory.
  *
  *	IN:	dvp	- vnode of directory to add subdir to.
  *		dirname	- name of new directory.
  *		vap	- attributes of new directory.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *		vsecp	- ACL to be set
  *
  *	OUT:	vpp	- vnode of created directory.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	dvp - ctime|mtime updated
  *	 vp - ctime|mtime|atime updated
  */
 /*ARGSUSED*/
 static int
-zfs_mkdir(vnode_t *dvp, char *dirname, vattr_t *vap, vnode_t **vpp, cred_t *cr,
-    caller_context_t *ct, int flags, vsecattr_t *vsecp)
+zfs_mkdir(vnode_t *dvp, char *dirname, vattr_t *vap, vnode_t **vpp, cred_t *cr)
 {
 	znode_t		*zp, *dzp = VTOZ(dvp);
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
-	zfs_dirlock_t	*dl;
 	uint64_t	txtype;
 	dmu_tx_t	*tx;
 	int		error;
-	int		zf = ZNEW;
 	ksid_t		*ksid;
 	uid_t		uid;
 	gid_t		gid = crgetgid(cr);
 	zfs_acl_ids_t   acl_ids;
 	boolean_t	fuid_dirtied;
-	boolean_t	waited = B_FALSE;
 
 	ASSERT(vap->va_type == VDIR);
 
 	/*
 	 * If we have an ephemeral id, ACL, or XVATTR then
 	 * make sure file system is at proper version
 	 */
 
 	ksid = crgetsid(cr, KSID_OWNER);
 	if (ksid)
 		uid = ksid_getid(ksid);
 	else
 		uid = crgetuid(cr);
 	if (zfsvfs->z_use_fuids == B_FALSE &&
-	    (vsecp || (vap->va_mask & AT_XVATTR) ||
+	    ((vap->va_mask & AT_XVATTR) ||
 	    IS_EPHEMERAL(uid) || IS_EPHEMERAL(gid)))
 		return (SET_ERROR(EINVAL));
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
 	zilog = zfsvfs->z_log;
 
 	if (dzp->z_pflags & ZFS_XATTR) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	if (zfsvfs->z_utf8 && u8_validate(dirname,
 	    strlen(dirname), NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EILSEQ));
 	}
-	if (flags & FIGNORECASE)
-		zf |= ZCILOOK;
 
 	if (vap->va_mask & AT_XVATTR) {
 		if ((error = secpolicy_xvattr(dvp, (xvattr_t *)vap,
 		    crgetuid(cr), cr, vap->va_type)) != 0) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 	}
 
 	if ((error = zfs_acl_ids_create(dzp, 0, vap, cr,
-	    vsecp, &acl_ids)) != 0) {
+	    NULL, &acl_ids)) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-	getnewvnode_reserve(1);
-
 	/*
 	 * First make sure the new directory doesn't exist.
 	 *
 	 * Existence is checked first to make sure we don't return
 	 * EACCES instead of EEXIST which can cause some applications
 	 * to fail.
 	 */
-top:
 	*vpp = NULL;
 
-	if (error = zfs_dirent_lock(&dl, dzp, dirname, &zp, zf,
-	    NULL, NULL)) {
+	if (error = zfs_dirent_lookup(dzp, dirname, &zp, ZNEW)) {
 		zfs_acl_ids_free(&acl_ids);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
+	ASSERT3P(zp, ==, NULL);
 
 	if (error = zfs_zaccess(dzp, ACE_ADD_SUBDIRECTORY, 0, B_FALSE, cr)) {
 		zfs_acl_ids_free(&acl_ids);
-		zfs_dirent_unlock(dl);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) {
 		zfs_acl_ids_free(&acl_ids);
-		zfs_dirent_unlock(dl);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EDQUOT));
 	}
 
 	/*
 	 * Add a new entry to the directory.
 	 */
+	getnewvnode_reserve(1);
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_zap(tx, dzp->z_id, TRUE, dirname);
 	dmu_tx_hold_zap(tx, DMU_NEW_OBJECT, FALSE, NULL);
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
 	if (!zfsvfs->z_use_sa && acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) {
 		dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0,
 		    acl_ids.z_aclp->z_acl_bytes);
 	}
 
 	dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
 	    ZFS_SA_BASE_ATTR_SIZE);
 
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		zfs_dirent_unlock(dl);
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		zfs_acl_ids_free(&acl_ids);
 		dmu_tx_abort(tx);
 		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	/*
 	 * Create new node.
 	 */
 	zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids);
 
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 
 	/*
 	 * Now put new name in parent dir.
 	 */
-	(void) zfs_link_create(dl, zp, tx, ZNEW);
+	(void) zfs_link_create(dzp, dirname, zp, tx, ZNEW);
 
 	*vpp = ZTOV(zp);
 
-	txtype = zfs_log_create_txtype(Z_DIR, vsecp, vap);
-	if (flags & FIGNORECASE)
-		txtype |= TX_CI;
-	zfs_log_create(zilog, tx, txtype, dzp, zp, dirname, vsecp,
+	txtype = zfs_log_create_txtype(Z_DIR, NULL, vap);
+	zfs_log_create(zilog, tx, txtype, dzp, zp, dirname, NULL,
 	    acl_ids.z_fuidp, vap);
 
 	zfs_acl_ids_free(&acl_ids);
 
 	dmu_tx_commit(tx);
 
 	getnewvnode_drop_reserve();
 
-	zfs_dirent_unlock(dl);
-
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
 
 /*
  * Remove a directory subdir entry.  If the current working
  * directory is the same as the subdir to be removed, the
  * remove will fail.
  *
  *	IN:	dvp	- vnode of directory to remove from.
  *		name	- name of directory to be removed.
  *		cwd	- vnode of current working directory.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	dvp - ctime|mtime updated
  */
 /*ARGSUSED*/
 static int
-zfs_rmdir(vnode_t *dvp, char *name, vnode_t *cwd, cred_t *cr,
-    caller_context_t *ct, int flags)
+zfs_rmdir(vnode_t *dvp, vnode_t *vp, char *name, cred_t *cr)
 {
 	znode_t		*dzp = VTOZ(dvp);
-	znode_t		*zp;
-	vnode_t		*vp;
+	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
-	zfs_dirlock_t	*dl;
 	dmu_tx_t	*tx;
 	int		error;
-	int		zflg = ZEXISTS;
-	boolean_t	waited = B_FALSE;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
+	ZFS_VERIFY_ZP(zp);
 	zilog = zfsvfs->z_log;
 
-	if (flags & FIGNORECASE)
-		zflg |= ZCILOOK;
-top:
-	zp = NULL;
 
-	/*
-	 * Attempt to lock directory; fail if entry doesn't exist.
-	 */
-	if (error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg,
-	    NULL, NULL)) {
-		ZFS_EXIT(zfsvfs);
-		return (error);
-	}
-
-	vp = ZTOV(zp);
-
 	if (error = zfs_zaccess_delete(dzp, zp, cr)) {
 		goto out;
 	}
 
 	if (vp->v_type != VDIR) {
 		error = SET_ERROR(ENOTDIR);
 		goto out;
 	}
 
-	if (vp == cwd) {
-		error = SET_ERROR(EINVAL);
-		goto out;
-	}
-
 	vnevent_rmdir(vp, dvp, name, ct);
 
-	/*
-	 * Grab a lock on the directory to make sure that noone is
-	 * trying to add (or lookup) entries while we are removing it.
-	 */
-	rw_enter(&zp->z_name_lock, RW_WRITER);
-
-	/*
-	 * Grab a lock on the parent pointer to make sure we play well
-	 * with the treewalk and directory rename code.
-	 */
-	rw_enter(&zp->z_parent_lock, RW_WRITER);
-
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_zap(tx, dzp->z_id, FALSE, name);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
 	zfs_sa_upgrade_txholds(tx, zp);
 	zfs_sa_upgrade_txholds(tx, dzp);
 	dmu_tx_mark_netfree(tx);
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		rw_exit(&zp->z_parent_lock);
-		rw_exit(&zp->z_name_lock);
-		zfs_dirent_unlock(dl);
-		VN_RELE(vp);
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		dmu_tx_abort(tx);
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-#ifdef FREEBSD_NAMECACHE
 	cache_purge(dvp);
-#endif
 
-	error = zfs_link_destroy(dl, zp, tx, zflg, NULL);
+	error = zfs_link_destroy(dzp, name, zp, tx, ZEXISTS, NULL);
 
 	if (error == 0) {
 		uint64_t txtype = TX_RMDIR;
-		if (flags & FIGNORECASE)
-			txtype |= TX_CI;
 		zfs_log_remove(zilog, tx, txtype, dzp, name, ZFS_NO_OBJECT);
 	}
 
 	dmu_tx_commit(tx);
 
-	rw_exit(&zp->z_parent_lock);
-	rw_exit(&zp->z_name_lock);
-#ifdef FREEBSD_NAMECACHE
 	cache_purge(vp);
-#endif
 out:
-	zfs_dirent_unlock(dl);
-
-	VN_RELE(vp);
-
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
 /*
  * Read as many directory entries as will fit into the provided
  * buffer from the given directory cursor position (specified in
  * the uio structure).
  *
  *	IN:	vp	- vnode of directory to read.
  *		uio	- structure supplying read location, range info,
  *			  and return buffer.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *
  *	OUT:	uio	- updated offset and range, buffer filled.
  *		eofp	- set to true if end-of-file detected.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	vp - atime updated
  *
  * Note that the low 4 bits of the cookie returned by zap is always zero.
  * This allows us to use the low range for "special" directory entries:
  * We use 0 for '.', and 1 for '..'.  If this is the root of the filesystem,
  * we use the offset 2 for the '.zfs' directory.
  */
 /* ARGSUSED */
 static int
 zfs_readdir(vnode_t *vp, uio_t *uio, cred_t *cr, int *eofp, int *ncookies, u_long **cookies)
 {
 	znode_t		*zp = VTOZ(vp);
 	iovec_t		*iovp;
 	edirent_t	*eodp;
 	dirent64_t	*odp;
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	objset_t	*os;
 	caddr_t		outbuf;
 	size_t		bufsize;
 	zap_cursor_t	zc;
 	zap_attribute_t	zap;
 	uint_t		bytes_wanted;
 	uint64_t	offset; /* must be unsigned; checks for < 1 */
 	uint64_t	parent;
 	int		local_eof;
 	int		outcount;
 	int		error;
 	uint8_t		prefetch;
 	boolean_t	check_sysattrs;
 	uint8_t		type;
 	int		ncooks;
 	u_long		*cooks = NULL;
 	int		flags = 0;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs),
 	    &parent, sizeof (parent))) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	/*
 	 * If we are not given an eof variable,
 	 * use a local one.
 	 */
 	if (eofp == NULL)
 		eofp = &local_eof;
 
 	/*
 	 * Check for valid iov_len.
 	 */
 	if (uio->uio_iov->iov_len <= 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	/*
 	 * Quit if directory has been removed (posix)
 	 */
 	if ((*eofp = zp->z_unlinked) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (0);
 	}
 
 	error = 0;
 	os = zfsvfs->z_os;
 	offset = uio->uio_loffset;
 	prefetch = zp->z_zn_prefetch;
 
 	/*
 	 * Initialize the iterator cursor.
 	 */
 	if (offset <= 3) {
 		/*
 		 * Start iteration from the beginning of the directory.
 		 */
 		zap_cursor_init(&zc, os, zp->z_id);
 	} else {
 		/*
 		 * The offset is a serialized cursor.
 		 */
 		zap_cursor_init_serialized(&zc, os, zp->z_id, offset);
 	}
 
 	/*
 	 * Get space to change directory entries into fs independent format.
 	 */
 	iovp = uio->uio_iov;
 	bytes_wanted = iovp->iov_len;
 	if (uio->uio_segflg != UIO_SYSSPACE || uio->uio_iovcnt != 1) {
 		bufsize = bytes_wanted;
 		outbuf = kmem_alloc(bufsize, KM_SLEEP);
 		odp = (struct dirent64 *)outbuf;
 	} else {
 		bufsize = bytes_wanted;
 		outbuf = NULL;
 		odp = (struct dirent64 *)iovp->iov_base;
 	}
 	eodp = (struct edirent *)odp;
 
 	if (ncookies != NULL) {
 		/*
 		 * Minimum entry size is dirent size and 1 byte for a file name.
 		 */
 		ncooks = uio->uio_resid / (sizeof(struct dirent) - sizeof(((struct dirent *)NULL)->d_name) + 1);
 		cooks = malloc(ncooks * sizeof(u_long), M_TEMP, M_WAITOK);
 		*cookies = cooks;
 		*ncookies = ncooks;
 	}
 	/*
 	 * If this VFS supports the system attribute view interface; and
 	 * we're looking at an extended attribute directory; and we care
 	 * about normalization conflicts on this vfs; then we must check
 	 * for normalization conflicts with the sysattr name space.
 	 */
 #ifdef TODO
 	check_sysattrs = vfs_has_feature(vp->v_vfsp, VFSFT_SYSATTR_VIEWS) &&
 	    (vp->v_flag & V_XATTRDIR) && zfsvfs->z_norm &&
 	    (flags & V_RDDIR_ENTFLAGS);
 #else
 	check_sysattrs = 0;
 #endif
 
 	/*
 	 * Transform to file-system independent format
 	 */
 	outcount = 0;
 	while (outcount < bytes_wanted) {
 		ino64_t objnum;
 		ushort_t reclen;
 		off64_t *next = NULL;
 
 		/*
 		 * Special case `.', `..', and `.zfs'.
 		 */
 		if (offset == 0) {
 			(void) strcpy(zap.za_name, ".");
 			zap.za_normalization_conflict = 0;
 			objnum = zp->z_id;
 			type = DT_DIR;
 		} else if (offset == 1) {
 			(void) strcpy(zap.za_name, "..");
 			zap.za_normalization_conflict = 0;
 			objnum = parent;
 			type = DT_DIR;
 		} else if (offset == 2 && zfs_show_ctldir(zp)) {
 			(void) strcpy(zap.za_name, ZFS_CTLDIR_NAME);
 			zap.za_normalization_conflict = 0;
 			objnum = ZFSCTL_INO_ROOT;
 			type = DT_DIR;
 		} else {
 			/*
 			 * Grab next entry.
 			 */
 			if (error = zap_cursor_retrieve(&zc, &zap)) {
 				if ((*eofp = (error == ENOENT)) != 0)
 					break;
 				else
 					goto update;
 			}
 
 			if (zap.za_integer_length != 8 ||
 			    zap.za_num_integers != 1) {
 				cmn_err(CE_WARN, "zap_readdir: bad directory "
 				    "entry, obj = %lld, offset = %lld\n",
 				    (u_longlong_t)zp->z_id,
 				    (u_longlong_t)offset);
 				error = SET_ERROR(ENXIO);
 				goto update;
 			}
 
 			objnum = ZFS_DIRENT_OBJ(zap.za_first_integer);
 			/*
 			 * MacOS X can extract the object type here such as:
 			 * uint8_t type = ZFS_DIRENT_TYPE(zap.za_first_integer);
 			 */
 			type = ZFS_DIRENT_TYPE(zap.za_first_integer);
 
 			if (check_sysattrs && !zap.za_normalization_conflict) {
 #ifdef TODO
 				zap.za_normalization_conflict =
 				    xattr_sysattr_casechk(zap.za_name);
 #else
 				panic("%s:%u: TODO", __func__, __LINE__);
 #endif
 			}
 		}
 
 		if (flags & V_RDDIR_ACCFILTER) {
 			/*
 			 * If we have no access at all, don't include
 			 * this entry in the returned information
 			 */
 			znode_t	*ezp;
 			if (zfs_zget(zp->z_zfsvfs, objnum, &ezp) != 0)
 				goto skip_entry;
 			if (!zfs_has_access(ezp, cr)) {
-				VN_RELE(ZTOV(ezp));
+				vrele(ZTOV(ezp));
 				goto skip_entry;
 			}
-			VN_RELE(ZTOV(ezp));
+			vrele(ZTOV(ezp));
 		}
 
 		if (flags & V_RDDIR_ENTFLAGS)
 			reclen = EDIRENT_RECLEN(strlen(zap.za_name));
 		else
 			reclen = DIRENT64_RECLEN(strlen(zap.za_name));
 
 		/*
 		 * Will this entry fit in the buffer?
 		 */
 		if (outcount + reclen > bufsize) {
 			/*
 			 * Did we manage to fit anything in the buffer?
 			 */
 			if (!outcount) {
 				error = SET_ERROR(EINVAL);
 				goto update;
 			}
 			break;
 		}
 		if (flags & V_RDDIR_ENTFLAGS) {
 			/*
 			 * Add extended flag entry:
 			 */
 			eodp->ed_ino = objnum;
 			eodp->ed_reclen = reclen;
 			/* NOTE: ed_off is the offset for the *next* entry */
 			next = &(eodp->ed_off);
 			eodp->ed_eflags = zap.za_normalization_conflict ?
 			    ED_CASE_CONFLICT : 0;
 			(void) strncpy(eodp->ed_name, zap.za_name,
 			    EDIRENT_NAMELEN(reclen));
 			eodp = (edirent_t *)((intptr_t)eodp + reclen);
 		} else {
 			/*
 			 * Add normal entry:
 			 */
 			odp->d_ino = objnum;
 			odp->d_reclen = reclen;
 			odp->d_namlen = strlen(zap.za_name);
 			(void) strlcpy(odp->d_name, zap.za_name, odp->d_namlen + 1);
 			odp->d_type = type;
 			odp = (dirent64_t *)((intptr_t)odp + reclen);
 		}
 		outcount += reclen;
 
 		ASSERT(outcount <= bufsize);
 
 		/* Prefetch znode */
 		if (prefetch)
 			dmu_prefetch(os, objnum, 0, 0, 0,
 			    ZIO_PRIORITY_SYNC_READ);
 
 	skip_entry:
 		/*
 		 * Move to the next entry, fill in the previous offset.
 		 */
 		if (offset > 2 || (offset == 2 && !zfs_show_ctldir(zp))) {
 			zap_cursor_advance(&zc);
 			offset = zap_cursor_serialize(&zc);
 		} else {
 			offset += 1;
 		}
 
 		if (cooks != NULL) {
 			*cooks++ = offset;
 			ncooks--;
 			KASSERT(ncooks >= 0, ("ncookies=%d", ncooks));
 		}
 	}
 	zp->z_zn_prefetch = B_FALSE; /* a lookup will re-enable pre-fetching */
 
 	/* Subtract unused cookies */
 	if (ncookies != NULL)
 		*ncookies -= ncooks;
 
 	if (uio->uio_segflg == UIO_SYSSPACE && uio->uio_iovcnt == 1) {
 		iovp->iov_base += outcount;
 		iovp->iov_len -= outcount;
 		uio->uio_resid -= outcount;
 	} else if (error = uiomove(outbuf, (long)outcount, UIO_READ, uio)) {
 		/*
 		 * Reset the pointer.
 		 */
 		offset = uio->uio_loffset;
 	}
 
 update:
 	zap_cursor_fini(&zc);
 	if (uio->uio_segflg != UIO_SYSSPACE || uio->uio_iovcnt != 1)
 		kmem_free(outbuf, bufsize);
 
 	if (error == ENOENT)
 		error = 0;
 
 	ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
 
 	uio->uio_loffset = offset;
 	ZFS_EXIT(zfsvfs);
 	if (error != 0 && cookies != NULL) {
 		free(*cookies, M_TEMP);
 		*cookies = NULL;
 		*ncookies = 0;
 	}
 	return (error);
 }
 
 ulong_t zfs_fsync_sync_cnt = 4;
 
 static int
 zfs_fsync(vnode_t *vp, int syncflag, cred_t *cr, caller_context_t *ct)
 {
 	znode_t	*zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	(void) tsd_set(zfs_fsyncer_key, (void *)zfs_fsync_sync_cnt);
 
 	if (zfsvfs->z_os->os_sync != ZFS_SYNC_DISABLED) {
 		ZFS_ENTER(zfsvfs);
 		ZFS_VERIFY_ZP(zp);
 		zil_commit(zfsvfs->z_log, zp->z_id);
 		ZFS_EXIT(zfsvfs);
 	}
 	return (0);
 }
 
 
 /*
  * Get the requested file attributes and place them in the provided
  * vattr structure.
  *
  *	IN:	vp	- vnode of file.
  *		vap	- va_mask identifies requested attributes.
  *			  If AT_XVATTR set, then optional attrs are requested
  *		flags	- ATTR_NOACLCHECK (CIFS server context)
  *		cr	- credentials of caller.
  *		ct	- caller context
  *
  *	OUT:	vap	- attribute values.
  *
  *	RETURN:	0 (always succeeds).
  */
 /* ARGSUSED */
 static int
 zfs_getattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t *zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int	error = 0;
 	uint32_t blksize;
 	u_longlong_t nblocks;
 	uint64_t links;
 	uint64_t mtime[2], ctime[2], crtime[2], rdev;
 	xvattr_t *xvap = (xvattr_t *)vap;	/* vap may be an xvattr_t * */
 	xoptattr_t *xoap = NULL;
 	boolean_t skipaclchk = (flags & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE;
 	sa_bulk_attr_t bulk[4];
 	int count = 0;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	zfs_fuid_map_ids(zp, cr, &vap->va_uid, &vap->va_gid);
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, &mtime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, &ctime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CRTIME(zfsvfs), NULL, &crtime, 16);
 	if (vp->v_type == VBLK || vp->v_type == VCHR)
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_RDEV(zfsvfs), NULL,
 		    &rdev, 8);
 
 	if ((error = sa_bulk_lookup(zp->z_sa_hdl, bulk, count)) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	/*
 	 * If ACL is trivial don't bother looking for ACE_READ_ATTRIBUTES.
 	 * Also, if we are the owner don't bother, since owner should
 	 * always be allowed to read basic attributes of file.
 	 */
 	if (!(zp->z_pflags & ZFS_ACL_TRIVIAL) &&
 	    (vap->va_uid != crgetuid(cr))) {
 		if (error = zfs_zaccess(zp, ACE_READ_ATTRIBUTES, 0,
 		    skipaclchk, cr)) {
 			ZFS_EXIT(zfsvfs);
 			return (error);
 		}
 	}
 
 	/*
 	 * Return all attributes.  It's cheaper to provide the answer
 	 * than to determine whether we were asked the question.
 	 */
 
-	mutex_enter(&zp->z_lock);
 	vap->va_type = IFTOVT(zp->z_mode);
 	vap->va_mode = zp->z_mode & ~S_IFMT;
 #ifdef illumos
 	vap->va_fsid = zp->z_zfsvfs->z_vfs->vfs_dev;
 #else
 	vap->va_fsid = vp->v_mount->mnt_stat.f_fsid.val[0];
 #endif
 	vap->va_nodeid = zp->z_id;
 	if ((vp->v_flag & VROOT) && zfs_show_ctldir(zp))
 		links = zp->z_links + 1;
 	else
 		links = zp->z_links;
 	vap->va_nlink = MIN(links, LINK_MAX);	/* nlink_t limit! */
 	vap->va_size = zp->z_size;
 #ifdef illumos
 	vap->va_rdev = vp->v_rdev;
 #else
 	if (vp->v_type == VBLK || vp->v_type == VCHR)
 		vap->va_rdev = zfs_cmpldev(rdev);
 #endif
 	vap->va_seq = zp->z_seq;
 	vap->va_flags = 0;	/* FreeBSD: Reset chflags(2) flags. */
      	vap->va_filerev = zp->z_seq;
 
 	/*
 	 * Add in any requested optional attributes and the create time.
 	 * Also set the corresponding bits in the returned attribute bitmap.
 	 */
 	if ((xoap = xva_getxoptattr(xvap)) != NULL && zfsvfs->z_use_fuids) {
 		if (XVA_ISSET_REQ(xvap, XAT_ARCHIVE)) {
 			xoap->xoa_archive =
 			    ((zp->z_pflags & ZFS_ARCHIVE) != 0);
 			XVA_SET_RTN(xvap, XAT_ARCHIVE);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_READONLY)) {
 			xoap->xoa_readonly =
 			    ((zp->z_pflags & ZFS_READONLY) != 0);
 			XVA_SET_RTN(xvap, XAT_READONLY);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_SYSTEM)) {
 			xoap->xoa_system =
 			    ((zp->z_pflags & ZFS_SYSTEM) != 0);
 			XVA_SET_RTN(xvap, XAT_SYSTEM);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_HIDDEN)) {
 			xoap->xoa_hidden =
 			    ((zp->z_pflags & ZFS_HIDDEN) != 0);
 			XVA_SET_RTN(xvap, XAT_HIDDEN);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK)) {
 			xoap->xoa_nounlink =
 			    ((zp->z_pflags & ZFS_NOUNLINK) != 0);
 			XVA_SET_RTN(xvap, XAT_NOUNLINK);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE)) {
 			xoap->xoa_immutable =
 			    ((zp->z_pflags & ZFS_IMMUTABLE) != 0);
 			XVA_SET_RTN(xvap, XAT_IMMUTABLE);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY)) {
 			xoap->xoa_appendonly =
 			    ((zp->z_pflags & ZFS_APPENDONLY) != 0);
 			XVA_SET_RTN(xvap, XAT_APPENDONLY);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_NODUMP)) {
 			xoap->xoa_nodump =
 			    ((zp->z_pflags & ZFS_NODUMP) != 0);
 			XVA_SET_RTN(xvap, XAT_NODUMP);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_OPAQUE)) {
 			xoap->xoa_opaque =
 			    ((zp->z_pflags & ZFS_OPAQUE) != 0);
 			XVA_SET_RTN(xvap, XAT_OPAQUE);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED)) {
 			xoap->xoa_av_quarantined =
 			    ((zp->z_pflags & ZFS_AV_QUARANTINED) != 0);
 			XVA_SET_RTN(xvap, XAT_AV_QUARANTINED);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED)) {
 			xoap->xoa_av_modified =
 			    ((zp->z_pflags & ZFS_AV_MODIFIED) != 0);
 			XVA_SET_RTN(xvap, XAT_AV_MODIFIED);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP) &&
 		    vp->v_type == VREG) {
 			zfs_sa_get_scanstamp(zp, xvap);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_CREATETIME)) {
 			uint64_t times[2];
 
 			(void) sa_lookup(zp->z_sa_hdl, SA_ZPL_CRTIME(zfsvfs),
 			    times, sizeof (times));
 			ZFS_TIME_DECODE(&xoap->xoa_createtime, times);
 			XVA_SET_RTN(xvap, XAT_CREATETIME);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_REPARSE)) {
 			xoap->xoa_reparse = ((zp->z_pflags & ZFS_REPARSE) != 0);
 			XVA_SET_RTN(xvap, XAT_REPARSE);
 		}
 		if (XVA_ISSET_REQ(xvap, XAT_GEN)) {
 			xoap->xoa_generation = zp->z_gen;
 			XVA_SET_RTN(xvap, XAT_GEN);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_OFFLINE)) {
 			xoap->xoa_offline =
 			    ((zp->z_pflags & ZFS_OFFLINE) != 0);
 			XVA_SET_RTN(xvap, XAT_OFFLINE);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_SPARSE)) {
 			xoap->xoa_sparse =
 			    ((zp->z_pflags & ZFS_SPARSE) != 0);
 			XVA_SET_RTN(xvap, XAT_SPARSE);
 		}
 	}
 
 	ZFS_TIME_DECODE(&vap->va_atime, zp->z_atime);
 	ZFS_TIME_DECODE(&vap->va_mtime, mtime);
 	ZFS_TIME_DECODE(&vap->va_ctime, ctime);
 	ZFS_TIME_DECODE(&vap->va_birthtime, crtime);
 
-	mutex_exit(&zp->z_lock);
 
 	sa_object_size(zp->z_sa_hdl, &blksize, &nblocks);
 	vap->va_blksize = blksize;
 	vap->va_bytes = nblocks << 9;	/* nblocks * 512 */
 
 	if (zp->z_blksz == 0) {
 		/*
 		 * Block size hasn't been set; suggest maximal I/O transfers.
 		 */
 		vap->va_blksize = zfsvfs->z_max_blksz;
 	}
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
 
 /*
  * Set the file attributes to the values contained in the
  * vattr structure.
  *
  *	IN:	vp	- vnode of file to be modified.
  *		vap	- new attribute values.
  *			  If AT_XVATTR set, then optional attrs are being set
  *		flags	- ATTR_UTIME set if non-default time values provided.
  *			- ATTR_NOACLCHECK (CIFS context only).
  *		cr	- credentials of caller.
  *		ct	- caller context
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	vp - ctime updated, mtime updated if size changed.
  */
 /* ARGSUSED */
 static int
 zfs_setattr(vnode_t *vp, vattr_t *vap, int flags, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	zilog_t		*zilog;
 	dmu_tx_t	*tx;
 	vattr_t		oldva;
 	xvattr_t	tmpxvattr;
 	uint_t		mask = vap->va_mask;
 	uint_t		saved_mask = 0;
 	uint64_t	saved_mode;
 	int		trim_mask = 0;
 	uint64_t	new_mode;
 	uint64_t	new_uid, new_gid;
 	uint64_t	xattr_obj;
 	uint64_t	mtime[2], ctime[2];
 	znode_t		*attrzp;
 	int		need_policy = FALSE;
 	int		err, err2;
 	zfs_fuid_info_t *fuidp = NULL;
 	xvattr_t *xvap = (xvattr_t *)vap;	/* vap may be an xvattr_t * */
 	xoptattr_t	*xoap;
 	zfs_acl_t	*aclp;
 	boolean_t skipaclchk = (flags & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE;
 	boolean_t	fuid_dirtied = B_FALSE;
 	sa_bulk_attr_t	bulk[7], xattr_bulk[7];
 	int		count = 0, xattr_count = 0;
 
 	if (mask == 0)
 		return (0);
 
 	if (mask & AT_NOSET)
 		return (SET_ERROR(EINVAL));
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	zilog = zfsvfs->z_log;
 
 	/*
 	 * Make sure that if we have ephemeral uid/gid or xvattr specified
 	 * that file system is at proper version level
 	 */
 
 	if (zfsvfs->z_use_fuids == B_FALSE &&
 	    (((mask & AT_UID) && IS_EPHEMERAL(vap->va_uid)) ||
 	    ((mask & AT_GID) && IS_EPHEMERAL(vap->va_gid)) ||
 	    (mask & AT_XVATTR))) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	if (mask & AT_SIZE && vp->v_type == VDIR) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EISDIR));
 	}
 
 	if (mask & AT_SIZE && vp->v_type != VREG && vp->v_type != VFIFO) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 	/*
 	 * If this is an xvattr_t, then get a pointer to the structure of
 	 * optional attributes.  If this is NULL, then we have a vattr_t.
 	 */
 	xoap = xva_getxoptattr(xvap);
 
 	xva_init(&tmpxvattr);
 
 	/*
 	 * Immutable files can only alter immutable bit and atime
 	 */
 	if ((zp->z_pflags & ZFS_IMMUTABLE) &&
 	    ((mask & (AT_SIZE|AT_UID|AT_GID|AT_MTIME|AT_MODE)) ||
 	    ((mask & AT_XVATTR) && XVA_ISSET_REQ(xvap, XAT_CREATETIME)))) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	if ((mask & AT_SIZE) && (zp->z_pflags & ZFS_READONLY)) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	/*
 	 * Verify timestamps doesn't overflow 32 bits.
 	 * ZFS can handle large timestamps, but 32bit syscalls can't
 	 * handle times greater than 2039.  This check should be removed
 	 * once large timestamps are fully supported.
 	 */
 	if (mask & (AT_ATIME | AT_MTIME)) {
 		if (((mask & AT_ATIME) && TIMESPEC_OVERFLOW(&vap->va_atime)) ||
 		    ((mask & AT_MTIME) && TIMESPEC_OVERFLOW(&vap->va_mtime))) {
 			ZFS_EXIT(zfsvfs);
 			return (SET_ERROR(EOVERFLOW));
 		}
 	}
 
-top:
 	attrzp = NULL;
 	aclp = NULL;
 
 	/* Can this be moved to before the top label? */
 	if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EROFS));
 	}
 
 	/*
 	 * First validate permissions
 	 */
 
 	if (mask & AT_SIZE) {
 		/*
 		 * XXX - Note, we are not providing any open
 		 * mode flags here (like FNDELAY), so we may
 		 * block if there are locks present... this
 		 * should be addressed in openat().
 		 */
 		/* XXX - would it be OK to generate a log record here? */
 		err = zfs_freesp(zp, vap->va_size, 0, 0, FALSE);
 		if (err) {
 			ZFS_EXIT(zfsvfs);
 			return (err);
 		}
 	}
 
 	if (mask & (AT_ATIME|AT_MTIME) ||
 	    ((mask & AT_XVATTR) && (XVA_ISSET_REQ(xvap, XAT_HIDDEN) ||
 	    XVA_ISSET_REQ(xvap, XAT_READONLY) ||
 	    XVA_ISSET_REQ(xvap, XAT_ARCHIVE) ||
 	    XVA_ISSET_REQ(xvap, XAT_OFFLINE) ||
 	    XVA_ISSET_REQ(xvap, XAT_SPARSE) ||
 	    XVA_ISSET_REQ(xvap, XAT_CREATETIME) ||
 	    XVA_ISSET_REQ(xvap, XAT_SYSTEM)))) {
 		need_policy = zfs_zaccess(zp, ACE_WRITE_ATTRIBUTES, 0,
 		    skipaclchk, cr);
 	}
 
 	if (mask & (AT_UID|AT_GID)) {
 		int	idmask = (mask & (AT_UID|AT_GID));
 		int	take_owner;
 		int	take_group;
 
 		/*
 		 * NOTE: even if a new mode is being set,
 		 * we may clear S_ISUID/S_ISGID bits.
 		 */
 
 		if (!(mask & AT_MODE))
 			vap->va_mode = zp->z_mode;
 
 		/*
 		 * Take ownership or chgrp to group we are a member of
 		 */
 
 		take_owner = (mask & AT_UID) && (vap->va_uid == crgetuid(cr));
 		take_group = (mask & AT_GID) &&
 		    zfs_groupmember(zfsvfs, vap->va_gid, cr);
 
 		/*
 		 * If both AT_UID and AT_GID are set then take_owner and
 		 * take_group must both be set in order to allow taking
 		 * ownership.
 		 *
 		 * Otherwise, send the check through secpolicy_vnode_setattr()
 		 *
 		 */
 
 		if (((idmask == (AT_UID|AT_GID)) && take_owner && take_group) ||
 		    ((idmask == AT_UID) && take_owner) ||
 		    ((idmask == AT_GID) && take_group)) {
 			if (zfs_zaccess(zp, ACE_WRITE_OWNER, 0,
 			    skipaclchk, cr) == 0) {
 				/*
 				 * Remove setuid/setgid for non-privileged users
 				 */
 				secpolicy_setid_clear(vap, vp, cr);
 				trim_mask = (mask & (AT_UID|AT_GID));
 			} else {
 				need_policy =  TRUE;
 			}
 		} else {
 			need_policy =  TRUE;
 		}
 	}
 
-	mutex_enter(&zp->z_lock);
 	oldva.va_mode = zp->z_mode;
 	zfs_fuid_map_ids(zp, cr, &oldva.va_uid, &oldva.va_gid);
 	if (mask & AT_XVATTR) {
 		/*
 		 * Update xvattr mask to include only those attributes
 		 * that are actually changing.
 		 *
 		 * the bits will be restored prior to actually setting
 		 * the attributes so the caller thinks they were set.
 		 */
 		if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY)) {
 			if (xoap->xoa_appendonly !=
 			    ((zp->z_pflags & ZFS_APPENDONLY) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_APPENDONLY);
 				XVA_SET_REQ(&tmpxvattr, XAT_APPENDONLY);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK)) {
 			if (xoap->xoa_nounlink !=
 			    ((zp->z_pflags & ZFS_NOUNLINK) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_NOUNLINK);
 				XVA_SET_REQ(&tmpxvattr, XAT_NOUNLINK);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE)) {
 			if (xoap->xoa_immutable !=
 			    ((zp->z_pflags & ZFS_IMMUTABLE) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_IMMUTABLE);
 				XVA_SET_REQ(&tmpxvattr, XAT_IMMUTABLE);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_NODUMP)) {
 			if (xoap->xoa_nodump !=
 			    ((zp->z_pflags & ZFS_NODUMP) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_NODUMP);
 				XVA_SET_REQ(&tmpxvattr, XAT_NODUMP);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED)) {
 			if (xoap->xoa_av_modified !=
 			    ((zp->z_pflags & ZFS_AV_MODIFIED) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_AV_MODIFIED);
 				XVA_SET_REQ(&tmpxvattr, XAT_AV_MODIFIED);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED)) {
 			if ((vp->v_type != VREG &&
 			    xoap->xoa_av_quarantined) ||
 			    xoap->xoa_av_quarantined !=
 			    ((zp->z_pflags & ZFS_AV_QUARANTINED) != 0)) {
 				need_policy = TRUE;
 			} else {
 				XVA_CLR_REQ(xvap, XAT_AV_QUARANTINED);
 				XVA_SET_REQ(&tmpxvattr, XAT_AV_QUARANTINED);
 			}
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_REPARSE)) {
-			mutex_exit(&zp->z_lock);
 			ZFS_EXIT(zfsvfs);
 			return (SET_ERROR(EPERM));
 		}
 
 		if (need_policy == FALSE &&
 		    (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP) ||
 		    XVA_ISSET_REQ(xvap, XAT_OPAQUE))) {
 			need_policy = TRUE;
 		}
 	}
 
-	mutex_exit(&zp->z_lock);
-
 	if (mask & AT_MODE) {
 		if (zfs_zaccess(zp, ACE_WRITE_ACL, 0, skipaclchk, cr) == 0) {
 			err = secpolicy_setid_setsticky_clear(vp, vap,
 			    &oldva, cr);
 			if (err) {
 				ZFS_EXIT(zfsvfs);
 				return (err);
 			}
 			trim_mask |= AT_MODE;
 		} else {
 			need_policy = TRUE;
 		}
 	}
 
 	if (need_policy) {
 		/*
 		 * If trim_mask is set then take ownership
 		 * has been granted or write_acl is present and user
 		 * has the ability to modify mode.  In that case remove
 		 * UID|GID and or MODE from mask so that
 		 * secpolicy_vnode_setattr() doesn't revoke it.
 		 */
 
 		if (trim_mask) {
 			saved_mask = vap->va_mask;
 			vap->va_mask &= ~trim_mask;
 			if (trim_mask & AT_MODE) {
 				/*
 				 * Save the mode, as secpolicy_vnode_setattr()
 				 * will overwrite it with ova.va_mode.
 				 */
 				saved_mode = vap->va_mode;
 			}
 		}
 		err = secpolicy_vnode_setattr(cr, vp, vap, &oldva, flags,
 		    (int (*)(void *, int, cred_t *))zfs_zaccess_unix, zp);
 		if (err) {
 			ZFS_EXIT(zfsvfs);
 			return (err);
 		}
 
 		if (trim_mask) {
 			vap->va_mask |= saved_mask;
 			if (trim_mask & AT_MODE) {
 				/*
 				 * Recover the mode after
 				 * secpolicy_vnode_setattr().
 				 */
 				vap->va_mode = saved_mode;
 			}
 		}
 	}
 
 	/*
 	 * secpolicy_vnode_setattr, or take ownership may have
 	 * changed va_mask
 	 */
 	mask = vap->va_mask;
 
 	if ((mask & (AT_UID | AT_GID))) {
 		err = sa_lookup(zp->z_sa_hdl, SA_ZPL_XATTR(zfsvfs),
 		    &xattr_obj, sizeof (xattr_obj));
 
 		if (err == 0 && xattr_obj) {
 			err = zfs_zget(zp->z_zfsvfs, xattr_obj, &attrzp);
 			if (err)
 				goto out2;
 		}
 		if (mask & AT_UID) {
 			new_uid = zfs_fuid_create(zfsvfs,
 			    (uint64_t)vap->va_uid, cr, ZFS_OWNER, &fuidp);
 			if (new_uid != zp->z_uid &&
 			    zfs_fuid_overquota(zfsvfs, B_FALSE, new_uid)) {
 				if (attrzp)
-					VN_RELE(ZTOV(attrzp));
+					vrele(ZTOV(attrzp));
 				err = SET_ERROR(EDQUOT);
 				goto out2;
 			}
 		}
 
 		if (mask & AT_GID) {
 			new_gid = zfs_fuid_create(zfsvfs, (uint64_t)vap->va_gid,
 			    cr, ZFS_GROUP, &fuidp);
 			if (new_gid != zp->z_gid &&
 			    zfs_fuid_overquota(zfsvfs, B_TRUE, new_gid)) {
 				if (attrzp)
-					VN_RELE(ZTOV(attrzp));
+					vrele(ZTOV(attrzp));
 				err = SET_ERROR(EDQUOT);
 				goto out2;
 			}
 		}
 	}
 	tx = dmu_tx_create(zfsvfs->z_os);
 
 	if (mask & AT_MODE) {
 		uint64_t pmode = zp->z_mode;
 		uint64_t acl_obj;
 		new_mode = (pmode & S_IFMT) | (vap->va_mode & ~S_IFMT);
 
 		if (zp->z_zfsvfs->z_acl_mode == ZFS_ACL_RESTRICTED &&
 		    !(zp->z_pflags & ZFS_ACL_TRIVIAL)) {
 			err = SET_ERROR(EPERM);
 			goto out;
 		}
 
 		if (err = zfs_acl_chmod_setattr(zp, &aclp, new_mode))
 			goto out;
 
-		mutex_enter(&zp->z_lock);
 		if (!zp->z_is_sa && ((acl_obj = zfs_external_acl(zp)) != 0)) {
 			/*
 			 * Are we upgrading ACL from old V0 format
 			 * to V1 format?
 			 */
 			if (zfsvfs->z_version >= ZPL_VERSION_FUID &&
 			    zfs_znode_acl_version(zp) ==
 			    ZFS_ACL_VERSION_INITIAL) {
 				dmu_tx_hold_free(tx, acl_obj, 0,
 				    DMU_OBJECT_END);
 				dmu_tx_hold_write(tx, DMU_NEW_OBJECT,
 				    0, aclp->z_acl_bytes);
 			} else {
 				dmu_tx_hold_write(tx, acl_obj, 0,
 				    aclp->z_acl_bytes);
 			}
 		} else if (!zp->z_is_sa && aclp->z_acl_bytes > ZFS_ACE_SPACE) {
 			dmu_tx_hold_write(tx, DMU_NEW_OBJECT,
 			    0, aclp->z_acl_bytes);
 		}
-		mutex_exit(&zp->z_lock);
 		dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 	} else {
 		if ((mask & AT_XVATTR) &&
 		    XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP))
 			dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_TRUE);
 		else
 			dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	}
 
 	if (attrzp) {
 		dmu_tx_hold_sa(tx, attrzp->z_sa_hdl, B_FALSE);
 	}
 
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
 
 	zfs_sa_upgrade_txholds(tx, zp);
 
 	err = dmu_tx_assign(tx, TXG_WAIT);
 	if (err)
 		goto out;
 
 	count = 0;
 	/*
 	 * Set each attribute requested.
 	 * We group settings according to the locks they need to acquire.
 	 *
 	 * Note: you cannot set ctime directly, although it will be
 	 * updated as a side-effect of calling this function.
 	 */
 
-
 	if (mask & (AT_UID|AT_GID|AT_MODE))
 		mutex_enter(&zp->z_acl_lock);
-	mutex_enter(&zp->z_lock);
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, sizeof (zp->z_pflags));
 
 	if (attrzp) {
 		if (mask & (AT_UID|AT_GID|AT_MODE))
 			mutex_enter(&attrzp->z_acl_lock);
-		mutex_enter(&attrzp->z_lock);
 		SA_ADD_BULK_ATTR(xattr_bulk, xattr_count,
 		    SA_ZPL_FLAGS(zfsvfs), NULL, &attrzp->z_pflags,
 		    sizeof (attrzp->z_pflags));
 	}
 
 	if (mask & (AT_UID|AT_GID)) {
 
 		if (mask & AT_UID) {
 			SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL,
 			    &new_uid, sizeof (new_uid));
 			zp->z_uid = new_uid;
 			if (attrzp) {
 				SA_ADD_BULK_ATTR(xattr_bulk, xattr_count,
 				    SA_ZPL_UID(zfsvfs), NULL, &new_uid,
 				    sizeof (new_uid));
 				attrzp->z_uid = new_uid;
 			}
 		}
 
 		if (mask & AT_GID) {
 			SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs),
 			    NULL, &new_gid, sizeof (new_gid));
 			zp->z_gid = new_gid;
 			if (attrzp) {
 				SA_ADD_BULK_ATTR(xattr_bulk, xattr_count,
 				    SA_ZPL_GID(zfsvfs), NULL, &new_gid,
 				    sizeof (new_gid));
 				attrzp->z_gid = new_gid;
 			}
 		}
 		if (!(mask & AT_MODE)) {
 			SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs),
 			    NULL, &new_mode, sizeof (new_mode));
 			new_mode = zp->z_mode;
 		}
 		err = zfs_acl_chown_setattr(zp);
 		ASSERT(err == 0);
 		if (attrzp) {
 			err = zfs_acl_chown_setattr(attrzp);
 			ASSERT(err == 0);
 		}
 	}
 
 	if (mask & AT_MODE) {
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL,
 		    &new_mode, sizeof (new_mode));
 		zp->z_mode = new_mode;
 		ASSERT3U((uintptr_t)aclp, !=, 0);
 		err = zfs_aclset_common(zp, aclp, cr, tx);
 		ASSERT0(err);
 		if (zp->z_acl_cached)
 			zfs_acl_free(zp->z_acl_cached);
 		zp->z_acl_cached = aclp;
 		aclp = NULL;
 	}
 
 
 	if (mask & AT_ATIME) {
 		ZFS_TIME_ENCODE(&vap->va_atime, zp->z_atime);
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zfsvfs), NULL,
 		    &zp->z_atime, sizeof (zp->z_atime));
 	}
 
 	if (mask & AT_MTIME) {
 		ZFS_TIME_ENCODE(&vap->va_mtime, mtime);
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL,
 		    mtime, sizeof (mtime));
 	}
 
 	/* XXX - shouldn't this be done *before* the ATIME/MTIME checks? */
 	if (mask & AT_SIZE && !(mask & AT_MTIME)) {
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs),
 		    NULL, mtime, sizeof (mtime));
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 		    &ctime, sizeof (ctime));
 		zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
 		    B_TRUE);
 	} else if (mask != 0) {
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 		    &ctime, sizeof (ctime));
 		zfs_tstamp_update_setup(zp, STATE_CHANGED, mtime, ctime,
 		    B_TRUE);
 		if (attrzp) {
 			SA_ADD_BULK_ATTR(xattr_bulk, xattr_count,
 			    SA_ZPL_CTIME(zfsvfs), NULL,
 			    &ctime, sizeof (ctime));
 			zfs_tstamp_update_setup(attrzp, STATE_CHANGED,
 			    mtime, ctime, B_TRUE);
 		}
 	}
 	/*
 	 * Do this after setting timestamps to prevent timestamp
 	 * update from toggling bit
 	 */
 
 	if (xoap && (mask & AT_XVATTR)) {
 
 		/*
 		 * restore trimmed off masks
 		 * so that return masks can be set for caller.
 		 */
 
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_APPENDONLY)) {
 			XVA_SET_REQ(xvap, XAT_APPENDONLY);
 		}
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_NOUNLINK)) {
 			XVA_SET_REQ(xvap, XAT_NOUNLINK);
 		}
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_IMMUTABLE)) {
 			XVA_SET_REQ(xvap, XAT_IMMUTABLE);
 		}
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_NODUMP)) {
 			XVA_SET_REQ(xvap, XAT_NODUMP);
 		}
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_AV_MODIFIED)) {
 			XVA_SET_REQ(xvap, XAT_AV_MODIFIED);
 		}
 		if (XVA_ISSET_REQ(&tmpxvattr, XAT_AV_QUARANTINED)) {
 			XVA_SET_REQ(xvap, XAT_AV_QUARANTINED);
 		}
 
 		if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP))
 			ASSERT(vp->v_type == VREG);
 
 		zfs_xvattr_set(zp, xvap, tx);
 	}
 
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 
 	if (mask != 0)
 		zfs_log_setattr(zilog, tx, TX_SETATTR, zp, vap, mask, fuidp);
 
-	mutex_exit(&zp->z_lock);
 	if (mask & (AT_UID|AT_GID|AT_MODE))
 		mutex_exit(&zp->z_acl_lock);
 
 	if (attrzp) {
 		if (mask & (AT_UID|AT_GID|AT_MODE))
 			mutex_exit(&attrzp->z_acl_lock);
-		mutex_exit(&attrzp->z_lock);
 	}
 out:
 	if (err == 0 && attrzp) {
 		err2 = sa_bulk_update(attrzp->z_sa_hdl, xattr_bulk,
 		    xattr_count, tx);
 		ASSERT(err2 == 0);
 	}
 
 	if (attrzp)
-		VN_RELE(ZTOV(attrzp));
+		vrele(ZTOV(attrzp));
 
 	if (aclp)
 		zfs_acl_free(aclp);
 
 	if (fuidp) {
 		zfs_fuid_info_free(fuidp);
 		fuidp = NULL;
 	}
 
 	if (err) {
 		dmu_tx_abort(tx);
-		if (err == ERESTART)
-			goto top;
 	} else {
 		err2 = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 		dmu_tx_commit(tx);
 	}
 
 out2:
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (err);
 }
 
-typedef struct zfs_zlock {
-	krwlock_t	*zl_rwlock;	/* lock we acquired */
-	znode_t		*zl_znode;	/* znode we held */
-	struct zfs_zlock *zl_next;	/* next in list */
-} zfs_zlock_t;
-
 /*
- * Drop locks and release vnodes that were held by zfs_rename_lock().
+ * We acquire all locks except the one on sdvp using non-blocking
+ * acquisitions.  If we fail to acquire any lock in the path, we drop
+ * all held locks, acquire the contended lock in a blocking fashion,
+ * then release it and restart the rename.  This acquire/release step
+ * ensures that we do not spin on a lock waiting for its release.  On
+ * error, release all vnode locks and references as tmpfs_rename() does.
  */
-static void
-zfs_rename_unlock(zfs_zlock_t **zlpp)
+static int
+zfs_rename_relock(struct vnode *sdvp, struct vnode **svpp,
+    struct vnode *tdvp, struct vnode **tvpp,
+    const struct componentname *scnp, const struct componentname *tcnp)
 {
-	zfs_zlock_t *zl;
+	zfsvfs_t	*zfsvfs;
+	struct vnode	*nvp, *svp, *tvp;
+	znode_t		*sdzp, *tdzp, *szp, *tzp;
+	const char	*snm = scnp->cn_nameptr;
+	const char	*tnm = tcnp->cn_nameptr;
+	int error;
 
-	while ((zl = *zlpp) != NULL) {
-		if (zl->zl_znode != NULL)
-			VN_RELE(ZTOV(zl->zl_znode));
-		rw_exit(zl->zl_rwlock);
-		*zlpp = zl->zl_next;
-		kmem_free(zl, sizeof (*zl));
+	VOP_UNLOCK(tdvp, 0);
+	if (*tvpp != NULL && *tvpp != tdvp)
+		VOP_UNLOCK(*tvpp, 0);
+
+relock:
+	error = vn_lock(sdvp, LK_EXCLUSIVE);
+	if (error)
+		goto out;
+	sdzp = VTOZ(sdvp);
+
+	error = vn_lock(tdvp, LK_EXCLUSIVE | LK_NOWAIT);
+	if (error != 0) {
+		VOP_UNLOCK(sdvp, 0);
+		if (error != EBUSY)
+			goto out;
+		error = vn_lock(tdvp, LK_EXCLUSIVE);
+		if (error)
+			goto out;
+		VOP_UNLOCK(tdvp, 0);
+		goto relock;
 	}
-}
+	tdzp = VTOZ(tdvp);
 
-/*
- * Search back through the directory tree, using the ".." entries.
- * Lock each directory in the chain to prevent concurrent renames.
- * Fail any attempt to move a directory into one of its own descendants.
- * XXX - z_parent_lock can overlap with map or grow locks
- */
-static int
-zfs_rename_lock(znode_t *szp, znode_t *tdzp, znode_t *sdzp, zfs_zlock_t **zlpp)
-{
-	zfs_zlock_t	*zl;
-	znode_t		*zp = tdzp;
-	uint64_t	rootid = zp->z_zfsvfs->z_root;
-	uint64_t	oidp = zp->z_id;
-	krwlock_t	*rwlp = &szp->z_parent_lock;
-	krw_t		rw = RW_WRITER;
+	/*
+	 * Before using sdzp and tdzp we must ensure that they are live.
+	 * As a porting legacy from illumos we have two things to worry
+	 * about.  One, typical for FreeBSD, is that the vnode must not be
+	 * reclaimed (doomed).  The other is that the znode must be live.
+	 * The current code can invalidate a znode without acquiring the
+	 * corresponding vnode lock if the object represented by the znode
+	 * and vnode is no longer valid after a rollback or receive operation.
+	 * z_teardown_lock, hidden behind ZFS_ENTER and ZFS_EXIT, is the lock
+	 * that protects znodes from such invalidation.
+	 */
+	zfsvfs = sdzp->z_zfsvfs;
+	ASSERT3P(zfsvfs, ==, tdzp->z_zfsvfs);
+	ZFS_ENTER(zfsvfs);
 
 	/*
-	 * First pass write-locks szp and compares to zp->z_id.
-	 * Later passes read-lock zp and compare to zp->z_parent.
+	 * We cannot use ZFS_VERIFY_ZP() here because it could return directly,
+	 * bypassing the cleanup code in the case of an error.
 	 */
-	do {
-		if (!rw_tryenter(rwlp, rw)) {
-			/*
-			 * Another thread is renaming in this path.
-			 * Note that if we are a WRITER, we don't have any
-			 * parent_locks held yet.
-			 */
-			if (rw == RW_READER && zp->z_id > szp->z_id) {
-				/*
-				 * Drop our locks and restart
-				 */
-				zfs_rename_unlock(&zl);
-				*zlpp = NULL;
-				zp = tdzp;
-				oidp = zp->z_id;
-				rwlp = &szp->z_parent_lock;
-				rw = RW_WRITER;
-				continue;
-			} else {
-				/*
-				 * Wait for other thread to drop its locks
-				 */
-				rw_enter(rwlp, rw);
+	if (tdzp->z_sa_hdl == NULL || sdzp->z_sa_hdl == NULL) {
+		ZFS_EXIT(zfsvfs);
+		VOP_UNLOCK(sdvp, 0);
+		VOP_UNLOCK(tdvp, 0);
+		error = SET_ERROR(EIO);
+		goto out;
+	}
+
+	/*
+	 * Re-resolve svp to be certain it still exists and fetch the
+	 * correct vnode.
+	 */
+	error = zfs_dirent_lookup(sdzp, snm, &szp, ZEXISTS);
+	if (error != 0) {
+		/* Source entry invalid or not there. */
+		ZFS_EXIT(zfsvfs);
+		VOP_UNLOCK(sdvp, 0);
+		VOP_UNLOCK(tdvp, 0);
+		if ((scnp->cn_flags & ISDOTDOT) != 0 ||
+		    (scnp->cn_namelen == 1 && scnp->cn_nameptr[0] == '.'))
+			error = SET_ERROR(EINVAL);
+		goto out;
+	}
+	svp = ZTOV(szp);
+
+	/*
+	 * Re-resolve tvp; if it has disappeared, we just carry on.
+	 */
+	error = zfs_dirent_lookup(tdzp, tnm, &tzp, 0);
+	if (error != 0) {
+		ZFS_EXIT(zfsvfs);
+		VOP_UNLOCK(sdvp, 0);
+		VOP_UNLOCK(tdvp, 0);
+		vrele(svp);
+		if ((tcnp->cn_flags & ISDOTDOT) != 0)
+			error = SET_ERROR(EINVAL);
+		goto out;
+	}
+	if (tzp != NULL)
+		tvp = ZTOV(tzp);
+	else
+		tvp = NULL;
+
+	/*
+	 * At present the vnode locks must be acquired before z_teardown_lock,
+	 * although it would be more logical to use the opposite order.
+	 */
+	ZFS_EXIT(zfsvfs);
+
+	/*
+	 * Now try to acquire locks on svp and tvp.
+	 */
+	nvp = svp;
+	error = vn_lock(nvp, LK_EXCLUSIVE | LK_NOWAIT);
+	if (error != 0) {
+		VOP_UNLOCK(sdvp, 0);
+		VOP_UNLOCK(tdvp, 0);
+		if (tvp != NULL)
+			vrele(tvp);
+		if (error != EBUSY) {
+			vrele(nvp);
+			goto out;
+		}
+		error = vn_lock(nvp, LK_EXCLUSIVE);
+		if (error != 0) {
+			vrele(nvp);
+			goto out;
+		}
+		VOP_UNLOCK(nvp, 0);
+		/*
+		 * Concurrent rename race.
+		 * XXX ?
+		 */
+		if (nvp == tdvp) {
+			vrele(nvp);
+			error = SET_ERROR(EINVAL);
+			goto out;
+		}
+		vrele(*svpp);
+		*svpp = nvp;
+		goto relock;
+	}
+	vrele(*svpp);
+	*svpp = nvp;
+
+	if (*tvpp != NULL)
+		vrele(*tvpp);
+	*tvpp = NULL;
+	if (tvp != NULL) {
+		nvp = tvp;
+		error = vn_lock(nvp, LK_EXCLUSIVE | LK_NOWAIT);
+		if (error != 0) {
+			VOP_UNLOCK(sdvp, 0);
+			VOP_UNLOCK(tdvp, 0);
+			VOP_UNLOCK(*svpp, 0);
+			if (error != EBUSY) {
+				vrele(nvp);
+				goto out;
 			}
+			error = vn_lock(nvp, LK_EXCLUSIVE);
+			if (error != 0) {
+				vrele(nvp);
+				goto out;
+			}
+			vput(nvp);
+			goto relock;
 		}
+		*tvpp = nvp;
+	}
 
-		zl = kmem_alloc(sizeof (*zl), KM_SLEEP);
-		zl->zl_rwlock = rwlp;
-		zl->zl_znode = NULL;
-		zl->zl_next = *zlpp;
-		*zlpp = zl;
+	return (0);
 
-		if (oidp == szp->z_id)		/* We're a descendant of szp */
-			return (SET_ERROR(EINVAL));
+out:
+	return (error);
+}
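The relock strategy in zfs_rename_relock() blocks on the first lock and try-locks the rest. On contention it drops everything, waits for the contended lock, releases it, and restarts. The pattern can be illustrated in userland. The sketch below is a hypothetical analogue, not the kernel code. It assumes POSIX mutexes in place of vnode locks and uses an invented helper name, `lock_pair_backoff`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/*
 * Hypothetical userland analogue of zfs_rename_relock(): block on the
 * first mutex, try-lock the second, and on EBUSY drop everything, wait
 * for the contended mutex, release it, and start over.  Waiting on the
 * contended lock (rather than retrying trylock in a loop) avoids spinning.
 * Returns the number of restarts taken, or a negated errno on failure.
 */
static int
lock_pair_backoff(pthread_mutex_t *first, pthread_mutex_t *second)
{
	int restarts = 0;

	for (;;) {
		pthread_mutex_lock(first);		/* blocking, like sdvp */
		int err = pthread_mutex_trylock(second);
		if (err == 0)
			return (restarts);		/* both locks held */
		pthread_mutex_unlock(first);		/* drop all held locks */
		if (err != EBUSY)
			return (-err);			/* unexpected failure */
		/* Sleep until the holder releases it, then release and retry. */
		pthread_mutex_lock(second);
		pthread_mutex_unlock(second);
		restarts++;
	}
}
```

The blocking acquire/release of the contended lock only serves as a wait. After it, the thread restarts from the first lock, so it never holds one lock while blocking on another in the "wrong" order.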
 
-		if (oidp == rootid)		/* We've hit the top */
-			return (0);
+/*
+ * Note that we must use VN_RELE_ASYNC in this function as it walks
+ * up the directory tree, and vrele may need to acquire an exclusive
+ * lock if the last reference to a vnode is dropped.
+ */
+static int
+zfs_rename_check(znode_t *szp, znode_t *sdzp, znode_t *tdzp)
+{
+	zfsvfs_t	*zfsvfs;
+	znode_t		*zp, *zp1;
+	uint64_t	parent;
+	int		error;
 
-		if (rw == RW_READER) {		/* i.e. not the first pass */
-			int error = zfs_zget(zp->z_zfsvfs, oidp, &zp);
-			if (error)
-				return (error);
-			zl->zl_znode = zp;
+	zfsvfs = tdzp->z_zfsvfs;
+	if (tdzp == szp)
+		return (SET_ERROR(EINVAL));
+	if (tdzp == sdzp)
+		return (0);
+	if (tdzp->z_id == zfsvfs->z_root)
+		return (0);
+	zp = tdzp;
+	for (;;) {
+		ASSERT(!zp->z_unlinked);
+		if ((error = sa_lookup(zp->z_sa_hdl,
+		    SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0)
+			break;
+
+		if (parent == szp->z_id) {
+			error = SET_ERROR(EINVAL);
+			break;
 		}
-		(void) sa_lookup(zp->z_sa_hdl, SA_ZPL_PARENT(zp->z_zfsvfs),
-		    &oidp, sizeof (oidp));
-		rwlp = &zp->z_parent_lock;
-		rw = RW_READER;
+		if (parent == zfsvfs->z_root)
+			break;
+		if (parent == sdzp->z_id)
+			break;
 
-	} while (zp->z_id != sdzp->z_id);
+		error = zfs_zget(zfsvfs, parent, &zp1);
+		if (error != 0)
+			break;
 
-	return (0);
+		if (zp != tdzp)
+			VN_RELE_ASYNC(ZTOV(zp),
+			    dsl_pool_vnrele_taskq(dmu_objset_pool(zfsvfs->z_os)));
+		zp = zp1;
+	}
+
+	if (error == ENOTDIR)
+		panic("zfs_rename_check: .. not a directory\n");
+	if (zp != tdzp)
+		VN_RELE_ASYNC(ZTOV(zp),
+		    dsl_pool_vnrele_taskq(dmu_objset_pool(zfsvfs->z_os)));
+	return (error);
 }
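zfs_rename_check() refuses to move a directory beneath itself by walking ".." links upward from the target directory. The toy model below is a hypothetical userland sketch of that walk, not the ZFS code. An array of parent ids stands in for the SA_ZPL_PARENT lookups and zfs_zget() calls. Because there is no reference counting, the VN_RELE_ASYNC concerns do not apply.

```c
#include <errno.h>

/*
 * Hypothetical in-memory directory tree: parent[i] is the object id of
 * directory i's "..", and the root is its own parent.
 */
struct toy_tree {
	const int *parent;
	int root;
};

/*
 * Return EINVAL if moving directory src (currently in src_dir) into
 * tgt_dir would make src its own ancestor, else 0.  The walk stops early
 * at the root or at src_dir: src_dir is src's parent, so reaching it
 * before src proves src is not on tgt_dir's ancestor chain.
 */
static int
toy_rename_check(const struct toy_tree *t, int src, int src_dir, int tgt_dir)
{
	if (tgt_dir == src)
		return (EINVAL);
	if (tgt_dir == src_dir || tgt_dir == t->root)
		return (0);
	for (int zp = tgt_dir;;) {
		int parent = t->parent[zp];
		if (parent == src)
			return (EINVAL);
		if (parent == t->root || parent == src_dir)
			return (0);
		zp = parent;
	}
}
```

Consider the case named in the old comment, moving /usr/a/b to /usr/a/b/c/d. The walk starts at d, steps to c, and finds that c's parent is b, the source, so it returns EINVAL.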
 
 /*
  * Move an entry from the provided source directory to the target
  * directory.  Change the entry name as indicated.
  *
  *	IN:	sdvp	- Source directory containing the "old entry".
  *		snm	- Old entry name.
  *		tdvp	- Target directory to contain the "new entry".
  *		tnm	- New entry name.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	sdvp,tdvp - ctime|mtime updated
  */
 /*ARGSUSED*/
 static int
-zfs_rename(vnode_t *sdvp, char *snm, vnode_t *tdvp, char *tnm, cred_t *cr,
-    caller_context_t *ct, int flags)
+zfs_rename(vnode_t *sdvp, vnode_t **svpp, struct componentname *scnp,
+    vnode_t *tdvp, vnode_t **tvpp, struct componentname *tcnp,
+    cred_t *cr)
 {
-	znode_t		*tdzp, *sdzp, *szp, *tzp;
-	zfsvfs_t 	*zfsvfs;
-	zilog_t		*zilog;
-	vnode_t		*realvp;
-	zfs_dirlock_t	*sdl, *tdl;
+	zfsvfs_t	*zfsvfs;
+	znode_t		*sdzp, *tdzp, *szp, *tzp;
+	zilog_t		*zilog = NULL;
 	dmu_tx_t	*tx;
-	zfs_zlock_t	*zl;
-	int		cmp, serr, terr;
+	char		*snm = scnp->cn_nameptr;
+	char		*tnm = tcnp->cn_nameptr;
 	int		error = 0;
-	int		zflg = 0;
-	boolean_t	waited = B_FALSE;
 
-	tdzp = VTOZ(tdvp);
-	ZFS_VERIFY_ZP(tdzp);
-	zfsvfs = tdzp->z_zfsvfs;
-	ZFS_ENTER(zfsvfs);
-	zilog = zfsvfs->z_log;
-	sdzp = VTOZ(sdvp);
+	/* Reject renames across filesystems. */
+	if ((*svpp)->v_mount != tdvp->v_mount ||
+	    ((*tvpp) != NULL && (*svpp)->v_mount != (*tvpp)->v_mount)) {
+		error = SET_ERROR(EXDEV);
+		goto out;
+	}
 
+	if (zfsctl_is_node(tdvp)) {
+		error = SET_ERROR(EXDEV);
+		goto out;
+	}
+
 	/*
-	 * In case sdzp is not valid, let's be sure to exit from the right
-	 * zfsvfs_t.
+	 * Lock all four vnodes to ensure safety and semantics of renaming.
 	 */
-	if (sdzp->z_sa_hdl == NULL) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EIO));
+	error = zfs_rename_relock(sdvp, svpp, tdvp, tvpp, scnp, tcnp);
+	if (error != 0) {
+		/* no vnodes are locked in the case of error here */
+		return (error);
 	}
 
+	tdzp = VTOZ(tdvp);
+	sdzp = VTOZ(sdvp);
+	zfsvfs = tdzp->z_zfsvfs;
+	zilog = zfsvfs->z_log;
+
 	/*
-	 * We check z_zfsvfs rather than v_vfsp here, because snapshots and the
-	 * ctldir appear to have the same v_vfsp.
+	 * After we re-enter ZFS_ENTER() we will have to revalidate all
+	 * znodes involved.
 	 */
-	if (sdzp->z_zfsvfs != zfsvfs || zfsctl_is_node(tdvp)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EXDEV));
-	}
+	ZFS_ENTER(zfsvfs);
 
 	if (zfsvfs->z_utf8 && u8_validate(tnm,
 	    strlen(tnm), NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EILSEQ));
+		error = SET_ERROR(EILSEQ);
+		goto unlockout;
 	}
 
-	if (flags & FIGNORECASE)
-		zflg |= ZCILOOK;
+	/* If source and target are the same file, there is nothing to do. */
+	if ((*svpp) == (*tvpp)) {
+		error = 0;
+		goto unlockout;
+	}
 
-top:
-	szp = NULL;
-	tzp = NULL;
-	zl = NULL;
+	if (((*svpp)->v_type == VDIR && (*svpp)->v_mountedhere != NULL) ||
+	    ((*tvpp) != NULL && (*tvpp)->v_type == VDIR &&
+	    (*tvpp)->v_mountedhere != NULL)) {
+		error = SET_ERROR(EXDEV);
+		goto unlockout;
+	}
 
 	/*
-	 * This is to prevent the creation of links into attribute space
-	 * by renaming a linked file into/outof an attribute directory.
-	 * See the comment in zfs_link() for why this is considered bad.
+	 * We cannot use ZFS_VERIFY_ZP() here because it could return directly,
+	 * bypassing the cleanup code in the case of an error.
 	 */
-	if ((tdzp->z_pflags & ZFS_XATTR) != (sdzp->z_pflags & ZFS_XATTR)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EINVAL));
+	if (tdzp->z_sa_hdl == NULL || sdzp->z_sa_hdl == NULL) {
+		error = SET_ERROR(EIO);
+		goto unlockout;
 	}
 
-	/*
-	 * Lock source and target directory entries.  To prevent deadlock,
-	 * a lock ordering must be defined.  We lock the directory with
-	 * the smallest object id first, or if it's a tie, the one with
-	 * the lexically first name.
-	 */
-	if (sdzp->z_id < tdzp->z_id) {
-		cmp = -1;
-	} else if (sdzp->z_id > tdzp->z_id) {
-		cmp = 1;
-	} else {
-		/*
-		 * First compare the two name arguments without
-		 * considering any case folding.
-		 */
-		int nofold = (zfsvfs->z_norm & ~U8_TEXTPREP_TOUPPER);
-
-		cmp = u8_strcmp(snm, tnm, 0, nofold, U8_UNICODE_LATEST, &error);
-		ASSERT(error == 0 || !zfsvfs->z_utf8);
-		if (cmp == 0) {
-			/*
-			 * POSIX: "If the old argument and the new argument
-			 * both refer to links to the same existing file,
-			 * the rename() function shall return successfully
-			 * and perform no other action."
-			 */
-			ZFS_EXIT(zfsvfs);
-			return (0);
-		}
-		/*
-		 * If the file system is case-folding, then we may
-		 * have some more checking to do.  A case-folding file
-		 * system is either supporting mixed case sensitivity
-		 * access or is completely case-insensitive.  Note
-		 * that the file system is always case preserving.
-		 *
-		 * In mixed sensitivity mode case sensitive behavior
-		 * is the default.  FIGNORECASE must be used to
-		 * explicitly request case insensitive behavior.
-		 *
-		 * If the source and target names provided differ only
-		 * by case (e.g., a request to rename 'tim' to 'Tim'),
-		 * we will treat this as a special case in the
-		 * case-insensitive mode: as long as the source name
-		 * is an exact match, we will allow this to proceed as
-		 * a name-change request.
-		 */
-		if ((zfsvfs->z_case == ZFS_CASE_INSENSITIVE ||
-		    (zfsvfs->z_case == ZFS_CASE_MIXED &&
-		    flags & FIGNORECASE)) &&
-		    u8_strcmp(snm, tnm, 0, zfsvfs->z_norm, U8_UNICODE_LATEST,
-		    &error) == 0) {
-			/*
-			 * case preserving rename request, require exact
-			 * name matches
-			 */
-			zflg |= ZCIEXACT;
-			zflg &= ~ZCILOOK;
-		}
+	szp = VTOZ(*svpp);
+	tzp = *tvpp == NULL ? NULL : VTOZ(*tvpp);
+	if (szp->z_sa_hdl == NULL || (tzp != NULL && tzp->z_sa_hdl == NULL)) {
+		error = SET_ERROR(EIO);
+		goto unlockout;
 	}
 
 	/*
-	 * If the source and destination directories are the same, we should
-	 * grab the z_name_lock of that directory only once.
+	 * This is to prevent the creation of links into attribute space
+	 * by renaming a linked file into or out of an attribute directory.
+	 * See the comment in zfs_link() for why this is considered bad.
 	 */
-	if (sdzp == tdzp) {
-		zflg |= ZHAVELOCK;
-		rw_enter(&sdzp->z_name_lock, RW_READER);
+	if ((tdzp->z_pflags & ZFS_XATTR) != (sdzp->z_pflags & ZFS_XATTR)) {
+		error = SET_ERROR(EINVAL);
+		goto unlockout;
 	}
 
-	if (cmp < 0) {
-		serr = zfs_dirent_lock(&sdl, sdzp, snm, &szp,
-		    ZEXISTS | zflg, NULL, NULL);
-		terr = zfs_dirent_lock(&tdl,
-		    tdzp, tnm, &tzp, ZRENAMING | zflg, NULL, NULL);
-	} else {
-		terr = zfs_dirent_lock(&tdl,
-		    tdzp, tnm, &tzp, zflg, NULL, NULL);
-		serr = zfs_dirent_lock(&sdl,
-		    sdzp, snm, &szp, ZEXISTS | ZRENAMING | zflg,
-		    NULL, NULL);
-	}
-
-	if (serr) {
-		/*
-		 * Source entry invalid or not there.
-		 */
-		if (!terr) {
-			zfs_dirent_unlock(tdl);
-			if (tzp)
-				VN_RELE(ZTOV(tzp));
-		}
-
-		if (sdzp == tdzp)
-			rw_exit(&sdzp->z_name_lock);
-
-		/*
-		 * FreeBSD: In OpenSolaris they only check if rename source is
-		 * ".." here, because "." is handled in their lookup. This is
-		 * not the case for FreeBSD, so we check for "." explicitly.
-		 */
-		if (strcmp(snm, ".") == 0 || strcmp(snm, "..") == 0)
-			serr = SET_ERROR(EINVAL);
-		ZFS_EXIT(zfsvfs);
-		return (serr);
-	}
-	if (terr) {
-		zfs_dirent_unlock(sdl);
-		VN_RELE(ZTOV(szp));
-
-		if (sdzp == tdzp)
-			rw_exit(&sdzp->z_name_lock);
-
-		if (strcmp(tnm, "..") == 0)
-			terr = SET_ERROR(EINVAL);
-		ZFS_EXIT(zfsvfs);
-		return (terr);
-	}
-
 	/*
 	 * Must have write access at the source to remove the old entry
 	 * and write access at the target to create the new entry.
 	 * Note that if target and source are the same, this can be
 	 * done in a single check.
 	 */
-
 	if (error = zfs_zaccess_rename(sdzp, szp, tdzp, tzp, cr))
-		goto out;
+		goto unlockout;
 
-	if (ZTOV(szp)->v_type == VDIR) {
+	if ((*svpp)->v_type == VDIR) {
 		/*
+		 * Avoid ".", "..", and aliases of "." for obvious reasons.
+		 */
+		if ((scnp->cn_namelen == 1 && scnp->cn_nameptr[0] == '.') ||
+		    sdzp == szp ||
+		    (scnp->cn_flags | tcnp->cn_flags) & ISDOTDOT) {
+			error = SET_ERROR(EINVAL);
+			goto unlockout;
+		}
+
+		/*
 		 * Check to make sure rename is valid.
 		 * Can't do a move like this: /usr/a/b to /usr/a/b/c/d
 		 */
-		if (error = zfs_rename_lock(szp, tdzp, sdzp, &zl))
-			goto out;
+		if (error = zfs_rename_check(szp, sdzp, tdzp))
+			goto unlockout;
 	}
 
 	/*
 	 * Does target exist?
 	 */
 	if (tzp) {
 		/*
 		 * Source and target must be the same type.
 		 */
-		if (ZTOV(szp)->v_type == VDIR) {
-			if (ZTOV(tzp)->v_type != VDIR) {
+		if ((*svpp)->v_type == VDIR) {
+			if ((*tvpp)->v_type != VDIR) {
 				error = SET_ERROR(ENOTDIR);
-				goto out;
+				goto unlockout;
+			} else {
+				cache_purge(tdvp);
+				if (sdvp != tdvp)
+					cache_purge(sdvp);
 			}
 		} else {
-			if (ZTOV(tzp)->v_type == VDIR) {
+			if ((*tvpp)->v_type == VDIR) {
 				error = SET_ERROR(EISDIR);
-				goto out;
+				goto unlockout;
 			}
 		}
-		/*
-		 * POSIX dictates that when the source and target
-		 * entries refer to the same file object, rename
-		 * must do nothing and exit without error.
-		 */
-		if (szp->z_id == tzp->z_id) {
-			error = 0;
-			goto out;
-		}
 	}
 
-	vnevent_rename_src(ZTOV(szp), sdvp, snm, ct);
+	vnevent_rename_src(*svpp, sdvp, scnp->cn_nameptr, ct);
 	if (tzp)
-		vnevent_rename_dest(ZTOV(tzp), tdvp, tnm, ct);
+		vnevent_rename_dest(*tvpp, tdvp, tnm, ct);
 
 	/*
 	 * notify the target directory if it is not the same
 	 * as source directory.
 	 */
 	if (tdvp != sdvp) {
 		vnevent_rename_dest_dir(tdvp, ct);
 	}
 
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa(tx, szp->z_sa_hdl, B_FALSE);
 	dmu_tx_hold_sa(tx, sdzp->z_sa_hdl, B_FALSE);
 	dmu_tx_hold_zap(tx, sdzp->z_id, FALSE, snm);
 	dmu_tx_hold_zap(tx, tdzp->z_id, TRUE, tnm);
 	if (sdzp != tdzp) {
 		dmu_tx_hold_sa(tx, tdzp->z_sa_hdl, B_FALSE);
 		zfs_sa_upgrade_txholds(tx, tdzp);
 	}
 	if (tzp) {
 		dmu_tx_hold_sa(tx, tzp->z_sa_hdl, B_FALSE);
 		zfs_sa_upgrade_txholds(tx, tzp);
 	}
 
 	zfs_sa_upgrade_txholds(tx, szp);
 	dmu_tx_hold_zap(tx, zfsvfs->z_unlinkedobj, FALSE, NULL);
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		if (zl != NULL)
-			zfs_rename_unlock(&zl);
-		zfs_dirent_unlock(sdl);
-		zfs_dirent_unlock(tdl);
-
-		if (sdzp == tdzp)
-			rw_exit(&sdzp->z_name_lock);
-
-		VN_RELE(ZTOV(szp));
-		if (tzp)
-			VN_RELE(ZTOV(tzp));
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		dmu_tx_abort(tx);
-		ZFS_EXIT(zfsvfs);
-		return (error);
+		goto unlockout;
 	}
 
+
 	if (tzp)	/* Attempt to remove the existing target */
-		error = zfs_link_destroy(tdl, tzp, tx, zflg, NULL);
+		error = zfs_link_destroy(tdzp, tnm, tzp, tx, 0, NULL);
 
 	if (error == 0) {
-		error = zfs_link_create(tdl, szp, tx, ZRENAMING);
+		error = zfs_link_create(tdzp, tnm, szp, tx, ZRENAMING);
 		if (error == 0) {
 			szp->z_pflags |= ZFS_AV_MODIFIED;
 
 			error = sa_update(szp->z_sa_hdl, SA_ZPL_FLAGS(zfsvfs),
 			    (void *)&szp->z_pflags, sizeof (uint64_t), tx);
 			ASSERT0(error);
 
-			error = zfs_link_destroy(sdl, szp, tx, ZRENAMING, NULL);
+			error = zfs_link_destroy(sdzp, snm, szp, tx, ZRENAMING,
+			    NULL);
 			if (error == 0) {
-				zfs_log_rename(zilog, tx, TX_RENAME |
-				    (flags & FIGNORECASE ? TX_CI : 0), sdzp,
-				    sdl->dl_name, tdzp, tdl->dl_name, szp);
+				zfs_log_rename(zilog, tx, TX_RENAME, sdzp,
+				    snm, tdzp, tnm, szp);
 
 				/*
 				 * Update path information for the target vnode
 				 */
-				vn_renamepath(tdvp, ZTOV(szp), tnm,
-				    strlen(tnm));
+				vn_renamepath(tdvp, *svpp, tnm, strlen(tnm));
 			} else {
 				/*
 				 * At this point, we have successfully created
 				 * the target name, but have failed to remove
 				 * the source name.  Since the create was done
 				 * with the ZRENAMING flag, there are
 				 * complications; for one, the link count is
 				 * wrong.  The easiest way to deal with this
 				 * is to remove the newly created target, and
 				 * return the original error.  This must
 				 * succeed; fortunately, it is very unlikely to
 				 * fail, since we just created it.
 				 */
-				VERIFY3U(zfs_link_destroy(tdl, szp, tx,
+				VERIFY3U(zfs_link_destroy(tdzp, tnm, szp, tx,
 				    ZRENAMING, NULL), ==, 0);
 			}
 		}
-#ifdef FREEBSD_NAMECACHE
 		if (error == 0) {
-			cache_purge(sdvp);
-			cache_purge(tdvp);
-			cache_purge(ZTOV(szp));
-			if (tzp)
-				cache_purge(ZTOV(tzp));
+			cache_purge(*svpp);
+			if (*tvpp != NULL)
+				cache_purge(*tvpp);
+			cache_purge_negative(tdvp);
 		}
-#endif
 	}
 
 	dmu_tx_commit(tx);
-out:
-	if (zl != NULL)
-		zfs_rename_unlock(&zl);
 
-	zfs_dirent_unlock(sdl);
-	zfs_dirent_unlock(tdl);
+unlockout:			/* all 4 vnodes are locked, ZFS_ENTER called */
+	ZFS_EXIT(zfsvfs);
+	VOP_UNLOCK(*svpp, 0);
+	VOP_UNLOCK(sdvp, 0);
 
-	if (sdzp == tdzp)
-		rw_exit(&sdzp->z_name_lock);
-
-
-	VN_RELE(ZTOV(szp));
-	if (tzp)
-		VN_RELE(ZTOV(tzp));
-
-	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
+out:				/* original two vnodes are locked */
+	if (error == 0 && zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
-	ZFS_EXIT(zfsvfs);
-
+	if (*tvpp != NULL)
+		VOP_UNLOCK(*tvpp, 0);
+	if (tdvp != *tvpp)
+		VOP_UNLOCK(tdvp, 0);
 	return (error);
 }
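zfs_rename() unwinds through two fall-through labels. `unlockout` releases what was acquired after relocking (ZFS_ENTER and the source-side vnode locks). `out` releases only the locks the caller handed in. Code placed after `out:` also runs on the early EXDEV exits, where zfsvfs has not yet been assigned, so it should test `error` before dereferencing anything assigned later. The sketch below is a hypothetical model of that discipline, with invented struct and field names standing in for the real locks and zil_commit().

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for zfsvfs, assigned only after relocking. */
struct toy_ctx {
	int sync_always;	/* like os_sync == ZFS_SYNC_ALWAYS */
};

/* Hypothetical flags recording which resources are currently held. */
struct toy_state {
	int outer_locked;	/* like tdvp and *tvpp, locked by the caller */
	int inner_locked;	/* like sdvp and *svpp, locked by relock */
	int entered;		/* like ZFS_ENTER() */
	int committed;		/* like zil_commit() having run */
};

static int
toy_rename(struct toy_state *s, struct toy_ctx *fs, int fail_early,
    int fail_late)
{
	struct toy_ctx *ctx = NULL;	/* assigned only after relocking */
	int error = 0;

	if (fail_early) {
		error = EXDEV;
		goto out;		/* ctx is still NULL here */
	}
	s->inner_locked = 1;
	ctx = fs;
	s->entered = 1;
	if (fail_late) {
		error = EINVAL;
		goto unlockout;
	}
	/* ... the rename itself would happen here ... */

unlockout:			/* everything is held */
	s->entered = 0;
	s->inner_locked = 0;
out:				/* only the caller's locks are held */
	/* Test error first: on the early path ctx is NULL. */
	if (error == 0 && ctx->sync_always)
		s->committed = 1;
	s->outer_locked = 0;
	return (error);
}
```

Each label undoes only what was acquired after the previous label's precondition. Falling through from `unlockout` into `out` therefore releases everything exactly once on every path.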
 
 /*
  * Insert the indicated symbolic reference entry into the directory.
  *
  *	IN:	dvp	- Directory to contain new symbolic link.
  *		link	- Name for new symlink entry.
  *		vap	- Attributes of new entry.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *		flags	- case flags
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	dvp - ctime|mtime updated
  */
 /*ARGSUSED*/
 static int
 zfs_symlink(vnode_t *dvp, vnode_t **vpp, char *name, vattr_t *vap, char *link,
     cred_t *cr, kthread_t *td)
 {
 	znode_t		*zp, *dzp = VTOZ(dvp);
-	zfs_dirlock_t	*dl;
 	dmu_tx_t	*tx;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
 	uint64_t	len = strlen(link);
 	int		error;
-	int		zflg = ZNEW;
 	zfs_acl_ids_t	acl_ids;
 	boolean_t	fuid_dirtied;
 	uint64_t	txtype = TX_SYMLINK;
-	boolean_t	waited = B_FALSE;
 	int		flags = 0;
 
 	ASSERT(vap->va_type == VLNK);
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
 	zilog = zfsvfs->z_log;
 
 	if (zfsvfs->z_utf8 && u8_validate(name, strlen(name),
 	    NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EILSEQ));
 	}
-	if (flags & FIGNORECASE)
-		zflg |= ZCILOOK;
 
 	if (len > MAXPATHLEN) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(ENAMETOOLONG));
 	}
 
 	if ((error = zfs_acl_ids_create(dzp, 0,
 	    vap, cr, NULL, &acl_ids)) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-	getnewvnode_reserve(1);
-
-top:
 	/*
 	 * Attempt to lock directory; fail if entry already exists.
 	 */
-	error = zfs_dirent_lock(&dl, dzp, name, &zp, zflg, NULL, NULL);
+	error = zfs_dirent_lookup(dzp, name, &zp, ZNEW);
 	if (error) {
 		zfs_acl_ids_free(&acl_ids);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) {
 		zfs_acl_ids_free(&acl_ids);
-		zfs_dirent_unlock(dl);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	if (zfs_acl_ids_overquota(zfsvfs, &acl_ids)) {
 		zfs_acl_ids_free(&acl_ids);
-		zfs_dirent_unlock(dl);
-		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EDQUOT));
 	}
+
+	getnewvnode_reserve(1);
 	tx = dmu_tx_create(zfsvfs->z_os);
 	fuid_dirtied = zfsvfs->z_fuid_dirty;
 	dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0, MAX(1, len));
 	dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);
 	dmu_tx_hold_sa_create(tx, acl_ids.z_aclp->z_acl_bytes +
 	    ZFS_SA_BASE_ATTR_SIZE + len);
 	dmu_tx_hold_sa(tx, dzp->z_sa_hdl, B_FALSE);
 	if (!zfsvfs->z_use_sa && acl_ids.z_aclp->z_acl_bytes > ZFS_ACE_SPACE) {
 		dmu_tx_hold_write(tx, DMU_NEW_OBJECT, 0,
 		    acl_ids.z_aclp->z_acl_bytes);
 	}
 	if (fuid_dirtied)
 		zfs_fuid_txhold(zfsvfs, tx);
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		zfs_dirent_unlock(dl);
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		zfs_acl_ids_free(&acl_ids);
 		dmu_tx_abort(tx);
 		getnewvnode_drop_reserve();
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	/*
 	 * Create a new object for the symlink.
 	 * for version 4 ZPL datsets the symlink will be an SA attribute
 	 */
 	zfs_mknode(dzp, vap, tx, cr, 0, &zp, &acl_ids);
 
 	if (fuid_dirtied)
 		zfs_fuid_sync(zfsvfs, tx);
 
-	mutex_enter(&zp->z_lock);
 	if (zp->z_is_sa)
 		error = sa_update(zp->z_sa_hdl, SA_ZPL_SYMLINK(zfsvfs),
 		    link, len, tx);
 	else
 		zfs_sa_symlink(zp, link, len, tx);
-	mutex_exit(&zp->z_lock);
 
 	zp->z_size = len;
 	(void) sa_update(zp->z_sa_hdl, SA_ZPL_SIZE(zfsvfs),
 	    &zp->z_size, sizeof (zp->z_size), tx);
 	/*
 	 * Insert the new object into the directory.
 	 */
-	(void) zfs_link_create(dl, zp, tx, ZNEW);
+	(void) zfs_link_create(dzp, name, zp, tx, ZNEW);
 
-	if (flags & FIGNORECASE)
-		txtype |= TX_CI;
 	zfs_log_symlink(zilog, tx, txtype, dzp, zp, name, link);
 	*vpp = ZTOV(zp);
 
 	zfs_acl_ids_free(&acl_ids);
 
 	dmu_tx_commit(tx);
 
 	getnewvnode_drop_reserve();
 
-	zfs_dirent_unlock(dl);
-
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
 /*
  * Return, in the buffer contained in the provided uio structure,
  * the symbolic path referred to by vp.
  *
  *	IN:	vp	- vnode of symbolic link.
  *		uio	- structure to contain the link path.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *
  *	OUT:	uio	- structure containing the link path.
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	vp - atime updated
  */
 /* ARGSUSED */
 static int
 zfs_readlink(vnode_t *vp, uio_t *uio, cred_t *cr, caller_context_t *ct)
 {
 	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	int		error;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
-	mutex_enter(&zp->z_lock);
 	if (zp->z_is_sa)
 		error = sa_lookup_uio(zp->z_sa_hdl,
 		    SA_ZPL_SYMLINK(zfsvfs), uio);
 	else
 		error = zfs_sa_readlink(zp, uio);
-	mutex_exit(&zp->z_lock);
 
 	ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
 /*
  * Insert a new entry into directory tdvp referencing svp.
  *
  *	IN:	tdvp	- Directory to contain new entry.
  *		svp	- vnode of new entry.
  *		name	- name of new entry.
  *		cr	- credentials of caller.
  *		ct	- caller context
  *
  *	RETURN:	0 on success, error code on failure.
  *
  * Timestamps:
  *	tdvp - ctime|mtime updated
  *	 svp - ctime updated
  */
 /* ARGSUSED */
 static int
 zfs_link(vnode_t *tdvp, vnode_t *svp, char *name, cred_t *cr,
     caller_context_t *ct, int flags)
 {
 	znode_t		*dzp = VTOZ(tdvp);
 	znode_t		*tzp, *szp;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	zilog_t		*zilog;
-	zfs_dirlock_t	*dl;
 	dmu_tx_t	*tx;
-	vnode_t		*realvp;
 	int		error;
-	int		zf = ZNEW;
 	uint64_t	parent;
 	uid_t		owner;
-	boolean_t	waited = B_FALSE;
 
 	ASSERT(tdvp->v_type == VDIR);
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(dzp);
 	zilog = zfsvfs->z_log;
 
-	if (VOP_REALVP(svp, &realvp, ct) == 0)
-		svp = realvp;
-
 	/*
 	 * POSIX dictates that we return EPERM here.
 	 * Better choices include ENOTSUP or EISDIR.
 	 */
 	if (svp->v_type == VDIR) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	szp = VTOZ(svp);
 	ZFS_VERIFY_ZP(szp);
 
 	if (szp->z_pflags & (ZFS_APPENDONLY | ZFS_IMMUTABLE | ZFS_READONLY)) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
-	/*
-	 * We check z_zfsvfs rather than v_vfsp here, because snapshots and the
-	 * ctldir appear to have the same v_vfsp.
-	 */
-	if (szp->z_zfsvfs != zfsvfs || zfsctl_is_node(svp)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EXDEV));
-	}
-
 	/* Prevent links to .zfs/shares files */
 
 	if ((error = sa_lookup(szp->z_sa_hdl, SA_ZPL_PARENT(zfsvfs),
 	    &parent, sizeof (uint64_t))) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 	if (parent == zfsvfs->z_shares_dir) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	if (zfsvfs->z_utf8 && u8_validate(name,
 	    strlen(name), NULL, U8_VALIDATE_ENTIRE, &error) < 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EILSEQ));
 	}
-	if (flags & FIGNORECASE)
-		zf |= ZCILOOK;
 
 	/*
 	 * We do not support links between attributes and non-attributes
 	 * because of the potential security risk of creating links
 	 * into "normal" file space in order to circumvent restrictions
 	 * imposed in attribute space.
 	 */
 	if ((szp->z_pflags & ZFS_XATTR) != (dzp->z_pflags & ZFS_XATTR)) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EINVAL));
 	}
 
 
 	owner = zfs_fuid_map_id(zfsvfs, szp->z_uid, cr, ZFS_OWNER);
 	if (owner != crgetuid(cr) && secpolicy_basic_link(svp, cr) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(EPERM));
 	}
 
 	if (error = zfs_zaccess(dzp, ACE_ADD_FILE, 0, B_FALSE, cr)) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-top:
 	/*
 	 * Attempt to lock directory; fail if entry already exists.
 	 */
-	error = zfs_dirent_lock(&dl, dzp, name, &tzp, zf, NULL, NULL);
+	error = zfs_dirent_lookup(dzp, name, &tzp, ZNEW);
 	if (error) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa(tx, szp->z_sa_hdl, B_FALSE);
 	dmu_tx_hold_zap(tx, dzp->z_id, TRUE, name);
 	zfs_sa_upgrade_txholds(tx, szp);
 	zfs_sa_upgrade_txholds(tx, dzp);
-	error = dmu_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
+	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
-		zfs_dirent_unlock(dl);
-		if (error == ERESTART) {
-			waited = B_TRUE;
-			dmu_tx_wait(tx);
-			dmu_tx_abort(tx);
-			goto top;
-		}
 		dmu_tx_abort(tx);
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
-	error = zfs_link_create(dl, szp, tx, 0);
+	error = zfs_link_create(dzp, name, szp, tx, 0);
 
 	if (error == 0) {
 		uint64_t txtype = TX_LINK;
-		if (flags & FIGNORECASE)
-			txtype |= TX_CI;
 		zfs_log_link(zilog, tx, txtype, dzp, szp, name);
 	}
 
 	dmu_tx_commit(tx);
 
-	zfs_dirent_unlock(dl);
-
 	if (error == 0) {
 		vnevent_link(svp, ct);
 	}
 
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
-#ifdef illumos
-/*
- * zfs_null_putapage() is used when the file system has been force
- * unmounted. It just drops the pages.
- */
-/* ARGSUSED */
-static int
-zfs_null_putapage(vnode_t *vp, page_t *pp, u_offset_t *offp,
-    size_t *lenp, int flags, cred_t *cr)
-{
-	pvn_write_done(pp, B_INVAL|B_FORCE|B_ERROR);
-	return (0);
-}
 
-/*
- * Push a page out to disk, klustering if possible.
- *
- *	IN:	vp	- file to push page to.
- *		pp	- page to push.
- *		flags	- additional flags.
- *		cr	- credentials of caller.
- *
- *	OUT:	offp	- start of range pushed.
- *		lenp	- len of range pushed.
- *
- *	RETURN:	0 on success, error code on failure.
- *
- * NOTE: callers must have locked the page to be pushed.  On
- * exit, the page (and all other pages in the kluster) must be
- * unlocked.
- */
-/* ARGSUSED */
-static int
-zfs_putapage(vnode_t *vp, page_t *pp, u_offset_t *offp,
-    size_t *lenp, int flags, cred_t *cr)
-{
-	znode_t		*zp = VTOZ(vp);
-	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
-	dmu_tx_t	*tx;
-	u_offset_t	off, koff;
-	size_t		len, klen;
-	int		err;
-
-	off = pp->p_offset;
-	len = PAGESIZE;
-	/*
-	 * If our blocksize is bigger than the page size, try to kluster
-	 * multiple pages so that we write a full block (thus avoiding
-	 * a read-modify-write).
-	 */
-	if (off < zp->z_size && zp->z_blksz > PAGESIZE) {
-		klen = P2ROUNDUP((ulong_t)zp->z_blksz, PAGESIZE);
-		koff = ISP2(klen) ? P2ALIGN(off, (u_offset_t)klen) : 0;
-		ASSERT(koff <= zp->z_size);
-		if (koff + klen > zp->z_size)
-			klen = P2ROUNDUP(zp->z_size - koff, (uint64_t)PAGESIZE);
-		pp = pvn_write_kluster(vp, pp, &off, &len, koff, klen, flags);
-	}
-	ASSERT3U(btop(len), ==, btopr(len));
-
-	/*
-	 * Can't push pages past end-of-file.
-	 */
-	if (off >= zp->z_size) {
-		/* ignore all pages */
-		err = 0;
-		goto out;
-	} else if (off + len > zp->z_size) {
-		int npages = btopr(zp->z_size - off);
-		page_t *trunc;
-
-		page_list_break(&pp, &trunc, npages);
-		/* ignore pages past end of file */
-		if (trunc)
-			pvn_write_done(trunc, flags);
-		len = zp->z_size - off;
-	}
-
-	if (zfs_owner_overquota(zfsvfs, zp, B_FALSE) ||
-	    zfs_owner_overquota(zfsvfs, zp, B_TRUE)) {
-		err = SET_ERROR(EDQUOT);
-		goto out;
-	}
-	tx = dmu_tx_create(zfsvfs->z_os);
-	dmu_tx_hold_write(tx, zp->z_id, off, len);
-
-	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
-	zfs_sa_upgrade_txholds(tx, zp);
-	err = dmu_tx_assign(tx, TXG_WAIT);
-	if (err != 0) {
-		dmu_tx_abort(tx);
-		goto out;
-	}
-
-	if (zp->z_blksz <= PAGESIZE) {
-		caddr_t va = zfs_map_page(pp, S_READ);
-		ASSERT3U(len, <=, PAGESIZE);
-		dmu_write(zfsvfs->z_os, zp->z_id, off, len, va, tx);
-		zfs_unmap_page(pp, va);
-	} else {
-		err = dmu_write_pages(zfsvfs->z_os, zp->z_id, off, len, pp, tx);
-	}
-
-	if (err == 0) {
-		uint64_t mtime[2], ctime[2];
-		sa_bulk_attr_t bulk[3];
-		int count = 0;
-
-		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL,
-		    &mtime, 16);
-		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
-		    &ctime, 16);
-		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
-		    &zp->z_pflags, 8);
-		zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
-		    B_TRUE);
-		zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off, len, 0);
-	}
-	dmu_tx_commit(tx);
-
-out:
-	pvn_write_done(pp, (err ? B_ERROR : 0) | flags);
-	if (offp)
-		*offp = off;
-	if (lenp)
-		*lenp = len;
-
-	return (err);
-}
-
-/*
- * Copy the portion of the file indicated from pages into the file.
- * The pages are stored in a page list attached to the files vnode.
- *
- *	IN:	vp	- vnode of file to push page data to.
- *		off	- position in file to put data.
- *		len	- amount of data to write.
- *		flags	- flags to control the operation.
- *		cr	- credentials of caller.
- *		ct	- caller context.
- *
- *	RETURN:	0 on success, error code on failure.
- *
- * Timestamps:
- *	vp - ctime|mtime updated
- */
 /*ARGSUSED*/
-static int
-zfs_putpage(vnode_t *vp, offset_t off, size_t len, int flags, cred_t *cr,
-    caller_context_t *ct)
-{
-	znode_t		*zp = VTOZ(vp);
-	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
-	page_t		*pp;
-	size_t		io_len;
-	u_offset_t	io_off;
-	uint_t		blksz;
-	rl_t		*rl;
-	int		error = 0;
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-
-	/*
-	 * Align this request to the file block size in case we kluster.
-	 * XXX - this can result in pretty aggresive locking, which can
-	 * impact simultanious read/write access.  One option might be
-	 * to break up long requests (len == 0) into block-by-block
-	 * operations to get narrower locking.
-	 */
-	blksz = zp->z_blksz;
-	if (ISP2(blksz))
-		io_off = P2ALIGN_TYPED(off, blksz, u_offset_t);
-	else
-		io_off = 0;
-	if (len > 0 && ISP2(blksz))
-		io_len = P2ROUNDUP_TYPED(len + (off - io_off), blksz, size_t);
-	else
-		io_len = 0;
-
-	if (io_len == 0) {
-		/*
-		 * Search the entire vp list for pages >= io_off.
-		 */
-		rl = zfs_range_lock(zp, io_off, UINT64_MAX, RL_WRITER);
-		error = pvn_vplist_dirty(vp, io_off, zfs_putapage, flags, cr);
-		goto out;
-	}
-	rl = zfs_range_lock(zp, io_off, io_len, RL_WRITER);
-
-	if (off > zp->z_size) {
-		/* past end of file */
-		zfs_range_unlock(rl);
-		ZFS_EXIT(zfsvfs);
-		return (0);
-	}
-
-	len = MIN(io_len, P2ROUNDUP(zp->z_size, PAGESIZE) - io_off);
-
-	for (off = io_off; io_off < off + len; io_off += io_len) {
-		if ((flags & B_INVAL) || ((flags & B_ASYNC) == 0)) {
-			pp = page_lookup(vp, io_off,
-			    (flags & (B_INVAL | B_FREE)) ? SE_EXCL : SE_SHARED);
-		} else {
-			pp = page_lookup_nowait(vp, io_off,
-			    (flags & B_FREE) ? SE_EXCL : SE_SHARED);
-		}
-
-		if (pp != NULL && pvn_getdirty(pp, flags)) {
-			int err;
-
-			/*
-			 * Found a dirty page to push
-			 */
-			err = zfs_putapage(vp, pp, &io_off, &io_len, flags, cr);
-			if (err)
-				error = err;
-		} else {
-			io_len = PAGESIZE;
-		}
-	}
-out:
-	zfs_range_unlock(rl);
-	if ((flags & B_ASYNC) == 0 || zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
-		zil_commit(zfsvfs->z_log, zp->z_id);
-	ZFS_EXIT(zfsvfs);
-	return (error);
-}
-#endif	/* illumos */
-
-/*ARGSUSED*/
 void
 zfs_inactive(vnode_t *vp, cred_t *cr, caller_context_t *ct)
 {
 	znode_t	*zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int error;
 
 	rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_READER);
 	if (zp->z_sa_hdl == NULL) {
 		/*
 		 * The fs has been unmounted, or we did a
 		 * suspend/resume and this file no longer exists.
 		 */
 		rw_exit(&zfsvfs->z_teardown_inactive_lock);
 		vrecycle(vp);
 		return;
 	}
 
-	mutex_enter(&zp->z_lock);
 	if (zp->z_unlinked) {
 		/*
 		 * Fast path to recycle a vnode of a removed file.
 		 */
-		mutex_exit(&zp->z_lock);
 		rw_exit(&zfsvfs->z_teardown_inactive_lock);
 		vrecycle(vp);
 		return;
 	}
-	mutex_exit(&zp->z_lock);
 
 	if (zp->z_atime_dirty && zp->z_unlinked == 0) {
 		dmu_tx_t *tx = dmu_tx_create(zfsvfs->z_os);
 
 		dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 		zfs_sa_upgrade_txholds(tx, zp);
 		error = dmu_tx_assign(tx, TXG_WAIT);
 		if (error) {
 			dmu_tx_abort(tx);
 		} else {
-			mutex_enter(&zp->z_lock);
 			(void) sa_update(zp->z_sa_hdl, SA_ZPL_ATIME(zfsvfs),
 			    (void *)&zp->z_atime, sizeof (zp->z_atime), tx);
 			zp->z_atime_dirty = 0;
-			mutex_exit(&zp->z_lock);
 			dmu_tx_commit(tx);
 		}
 	}
 	rw_exit(&zfsvfs->z_teardown_inactive_lock);
 }
 
-#ifdef illumos
-/*
- * Bounds-check the seek operation.
- *
- *	IN:	vp	- vnode seeking within
- *		ooff	- old file offset
- *		noffp	- pointer to new file offset
- *		ct	- caller context
- *
- *	RETURN:	0 on success, EINVAL if new offset invalid.
- */
-/* ARGSUSED */
-static int
-zfs_seek(vnode_t *vp, offset_t ooff, offset_t *noffp,
-    caller_context_t *ct)
-{
-	if (vp->v_type == VDIR)
-		return (0);
-	return ((*noffp < 0 || *noffp > MAXOFFSET_T) ? EINVAL : 0);
-}
 
-/*
- * Pre-filter the generic locking function to trap attempts to place
- * a mandatory lock on a memory mapped file.
- */
-static int
-zfs_frlock(vnode_t *vp, int cmd, flock64_t *bfp, int flag, offset_t offset,
-    flk_callback_t *flk_cbp, cred_t *cr, caller_context_t *ct)
-{
-	znode_t *zp = VTOZ(vp);
-	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-
-	/*
-	 * We are following the UFS semantics with respect to mapcnt
-	 * here: If we see that the file is mapped already, then we will
-	 * return an error, but we don't worry about races between this
-	 * function and zfs_map().
-	 */
-	if (zp->z_mapcnt > 0 && MANDMODE(zp->z_mode)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EAGAIN));
-	}
-	ZFS_EXIT(zfsvfs);
-	return (fs_frlock(vp, cmd, bfp, flag, offset, flk_cbp, cr, ct));
-}
-
-/*
- * If we can't find a page in the cache, we will create a new page
- * and fill it with file data.  For efficiency, we may try to fill
- * multiple pages at once (klustering) to fill up the supplied page
- * list.  Note that the pages to be filled are held with an exclusive
- * lock to prevent access by other threads while they are being filled.
- */
-static int
-zfs_fillpage(vnode_t *vp, u_offset_t off, struct seg *seg,
-    caddr_t addr, page_t *pl[], size_t plsz, enum seg_rw rw)
-{
-	znode_t *zp = VTOZ(vp);
-	page_t *pp, *cur_pp;
-	objset_t *os = zp->z_zfsvfs->z_os;
-	u_offset_t io_off, total;
-	size_t io_len;
-	int err;
-
-	if (plsz == PAGESIZE || zp->z_blksz <= PAGESIZE) {
-		/*
-		 * We only have a single page, don't bother klustering
-		 */
-		io_off = off;
-		io_len = PAGESIZE;
-		pp = page_create_va(vp, io_off, io_len,
-		    PG_EXCL | PG_WAIT, seg, addr);
-	} else {
-		/*
-		 * Try to find enough pages to fill the page list
-		 */
-		pp = pvn_read_kluster(vp, off, seg, addr, &io_off,
-		    &io_len, off, plsz, 0);
-	}
-	if (pp == NULL) {
-		/*
-		 * The page already exists, nothing to do here.
-		 */
-		*pl = NULL;
-		return (0);
-	}
-
-	/*
-	 * Fill the pages in the kluster.
-	 */
-	cur_pp = pp;
-	for (total = io_off + io_len; io_off < total; io_off += PAGESIZE) {
-		caddr_t va;
-
-		ASSERT3U(io_off, ==, cur_pp->p_offset);
-		va = zfs_map_page(cur_pp, S_WRITE);
-		err = dmu_read(os, zp->z_id, io_off, PAGESIZE, va,
-		    DMU_READ_PREFETCH);
-		zfs_unmap_page(cur_pp, va);
-		if (err) {
-			/* On error, toss the entire kluster */
-			pvn_read_done(pp, B_ERROR);
-			/* convert checksum errors into IO errors */
-			if (err == ECKSUM)
-				err = SET_ERROR(EIO);
-			return (err);
-		}
-		cur_pp = cur_pp->p_next;
-	}
-
-	/*
-	 * Fill in the page list array from the kluster starting
-	 * from the desired offset `off'.
-	 * NOTE: the page list will always be null terminated.
-	 */
-	pvn_plist_init(pp, pl, plsz, off, io_len, rw);
-	ASSERT(pl == NULL || (*pl)->p_offset == off);
-
-	return (0);
-}
-
-/*
- * Return pointers to the pages for the file region [off, off + len]
- * in the pl array.  If plsz is greater than len, this function may
- * also return page pointers from after the specified region
- * (i.e. the region [off, off + plsz]).  These additional pages are
- * only returned if they are already in the cache, or were created as
- * part of a klustered read.
- *
- *	IN:	vp	- vnode of file to get data from.
- *		off	- position in file to get data from.
- *		len	- amount of data to retrieve.
- *		plsz	- length of provided page list.
- *		seg	- segment to obtain pages for.
- *		addr	- virtual address of fault.
- *		rw	- mode of created pages.
- *		cr	- credentials of caller.
- *		ct	- caller context.
- *
- *	OUT:	protp	- protection mode of created pages.
- *		pl	- list of pages created.
- *
- *	RETURN:	0 on success, error code on failure.
- *
- * Timestamps:
- *	vp - atime updated
- */
-/* ARGSUSED */
-static int
-zfs_getpage(vnode_t *vp, offset_t off, size_t len, uint_t *protp,
-    page_t *pl[], size_t plsz, struct seg *seg, caddr_t addr,
-    enum seg_rw rw, cred_t *cr, caller_context_t *ct)
-{
-	znode_t		*zp = VTOZ(vp);
-	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
-	page_t		**pl0 = pl;
-	int		err = 0;
-
-	/* we do our own caching, faultahead is unnecessary */
-	if (pl == NULL)
-		return (0);
-	else if (len > plsz)
-		len = plsz;
-	else
-		len = P2ROUNDUP(len, PAGESIZE);
-	ASSERT(plsz >= len);
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-
-	if (protp)
-		*protp = PROT_ALL;
-
-	/*
-	 * Loop through the requested range [off, off + len) looking
-	 * for pages.  If we don't find a page, we will need to create
-	 * a new page and fill it with data from the file.
-	 */
-	while (len > 0) {
-		if (*pl = page_lookup(vp, off, SE_SHARED))
-			*(pl+1) = NULL;
-		else if (err = zfs_fillpage(vp, off, seg, addr, pl, plsz, rw))
-			goto out;
-		while (*pl) {
-			ASSERT3U((*pl)->p_offset, ==, off);
-			off += PAGESIZE;
-			addr += PAGESIZE;
-			if (len > 0) {
-				ASSERT3U(len, >=, PAGESIZE);
-				len -= PAGESIZE;
-			}
-			ASSERT3U(plsz, >=, PAGESIZE);
-			plsz -= PAGESIZE;
-			pl++;
-		}
-	}
-
-	/*
-	 * Fill out the page array with any pages already in the cache.
-	 */
-	while (plsz > 0 &&
-	    (*pl++ = page_lookup_nowait(vp, off, SE_SHARED))) {
-			off += PAGESIZE;
-			plsz -= PAGESIZE;
-	}
-out:
-	if (err) {
-		/*
-		 * Release any pages we have previously locked.
-		 */
-		while (pl > pl0)
-			page_unlock(*--pl);
-	} else {
-		ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
-	}
-
-	*pl = NULL;
-
-	ZFS_EXIT(zfsvfs);
-	return (err);
-}
-
-/*
- * Request a memory map for a section of a file.  This code interacts
- * with common code and the VM system as follows:
- *
- * - common code calls mmap(), which ends up in smmap_common()
- * - this calls VOP_MAP(), which takes you into (say) zfs
- * - zfs_map() calls as_map(), passing segvn_create() as the callback
- * - segvn_create() creates the new segment and calls VOP_ADDMAP()
- * - zfs_addmap() updates z_mapcnt
- */
-/*ARGSUSED*/
-static int
-zfs_map(vnode_t *vp, offset_t off, struct as *as, caddr_t *addrp,
-    size_t len, uchar_t prot, uchar_t maxprot, uint_t flags, cred_t *cr,
-    caller_context_t *ct)
-{
-	znode_t *zp = VTOZ(vp);
-	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
-	segvn_crargs_t	vn_a;
-	int		error;
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-
-	if ((prot & PROT_WRITE) && (zp->z_pflags &
-	    (ZFS_IMMUTABLE | ZFS_READONLY | ZFS_APPENDONLY))) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EPERM));
-	}
-
-	if ((prot & (PROT_READ | PROT_EXEC)) &&
-	    (zp->z_pflags & ZFS_AV_QUARANTINED)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EACCES));
-	}
-
-	if (vp->v_flag & VNOMAP) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(ENOSYS));
-	}
-
-	if (off < 0 || len > MAXOFFSET_T - off) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(ENXIO));
-	}
-
-	if (vp->v_type != VREG) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(ENODEV));
-	}
-
-	/*
-	 * If file is locked, disallow mapping.
-	 */
-	if (MANDMODE(zp->z_mode) && vn_has_flocks(vp)) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EAGAIN));
-	}
-
-	as_rangelock(as);
-	error = choose_addr(as, addrp, len, off, ADDR_VACALIGN, flags);
-	if (error != 0) {
-		as_rangeunlock(as);
-		ZFS_EXIT(zfsvfs);
-		return (error);
-	}
-
-	vn_a.vp = vp;
-	vn_a.offset = (u_offset_t)off;
-	vn_a.type = flags & MAP_TYPE;
-	vn_a.prot = prot;
-	vn_a.maxprot = maxprot;
-	vn_a.cred = cr;
-	vn_a.amp = NULL;
-	vn_a.flags = flags & ~MAP_TYPE;
-	vn_a.szc = 0;
-	vn_a.lgrp_mem_policy_flags = 0;
-
-	error = as_map(as, *addrp, len, segvn_create, &vn_a);
-
-	as_rangeunlock(as);
-	ZFS_EXIT(zfsvfs);
-	return (error);
-}
-
-/* ARGSUSED */
-static int
-zfs_addmap(vnode_t *vp, offset_t off, struct as *as, caddr_t addr,
-    size_t len, uchar_t prot, uchar_t maxprot, uint_t flags, cred_t *cr,
-    caller_context_t *ct)
-{
-	uint64_t pages = btopr(len);
-
-	atomic_add_64(&VTOZ(vp)->z_mapcnt, pages);
-	return (0);
-}
-
-/*
- * The reason we push dirty pages as part of zfs_delmap() is so that we get a
- * more accurate mtime for the associated file.  Since we don't have a way of
- * detecting when the data was actually modified, we have to resort to
- * heuristics.  If an explicit msync() is done, then we mark the mtime when the
- * last page is pushed.  The problem occurs when the msync() call is omitted,
- * which by far the most common case:
- *
- *	open()
- *	mmap()
- *	<modify memory>
- *	munmap()
- *	close()
- *	<time lapse>
- *	putpage() via fsflush
- *
- * If we wait until fsflush to come along, we can have a modification time that
- * is some arbitrary point in the future.  In order to prevent this in the
- * common case, we flush pages whenever a (MAP_SHARED, PROT_WRITE) mapping is
- * torn down.
- */
-/* ARGSUSED */
-static int
-zfs_delmap(vnode_t *vp, offset_t off, struct as *as, caddr_t addr,
-    size_t len, uint_t prot, uint_t maxprot, uint_t flags, cred_t *cr,
-    caller_context_t *ct)
-{
-	uint64_t pages = btopr(len);
-
-	ASSERT3U(VTOZ(vp)->z_mapcnt, >=, pages);
-	atomic_add_64(&VTOZ(vp)->z_mapcnt, -pages);
-
-	if ((flags & MAP_SHARED) && (prot & PROT_WRITE) &&
-	    vn_has_cached_data(vp))
-		(void) VOP_PUTPAGE(vp, off, len, B_ASYNC, cr, ct);
-
-	return (0);
-}
-
-/*
- * Free or allocate space in a file.  Currently, this function only
- * supports the `F_FREESP' command.  However, this command is somewhat
- * misnamed, as its functionality includes the ability to allocate as
- * well as free space.
- *
- *	IN:	vp	- vnode of file to free data in.
- *		cmd	- action to take (only F_FREESP supported).
- *		bfp	- section of file to free/alloc.
- *		flag	- current file open mode flags.
- *		offset	- current file offset.
- *		cr	- credentials of caller [UNUSED].
- *		ct	- caller context.
- *
- *	RETURN:	0 on success, error code on failure.
- *
- * Timestamps:
- *	vp - ctime|mtime updated
- */
-/* ARGSUSED */
-static int
-zfs_space(vnode_t *vp, int cmd, flock64_t *bfp, int flag,
-    offset_t offset, cred_t *cr, caller_context_t *ct)
-{
-	znode_t		*zp = VTOZ(vp);
-	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
-	uint64_t	off, len;
-	int		error;
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-
-	if (cmd != F_FREESP) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EINVAL));
-	}
-
-	/*
-	 * In a case vp->v_vfsp != zp->z_zfsvfs->z_vfs (e.g. snapshots) our
-	 * callers might not be able to detect properly that we are read-only,
-	 * so check it explicitly here.
-	 */
-	if (zfsvfs->z_vfs->vfs_flag & VFS_RDONLY) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EROFS));
-	}
-
-	if (error = convoff(vp, bfp, 0, offset)) {
-		ZFS_EXIT(zfsvfs);
-		return (error);
-	}
-
-	if (bfp->l_len < 0) {
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EINVAL));
-	}
-
-	off = bfp->l_start;
-	len = bfp->l_len; /* 0 means from off to end of file */
-
-	error = zfs_freesp(zp, off, len, flag, TRUE);
-
-	ZFS_EXIT(zfsvfs);
-	return (error);
-}
-#endif	/* illumos */
-
 CTASSERT(sizeof(struct zfid_short) <= sizeof(struct fid));
 CTASSERT(sizeof(struct zfid_long) <= sizeof(struct fid));
 
 /*ARGSUSED*/
 static int
 zfs_fid(vnode_t *vp, fid_t *fidp, caller_context_t *ct)
 {
 	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	uint32_t	gen;
 	uint64_t	gen64;
 	uint64_t	object = zp->z_id;
 	zfid_short_t	*zfid;
 	int		size, i, error;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_GEN(zfsvfs),
 	    &gen64, sizeof (uint64_t))) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	gen = (uint32_t)gen64;
 
 	size = (zfsvfs->z_parent != zfsvfs) ? LONG_FID_LEN : SHORT_FID_LEN;
 
 #ifdef illumos
 	if (fidp->fid_len < size) {
 		fidp->fid_len = size;
 		ZFS_EXIT(zfsvfs);
 		return (SET_ERROR(ENOSPC));
 	}
 #else
 	fidp->fid_len = size;
 #endif
 
 	zfid = (zfid_short_t *)fidp;
 
 	zfid->zf_len = size;
 
 	for (i = 0; i < sizeof (zfid->zf_object); i++)
 		zfid->zf_object[i] = (uint8_t)(object >> (8 * i));
 
 	/* Must have a non-zero generation number to distinguish from .zfs */
 	if (gen == 0)
 		gen = 1;
 	for (i = 0; i < sizeof (zfid->zf_gen); i++)
 		zfid->zf_gen[i] = (uint8_t)(gen >> (8 * i));
 
 	if (size == LONG_FID_LEN) {
 		uint64_t	objsetid = dmu_objset_id(zfsvfs->z_os);
 		zfid_long_t	*zlfid;
 
 		zlfid = (zfid_long_t *)fidp;
 
 		for (i = 0; i < sizeof (zlfid->zf_setid); i++)
 			zlfid->zf_setid[i] = (uint8_t)(objsetid >> (8 * i));
 
 		/* XXX - this should be the generation number for the objset */
 		for (i = 0; i < sizeof (zlfid->zf_setgen); i++)
 			zlfid->zf_setgen[i] = 0;
 	}
 
 	ZFS_EXIT(zfsvfs);
 	return (0);
 }
 
 static int
 zfs_pathconf(vnode_t *vp, int cmd, ulong_t *valp, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t		*zp, *xzp;
 	zfsvfs_t	*zfsvfs;
-	zfs_dirlock_t	*dl;
 	int		error;
 
 	switch (cmd) {
 	case _PC_LINK_MAX:
 		*valp = INT_MAX;
 		return (0);
 
 	case _PC_FILESIZEBITS:
 		*valp = 64;
 		return (0);
 #ifdef illumos
 	case _PC_XATTR_EXISTS:
 		zp = VTOZ(vp);
 		zfsvfs = zp->z_zfsvfs;
 		ZFS_ENTER(zfsvfs);
 		ZFS_VERIFY_ZP(zp);
 		*valp = 0;
-		error = zfs_dirent_lock(&dl, zp, "", &xzp,
-		    ZXATTR | ZEXISTS | ZSHARED, NULL, NULL);
+		error = zfs_dirent_lookup(zp, "", &xzp,
+		    ZXATTR | ZEXISTS | ZSHARED);
 		if (error == 0) {
-			zfs_dirent_unlock(dl);
 			if (!zfs_dirempty(xzp))
 				*valp = 1;
-			VN_RELE(ZTOV(xzp));
+			vrele(ZTOV(xzp));
 		} else if (error == ENOENT) {
 			/*
 			 * If there aren't extended attributes, it's the
 			 * same as having zero of them.
 			 */
 			error = 0;
 		}
 		ZFS_EXIT(zfsvfs);
 		return (error);
 
 	case _PC_SATTR_ENABLED:
 	case _PC_SATTR_EXISTS:
 		*valp = vfs_has_feature(vp->v_vfsp, VFSFT_SYSATTR_VIEWS) &&
 		    (vp->v_type == VREG || vp->v_type == VDIR);
 		return (0);
 
 	case _PC_ACCESS_FILTERING:
 		*valp = vfs_has_feature(vp->v_vfsp, VFSFT_ACCESS_FILTER) &&
 		    vp->v_type == VDIR;
 		return (0);
 
 	case _PC_ACL_ENABLED:
 		*valp = _ACL_ACE_ENABLED;
 		return (0);
 #endif	/* illumos */
 	case _PC_MIN_HOLE_SIZE:
 		*valp = (int)SPA_MINBLOCKSIZE;
 		return (0);
 #ifdef illumos
 	case _PC_TIMESTAMP_RESOLUTION:
 		/* nanosecond timestamp resolution */
 		*valp = 1L;
 		return (0);
 #endif
 	case _PC_ACL_EXTENDED:
 		*valp = 0;
 		return (0);
 
 	case _PC_ACL_NFS4:
 		*valp = 1;
 		return (0);
 
 	case _PC_ACL_PATH_MAX:
 		*valp = ACL_MAX_ENTRIES;
 		return (0);
 
 	default:
 		return (EOPNOTSUPP);
 	}
 }
 
 /*ARGSUSED*/
 static int
 zfs_getsecattr(vnode_t *vp, vsecattr_t *vsecp, int flag, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t *zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int error;
 	boolean_t skipaclchk = (flag & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 	error = zfs_getacl(zp, vsecp, skipaclchk, cr);
 	ZFS_EXIT(zfsvfs);
 
 	return (error);
 }
 
 /*ARGSUSED*/
 int
 zfs_setsecattr(vnode_t *vp, vsecattr_t *vsecp, int flag, cred_t *cr,
     caller_context_t *ct)
 {
 	znode_t *zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	int error;
 	boolean_t skipaclchk = (flag & ATTR_NOACLCHECK) ? B_TRUE : B_FALSE;
 	zilog_t	*zilog = zfsvfs->z_log;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	error = zfs_setacl(zp, vsecp, skipaclchk, cr);
 
 	if (zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zilog, 0);
 
 	ZFS_EXIT(zfsvfs);
 	return (error);
 }
 
-#ifdef illumos
-/*
- * The smallest read we may consider to loan out an arcbuf.
- * This must be a power of 2.
- */
-int zcr_blksz_min = (1 << 10);	/* 1K */
-/*
- * If set to less than the file block size, allow loaning out of an
- * arcbuf for a partial block read.  This must be a power of 2.
- */
-int zcr_blksz_max = (1 << 17);	/* 128K */
-
-/*ARGSUSED*/
 static int
-zfs_reqzcbuf(vnode_t *vp, enum uio_rw ioflag, xuio_t *xuio, cred_t *cr,
-    caller_context_t *ct)
-{
-	znode_t	*zp = VTOZ(vp);
-	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
-	int max_blksz = zfsvfs->z_max_blksz;
-	uio_t *uio = &xuio->xu_uio;
-	ssize_t size = uio->uio_resid;
-	offset_t offset = uio->uio_loffset;
-	int blksz;
-	int fullblk, i;
-	arc_buf_t *abuf;
-	ssize_t maxsize;
-	int preamble, postamble;
-
-	if (xuio->xu_type != UIOTYPE_ZEROCOPY)
-		return (SET_ERROR(EINVAL));
-
-	ZFS_ENTER(zfsvfs);
-	ZFS_VERIFY_ZP(zp);
-	switch (ioflag) {
-	case UIO_WRITE:
-		/*
-		 * Loan out an arc_buf for write if write size is bigger than
-		 * max_blksz, and the file's block size is also max_blksz.
-		 */
-		blksz = max_blksz;
-		if (size < blksz || zp->z_blksz != blksz) {
-			ZFS_EXIT(zfsvfs);
-			return (SET_ERROR(EINVAL));
-		}
-		/*
-		 * Caller requests buffers for write before knowing where the
-		 * write offset might be (e.g. NFS TCP write).
-		 */
-		if (offset == -1) {
-			preamble = 0;
-		} else {
-			preamble = P2PHASE(offset, blksz);
-			if (preamble) {
-				preamble = blksz - preamble;
-				size -= preamble;
-			}
-		}
-
-		postamble = P2PHASE(size, blksz);
-		size -= postamble;
-
-		fullblk = size / blksz;
-		(void) dmu_xuio_init(xuio,
-		    (preamble != 0) + fullblk + (postamble != 0));
-		DTRACE_PROBE3(zfs_reqzcbuf_align, int, preamble,
-		    int, postamble, int,
-		    (preamble != 0) + fullblk + (postamble != 0));
-
-		/*
-		 * Have to fix iov base/len for partial buffers.  They
-		 * currently represent full arc_buf's.
-		 */
-		if (preamble) {
-			/* data begins in the middle of the arc_buf */
-			abuf = dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl),
-			    blksz);
-			ASSERT(abuf);
-			(void) dmu_xuio_add(xuio, abuf,
-			    blksz - preamble, preamble);
-		}
-
-		for (i = 0; i < fullblk; i++) {
-			abuf = dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl),
-			    blksz);
-			ASSERT(abuf);
-			(void) dmu_xuio_add(xuio, abuf, 0, blksz);
-		}
-
-		if (postamble) {
-			/* data ends in the middle of the arc_buf */
-			abuf = dmu_request_arcbuf(sa_get_db(zp->z_sa_hdl),
-			    blksz);
-			ASSERT(abuf);
-			(void) dmu_xuio_add(xuio, abuf, 0, postamble);
-		}
-		break;
-	case UIO_READ:
-		/*
-		 * Loan out an arc_buf for read if the read size is larger than
-		 * the current file block size.  Block alignment is not
-		 * considered.  Partial arc_buf will be loaned out for read.
-		 */
-		blksz = zp->z_blksz;
-		if (blksz < zcr_blksz_min)
-			blksz = zcr_blksz_min;
-		if (blksz > zcr_blksz_max)
-			blksz = zcr_blksz_max;
-		/* avoid potential complexity of dealing with it */
-		if (blksz > max_blksz) {
-			ZFS_EXIT(zfsvfs);
-			return (SET_ERROR(EINVAL));
-		}
-
-		maxsize = zp->z_size - uio->uio_loffset;
-		if (size > maxsize)
-			size = maxsize;
-
-		if (size < blksz || vn_has_cached_data(vp)) {
-			ZFS_EXIT(zfsvfs);
-			return (SET_ERROR(EINVAL));
-		}
-		break;
-	default:
-		ZFS_EXIT(zfsvfs);
-		return (SET_ERROR(EINVAL));
-	}
-
-	uio->uio_extflg = UIO_XUIO;
-	XUIO_XUZC_RW(xuio) = ioflag;
-	ZFS_EXIT(zfsvfs);
-	return (0);
-}
-
-/*ARGSUSED*/
-static int
-zfs_retzcbuf(vnode_t *vp, xuio_t *xuio, cred_t *cr, caller_context_t *ct)
-{
-	int i;
-	arc_buf_t *abuf;
-	int ioflag = XUIO_XUZC_RW(xuio);
-
-	ASSERT(xuio->xu_type == UIOTYPE_ZEROCOPY);
-
-	i = dmu_xuio_cnt(xuio);
-	while (i-- > 0) {
-		abuf = dmu_xuio_arcbuf(xuio, i);
-		/*
-		 * if abuf == NULL, it must be a write buffer
-		 * that has been returned in zfs_write().
-		 */
-		if (abuf)
-			dmu_return_arcbuf(abuf);
-		ASSERT(abuf || ioflag == UIO_WRITE);
-	}
-
-	dmu_xuio_fini(xuio);
-	return (0);
-}
-
-/*
- * Predeclare these here so that the compiler assumes that
- * this is an "old style" function declaration that does
- * not include arguments => we won't get type mismatch errors
- * in the initializations that follow.
- */
-static int zfs_inval();
-static int zfs_isdir();
-
-static int
-zfs_inval()
-{
-	return (SET_ERROR(EINVAL));
-}
-
-static int
-zfs_isdir()
-{
-	return (SET_ERROR(EISDIR));
-}
-/*
- * Directory vnode operations template
- */
-vnodeops_t *zfs_dvnodeops;
-const fs_operation_def_t zfs_dvnodeops_template[] = {
-	VOPNAME_OPEN,		{ .vop_open = zfs_open },
-	VOPNAME_CLOSE,		{ .vop_close = zfs_close },
-	VOPNAME_READ,		{ .error = zfs_isdir },
-	VOPNAME_WRITE,		{ .error = zfs_isdir },
-	VOPNAME_IOCTL,		{ .vop_ioctl = zfs_ioctl },
-	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
-	VOPNAME_SETATTR,	{ .vop_setattr = zfs_setattr },
-	VOPNAME_ACCESS,		{ .vop_access = zfs_access },
-	VOPNAME_LOOKUP,		{ .vop_lookup = zfs_lookup },
-	VOPNAME_CREATE,		{ .vop_create = zfs_create },
-	VOPNAME_REMOVE,		{ .vop_remove = zfs_remove },
-	VOPNAME_LINK,		{ .vop_link = zfs_link },
-	VOPNAME_RENAME,		{ .vop_rename = zfs_rename },
-	VOPNAME_MKDIR,		{ .vop_mkdir = zfs_mkdir },
-	VOPNAME_RMDIR,		{ .vop_rmdir = zfs_rmdir },
-	VOPNAME_READDIR,	{ .vop_readdir = zfs_readdir },
-	VOPNAME_SYMLINK,	{ .vop_symlink = zfs_symlink },
-	VOPNAME_FSYNC,		{ .vop_fsync = zfs_fsync },
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_FID,		{ .vop_fid = zfs_fid },
-	VOPNAME_SEEK,		{ .vop_seek = zfs_seek },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	VOPNAME_GETSECATTR,	{ .vop_getsecattr = zfs_getsecattr },
-	VOPNAME_SETSECATTR,	{ .vop_setsecattr = zfs_setsecattr },
-	VOPNAME_VNEVENT,	{ .vop_vnevent = fs_vnevent_support },
-	NULL,			NULL
-};
-
-/*
- * Regular file vnode operations template
- */
-vnodeops_t *zfs_fvnodeops;
-const fs_operation_def_t zfs_fvnodeops_template[] = {
-	VOPNAME_OPEN,		{ .vop_open = zfs_open },
-	VOPNAME_CLOSE,		{ .vop_close = zfs_close },
-	VOPNAME_READ,		{ .vop_read = zfs_read },
-	VOPNAME_WRITE,		{ .vop_write = zfs_write },
-	VOPNAME_IOCTL,		{ .vop_ioctl = zfs_ioctl },
-	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
-	VOPNAME_SETATTR,	{ .vop_setattr = zfs_setattr },
-	VOPNAME_ACCESS,		{ .vop_access = zfs_access },
-	VOPNAME_LOOKUP,		{ .vop_lookup = zfs_lookup },
-	VOPNAME_RENAME,		{ .vop_rename = zfs_rename },
-	VOPNAME_FSYNC,		{ .vop_fsync = zfs_fsync },
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_FID,		{ .vop_fid = zfs_fid },
-	VOPNAME_SEEK,		{ .vop_seek = zfs_seek },
-	VOPNAME_FRLOCK,		{ .vop_frlock = zfs_frlock },
-	VOPNAME_SPACE,		{ .vop_space = zfs_space },
-	VOPNAME_GETPAGE,	{ .vop_getpage = zfs_getpage },
-	VOPNAME_PUTPAGE,	{ .vop_putpage = zfs_putpage },
-	VOPNAME_MAP,		{ .vop_map = zfs_map },
-	VOPNAME_ADDMAP,		{ .vop_addmap = zfs_addmap },
-	VOPNAME_DELMAP,		{ .vop_delmap = zfs_delmap },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	VOPNAME_GETSECATTR,	{ .vop_getsecattr = zfs_getsecattr },
-	VOPNAME_SETSECATTR,	{ .vop_setsecattr = zfs_setsecattr },
-	VOPNAME_VNEVENT,	{ .vop_vnevent = fs_vnevent_support },
-	VOPNAME_REQZCBUF,	{ .vop_reqzcbuf = zfs_reqzcbuf },
-	VOPNAME_RETZCBUF,	{ .vop_retzcbuf = zfs_retzcbuf },
-	NULL,			NULL
-};
-
-/*
- * Symbolic link vnode operations template
- */
-vnodeops_t *zfs_symvnodeops;
-const fs_operation_def_t zfs_symvnodeops_template[] = {
-	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
-	VOPNAME_SETATTR,	{ .vop_setattr = zfs_setattr },
-	VOPNAME_ACCESS,		{ .vop_access = zfs_access },
-	VOPNAME_RENAME,		{ .vop_rename = zfs_rename },
-	VOPNAME_READLINK,	{ .vop_readlink = zfs_readlink },
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_FID,		{ .vop_fid = zfs_fid },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	VOPNAME_VNEVENT,	{ .vop_vnevent = fs_vnevent_support },
-	NULL,			NULL
-};
-
-/*
- * special share hidden files vnode operations template
- */
-vnodeops_t *zfs_sharevnodeops;
-const fs_operation_def_t zfs_sharevnodeops_template[] = {
-	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
-	VOPNAME_ACCESS,		{ .vop_access = zfs_access },
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_FID,		{ .vop_fid = zfs_fid },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	VOPNAME_GETSECATTR,	{ .vop_getsecattr = zfs_getsecattr },
-	VOPNAME_SETSECATTR,	{ .vop_setsecattr = zfs_setsecattr },
-	VOPNAME_VNEVENT,	{ .vop_vnevent = fs_vnevent_support },
-	NULL,			NULL
-};
-
-/*
- * Extended attribute directory vnode operations template
- *
- * This template is identical to the directory vnodes
- * operation template except for restricted operations:
- *	VOP_MKDIR()
- *	VOP_SYMLINK()
- *
- * Note that there are other restrictions embedded in:
- *	zfs_create()	- restrict type to VREG
- *	zfs_link()	- no links into/out of attribute space
- *	zfs_rename()	- no moves into/out of attribute space
- */
-vnodeops_t *zfs_xdvnodeops;
-const fs_operation_def_t zfs_xdvnodeops_template[] = {
-	VOPNAME_OPEN,		{ .vop_open = zfs_open },
-	VOPNAME_CLOSE,		{ .vop_close = zfs_close },
-	VOPNAME_IOCTL,		{ .vop_ioctl = zfs_ioctl },
-	VOPNAME_GETATTR,	{ .vop_getattr = zfs_getattr },
-	VOPNAME_SETATTR,	{ .vop_setattr = zfs_setattr },
-	VOPNAME_ACCESS,		{ .vop_access = zfs_access },
-	VOPNAME_LOOKUP,		{ .vop_lookup = zfs_lookup },
-	VOPNAME_CREATE,		{ .vop_create = zfs_create },
-	VOPNAME_REMOVE,		{ .vop_remove = zfs_remove },
-	VOPNAME_LINK,		{ .vop_link = zfs_link },
-	VOPNAME_RENAME,		{ .vop_rename = zfs_rename },
-	VOPNAME_MKDIR,		{ .error = zfs_inval },
-	VOPNAME_RMDIR,		{ .vop_rmdir = zfs_rmdir },
-	VOPNAME_READDIR,	{ .vop_readdir = zfs_readdir },
-	VOPNAME_SYMLINK,	{ .error = zfs_inval },
-	VOPNAME_FSYNC,		{ .vop_fsync = zfs_fsync },
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_FID,		{ .vop_fid = zfs_fid },
-	VOPNAME_SEEK,		{ .vop_seek = zfs_seek },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	VOPNAME_GETSECATTR,	{ .vop_getsecattr = zfs_getsecattr },
-	VOPNAME_SETSECATTR,	{ .vop_setsecattr = zfs_setsecattr },
-	VOPNAME_VNEVENT,	{ .vop_vnevent = fs_vnevent_support },
-	NULL,			NULL
-};
-
-/*
- * Error vnode operations template
- */
-vnodeops_t *zfs_evnodeops;
-const fs_operation_def_t zfs_evnodeops_template[] = {
-	VOPNAME_INACTIVE,	{ .vop_inactive = zfs_inactive },
-	VOPNAME_PATHCONF,	{ .vop_pathconf = zfs_pathconf },
-	NULL,			NULL
-};
-#endif	/* illumos */
-
-static int
 ioflags(int ioflags)
 {
 	int flags = 0;
 
 	if (ioflags & IO_APPEND)
 		flags |= FAPPEND;
 	if (ioflags & IO_NDELAY)
-        	flags |= FNONBLOCK;
+		flags |= FNONBLOCK;
 	if (ioflags & IO_SYNC)
 		flags |= (FSYNC | FDSYNC | FRSYNC);
 
 	return (flags);
 }
 
 static int
 zfs_getpages(struct vnode *vp, vm_page_t *m, int count, int *rbehind,
     int *rahead)
 {
 	znode_t *zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	objset_t *os = zp->z_zfsvfs->z_os;
 	vm_page_t mlast;
 	vm_object_t object;
 	caddr_t va;
 	struct sf_buf *sf;
 	off_t startoff, endoff;
 	int i, error;
 	vm_pindex_t reqstart, reqend;
 	int lsize, size;
 
 	object = m[0]->object;
 	error = 0;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	zfs_vmobject_wlock(object);
 	if (m[count - 1]->valid != 0 && --count == 0) {
 		zfs_vmobject_wunlock(object);
 		goto out;
 	}
 
 	mlast = m[count - 1];
 
 	if (IDX_TO_OFF(mlast->pindex) >=
 	    object->un_pager.vnp.vnp_size) {
 		zfs_vmobject_wunlock(object);
 		ZFS_EXIT(zfsvfs);
 		return (zfs_vm_pagerret_bad);
 	}
 
 	PCPU_INC(cnt.v_vnodein);
 	PCPU_ADD(cnt.v_vnodepgsin, count);
 
 	lsize = PAGE_SIZE;
 	if (IDX_TO_OFF(mlast->pindex) + lsize > object->un_pager.vnp.vnp_size)
 		lsize = object->un_pager.vnp.vnp_size -
 		    IDX_TO_OFF(mlast->pindex);
 	zfs_vmobject_wunlock(object);
 
 	for (i = 0; i < count; i++) {
 		size = PAGE_SIZE;
 		if (i == count - 1)
 			size = lsize;
 		va = zfs_map_page(m[i], &sf);
 		error = dmu_read(os, zp->z_id, IDX_TO_OFF(m[i]->pindex),
 		    size, va, DMU_READ_PREFETCH);
 		if (size != PAGE_SIZE)
 			bzero(va + size, PAGE_SIZE - size);
 		zfs_unmap_page(sf);
 		if (error != 0)
 			goto out;
 	}
 
 	zfs_vmobject_wlock(object);
 	for (i = 0; i < count; i++)
 		m[i]->valid = VM_PAGE_BITS_ALL;
 	zfs_vmobject_wunlock(object);
 
 out:
 	ZFS_ACCESSTIME_STAMP(zfsvfs, zp);
 	ZFS_EXIT(zfsvfs);
 	if (error == 0) {
 		if (rbehind)
 			*rbehind = 0;
 		if (rahead)
 			*rahead = 0;
 		return (zfs_vm_pagerret_ok);
 	} else
 		return (zfs_vm_pagerret_error);
 }
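Only the last page of a `zfs_getpages()` request can straddle EOF. That page is therefore the only one whose read is shortened (`lsize`) and whose tail is zero-filled before every page is marked valid. The sketch below models this per-page rule in userspace. `fill_page` and the 4096-byte page size are assumptions, and a flat byte array stands in for `dmu_read()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Fill one page-sized buffer from `file` (of length `file_size`), starting
 * at byte offset `pgoff`.  Bytes past EOF are zeroed so that the whole page
 * can be marked valid.  Returns the number of bytes copied from the file,
 * or -1 if the page starts at or past EOF (the "pagerret_bad" case).
 */
static long
fill_page(const unsigned char *file, uint64_t file_size, uint64_t pgoff,
    unsigned char *page)
{
	uint64_t size = SKETCH_PAGE_SIZE;

	if (pgoff >= file_size)
		return (-1);
	if (pgoff + size > file_size)
		size = file_size - pgoff;	/* short last page */
	memcpy(page, file + pgoff, size);
	if (size != SKETCH_PAGE_SIZE)		/* zero the tail past EOF */
		memset(page + size, 0, SKETCH_PAGE_SIZE - size);
	return ((long)size);
}
```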
 
 static int
 zfs_freebsd_getpages(ap)
 	struct vop_getpages_args /* {
 		struct vnode *a_vp;
 		vm_page_t *a_m;
 		int a_count;
 		int *a_rbehind;
 		int *a_rahead;
 	} */ *ap;
 {
 
 	return (zfs_getpages(ap->a_vp, ap->a_m, ap->a_count, ap->a_rbehind,
 	    ap->a_rahead));
 }
 
 static int
 zfs_putpages(struct vnode *vp, vm_page_t *ma, size_t len, int flags,
     int *rtvals)
 {
 	znode_t		*zp = VTOZ(vp);
 	zfsvfs_t	*zfsvfs = zp->z_zfsvfs;
 	rl_t		*rl;
 	dmu_tx_t	*tx;
 	struct sf_buf	*sf;
 	vm_object_t	object;
 	vm_page_t	m;
 	caddr_t		va;
 	size_t		tocopy;
 	size_t		lo_len;
 	vm_ooffset_t	lo_off;
 	vm_ooffset_t	off;
 	uint_t		blksz;
 	int		ncount;
 	int		pcount;
 	int		err;
 	int		i;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	object = vp->v_object;
 	pcount = btoc(len);
 	ncount = pcount;
 
 	KASSERT(ma[0]->object == object, ("mismatching object"));
 	KASSERT(len > 0 && (len & PAGE_MASK) == 0, ("unexpected length"));
 
 	for (i = 0; i < pcount; i++)
 		rtvals[i] = zfs_vm_pagerret_error;
 
 	off = IDX_TO_OFF(ma[0]->pindex);
 	blksz = zp->z_blksz;
 	lo_off = rounddown(off, blksz);
 	lo_len = roundup(len + (off - lo_off), blksz);
 	rl = zfs_range_lock(zp, lo_off, lo_len, RL_WRITER);
 
 	zfs_vmobject_wlock(object);
 	if (len + off > object->un_pager.vnp.vnp_size) {
 		if (object->un_pager.vnp.vnp_size > off) {
 			int pgoff;
 
 			len = object->un_pager.vnp.vnp_size - off;
 			ncount = btoc(len);
 			if ((pgoff = (int)len & PAGE_MASK) != 0) {
 				/*
 				 * If the object is locked and the following
 				 * conditions hold, then the page's dirty
 				 * field cannot be concurrently changed by a
 				 * pmap operation.
 				 */
 				m = ma[ncount - 1];
 				vm_page_assert_sbusied(m);
 				KASSERT(!pmap_page_is_write_mapped(m),
 				    ("zfs_putpages: page %p is not read-only", m));
 				vm_page_clear_dirty(m, pgoff, PAGE_SIZE -
 				    pgoff);
 			}
 		} else {
 			len = 0;
 			ncount = 0;
 		}
 		if (ncount < pcount) {
 			for (i = ncount; i < pcount; i++) {
 				rtvals[i] = zfs_vm_pagerret_bad;
 			}
 		}
 	}
 	zfs_vmobject_wunlock(object);
 
 	if (ncount == 0)
 		goto out;
 
 	if (zfs_owner_overquota(zfsvfs, zp, B_FALSE) ||
 	    zfs_owner_overquota(zfsvfs, zp, B_TRUE)) {
 		goto out;
 	}
 
 top:
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_write(tx, zp->z_id, off, len);
 
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
 	err = dmu_tx_assign(tx, TXG_NOWAIT);
 	if (err != 0) {
 		if (err == ERESTART) {
 			dmu_tx_wait(tx);
 			dmu_tx_abort(tx);
 			goto top;
 		}
 		dmu_tx_abort(tx);
 		goto out;
 	}
 
 	if (zp->z_blksz < PAGE_SIZE) {
 		i = 0;
 		for (i = 0; len > 0; off += tocopy, len -= tocopy, i++) {
 			tocopy = len > PAGE_SIZE ? PAGE_SIZE : len;
 			va = zfs_map_page(ma[i], &sf);
 			dmu_write(zfsvfs->z_os, zp->z_id, off, tocopy, va, tx);
 			zfs_unmap_page(sf);
 		}
 	} else {
 		err = dmu_write_pages(zfsvfs->z_os, zp->z_id, off, len, ma, tx);
 	}
 
 	if (err == 0) {
 		uint64_t mtime[2], ctime[2];
 		sa_bulk_attr_t bulk[3];
 		int count = 0;
 
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL,
 		    &mtime, 16);
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL,
 		    &ctime, 16);
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 		    &zp->z_pflags, 8);
 		zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime,
 		    B_TRUE);
 		(void)sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 		zfs_log_write(zfsvfs->z_log, tx, TX_WRITE, zp, off, len, 0);
 
 		zfs_vmobject_wlock(object);
 		for (i = 0; i < ncount; i++) {
 			rtvals[i] = zfs_vm_pagerret_ok;
 			vm_page_undirty(ma[i]);
 		}
 		zfs_vmobject_wunlock(object);
 		PCPU_INC(cnt.v_vnodeout);
 		PCPU_ADD(cnt.v_vnodepgsout, ncount);
 	}
 	dmu_tx_commit(tx);
 
 out:
 	zfs_range_unlock(rl);
 	if ((flags & (zfs_vm_pagerput_sync | zfs_vm_pagerput_inval)) != 0 ||
 	    zfsvfs->z_os->os_sync == ZFS_SYNC_ALWAYS)
 		zil_commit(zfsvfs->z_log, zp->z_id);
 	ZFS_EXIT(zfsvfs);
 	return (rtvals[0]);
 }
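`zfs_putpages()` first widens the request to whole file blocks for the range lock. It then clips the request at EOF and reports the clipped-off pages as bad. The arithmetic-only sketch below reproduces these steps using hypothetical names (`plan_putpages`, `struct putpages_plan`) and an assumed 4096-byte page. `rdown`/`rup` mirror the `rounddown()`/`roundup()` macros from `<sys/param.h>`.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u			/* assumed page size */
#define SK_PAGE_MASK (SK_PAGE_SIZE - 1)

struct putpages_plan {
	uint64_t lo_off;	/* start of the range lock */
	uint64_t lo_len;	/* length of the range lock */
	uint64_t len;		/* bytes actually written */
	int	 ncount;	/* pages written; the rest are "bad" */
	uint64_t pgoff;		/* valid bytes in the last page, 0 if full */
};

static uint64_t rdown(uint64_t x, uint64_t y) { return (x / y * y); }
static uint64_t rup(uint64_t x, uint64_t y) { return ((x + y - 1) / y * y); }

static struct putpages_plan
plan_putpages(uint64_t off, uint64_t len, uint64_t blksz, uint64_t file_size)
{
	struct putpages_plan p;

	assert(len > 0 && (len & SK_PAGE_MASK) == 0);
	/* Lock whole file blocks covering [off, off + len). */
	p.lo_off = rdown(off, blksz);
	p.lo_len = rup(len + (off - p.lo_off), blksz);
	p.pgoff = 0;
	if (off + len > file_size) {
		/* Clip the request at EOF; nothing is left if off >= EOF. */
		len = file_size > off ? file_size - off : 0;
		p.pgoff = len & SK_PAGE_MASK;
	}
	p.len = len;
	p.ncount = (int)((len + SK_PAGE_MASK) / SK_PAGE_SIZE);	/* btoc() */
	return (p);
}
```

With `off` = 8192, three pages, 128 KiB blocks and an 18192-byte file, the lock covers [0, 131072). The write covers 10000 bytes across 3 pages, and the last page keeps 1808 valid bytes.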
 
 int
 zfs_freebsd_putpages(ap)
 	struct vop_putpages_args /* {
 		struct vnode *a_vp;
 		vm_page_t *a_m;
 		int a_count;
 		int a_sync;
 		int *a_rtvals;
 	} */ *ap;
 {
 
 	return (zfs_putpages(ap->a_vp, ap->a_m, ap->a_count, ap->a_sync,
 	    ap->a_rtvals));
 }
 
 static int
 zfs_freebsd_bmap(ap)
 	struct vop_bmap_args /* {
 		struct vnode *a_vp;
 		daddr_t  a_bn;
 		struct bufobj **a_bop;
 		daddr_t *a_bnp;
 		int *a_runp;
 		int *a_runb;
 	} */ *ap;
 {
 
 	if (ap->a_bop != NULL)
 		*ap->a_bop = &ap->a_vp->v_bufobj;
 	if (ap->a_bnp != NULL)
 		*ap->a_bnp = ap->a_bn;
 	if (ap->a_runp != NULL)
 		*ap->a_runp = 0;
 	if (ap->a_runb != NULL)
 		*ap->a_runb = 0;
 
 	return (0);
 }
 
 static int
 zfs_freebsd_open(ap)
 	struct vop_open_args /* {
 		struct vnode *a_vp;
 		int a_mode;
 		struct ucred *a_cred;
 		struct thread *a_td;
 	} */ *ap;
 {
 	vnode_t	*vp = ap->a_vp;
 	znode_t *zp = VTOZ(vp);
 	int error;
 
 	error = zfs_open(&vp, ap->a_mode, ap->a_cred, NULL);
 	if (error == 0)
 		vnode_create_vobject(vp, zp->z_size, ap->a_td);
 	return (error);
 }
 
 static int
 zfs_freebsd_close(ap)
 	struct vop_close_args /* {
 		struct vnode *a_vp;
 		int  a_fflag;
 		struct ucred *a_cred;
 		struct thread *a_td;
 	} */ *ap;
 {
 
 	return (zfs_close(ap->a_vp, ap->a_fflag, 1, 0, ap->a_cred, NULL));
 }
 
 static int
 zfs_freebsd_ioctl(ap)
 	struct vop_ioctl_args /* {
 		struct vnode *a_vp;
 		u_long a_command;
 		caddr_t a_data;
 		int a_fflag;
 		struct ucred *cred;
 		struct thread *td;
 	} */ *ap;
 {
 
 	return (zfs_ioctl(ap->a_vp, ap->a_command, (intptr_t)ap->a_data,
 	    ap->a_fflag, ap->a_cred, NULL, NULL));
 }
 
 static int
 zfs_freebsd_read(ap)
 	struct vop_read_args /* {
 		struct vnode *a_vp;
 		struct uio *a_uio;
 		int a_ioflag;
 		struct ucred *a_cred;
 	} */ *ap;
 {
 
 	return (zfs_read(ap->a_vp, ap->a_uio, ioflags(ap->a_ioflag),
 	    ap->a_cred, NULL));
 }
 
 static int
 zfs_freebsd_write(ap)
 	struct vop_write_args /* {
 		struct vnode *a_vp;
 		struct uio *a_uio;
 		int a_ioflag;
 		struct ucred *a_cred;
 	} */ *ap;
 {
 
 	return (zfs_write(ap->a_vp, ap->a_uio, ioflags(ap->a_ioflag),
 	    ap->a_cred, NULL));
 }
 
 static int
 zfs_freebsd_access(ap)
 	struct vop_access_args /* {
 		struct vnode *a_vp;
 		accmode_t a_accmode;
 		struct ucred *a_cred;
 		struct thread *a_td;
 	} */ *ap;
 {
 	vnode_t *vp = ap->a_vp;
 	znode_t *zp = VTOZ(vp);
 	accmode_t accmode;
 	int error = 0;
 
 	/*
 	 * ZFS itself only knows about VREAD, VWRITE, VEXEC and VAPPEND.
 	 */
 	accmode = ap->a_accmode & (VREAD|VWRITE|VEXEC|VAPPEND);
 	if (accmode != 0)
 		error = zfs_access(ap->a_vp, accmode, 0, ap->a_cred, NULL);
 
 	/*
 	 * VADMIN has to be handled by vaccess().
 	 */
 	if (error == 0) {
 		accmode = ap->a_accmode & ~(VREAD|VWRITE|VEXEC|VAPPEND);
 		if (accmode != 0) {
 			error = vaccess(vp->v_type, zp->z_mode, zp->z_uid,
 			    zp->z_gid, accmode, ap->a_cred, NULL);
 		}
 	}
 
 	/*
 	 * For VEXEC, ensure that at least one execute bit is set for
 	 * non-directories.
 	 */
 	if (error == 0 && (ap->a_accmode & VEXEC) != 0 && vp->v_type != VDIR &&
 	    (zp->z_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) == 0) {
 		error = EACCES;
 	}
 
 	return (error);
 }
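`zfs_freebsd_access()` splits the requested mode into two parts. The four bits ZFS understands go to `zfs_access()`, and anything else (such as `VADMIN`) goes to `vaccess()`. In addition, an exec request on a non-directory fails unless some execute bit is set. The sketch below isolates these two decisions. The `A_*` and `M_IX*` values are hypothetical stand-ins for the kernel constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's VREAD, VWRITE, ... differ. */
#define A_READ		0x01
#define A_WRITE		0x02
#define A_EXEC		0x04
#define A_APPEND	0x08
#define A_ADMIN		0x10
#define A_ZFS_BITS	(A_READ | A_WRITE | A_EXEC | A_APPEND)

#define M_IXUSR	0100
#define M_IXGRP	0010
#define M_IXOTH	0001

/* Split a request into the part ZFS checks and the part vaccess() checks. */
static void
split_accmode(int accmode, int *zfs_part, int *generic_part)
{
	*zfs_part = accmode & A_ZFS_BITS;
	*generic_part = accmode & ~A_ZFS_BITS;
}

/* Exec on a non-directory needs at least one execute bit in the mode. */
static bool
exec_denied(int accmode, bool is_dir, unsigned int mode)
{
	return ((accmode & A_EXEC) != 0 && !is_dir &&
	    (mode & (M_IXUSR | M_IXGRP | M_IXOTH)) == 0);
}
```

The exec check runs only after both permission checks have succeeded. As a result, a caller that would otherwise be allowed everything still cannot execute a regular file with no execute bits.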
 
 static int
 zfs_freebsd_lookup(ap)
 	struct vop_lookup_args /* {
 		struct vnode *a_dvp;
 		struct vnode **a_vpp;
 		struct componentname *a_cnp;
 	} */ *ap;
 {
 	struct componentname *cnp = ap->a_cnp;
 	char nm[NAME_MAX + 1];
 
 	ASSERT(cnp->cn_namelen < sizeof(nm));
 	strlcpy(nm, cnp->cn_nameptr, MIN(cnp->cn_namelen + 1, sizeof(nm)));
 
 	return (zfs_lookup(ap->a_dvp, nm, ap->a_vpp, cnp, cnp->cn_nameiop,
 	    cnp->cn_cred, cnp->cn_thread, 0));
 }
 
 static int
+zfs_cache_lookup(ap)
+	struct vop_lookup_args /* {
+		struct vnode *a_dvp;
+		struct vnode **a_vpp;
+		struct componentname *a_cnp;
+	} */ *ap;
+{
+	zfsvfs_t *zfsvfs;
+
+	zfsvfs = ap->a_dvp->v_mount->mnt_data;
+	if (zfsvfs->z_use_namecache)
+		return (vfs_cache_lookup(ap));
+	else
+		return (zfs_freebsd_lookup(ap));
+}
+
+static int
 zfs_freebsd_create(ap)
 	struct vop_create_args /* {
 		struct vnode *a_dvp;
 		struct vnode **a_vpp;
 		struct componentname *a_cnp;
 		struct vattr *a_vap;
 	} */ *ap;
 {
+	zfsvfs_t *zfsvfs;
 	struct componentname *cnp = ap->a_cnp;
 	vattr_t *vap = ap->a_vap;
 	int error, mode;
 
 	ASSERT(cnp->cn_flags & SAVENAME);
 
 	vattr_init_mask(vap);
 	mode = vap->va_mode & ALLPERMS;
+	zfsvfs = ap->a_dvp->v_mount->mnt_data;
 
 	error = zfs_create(ap->a_dvp, cnp->cn_nameptr, vap, !EXCL, mode,
 	    ap->a_vpp, cnp->cn_cred, cnp->cn_thread);
-#ifdef FREEBSD_NAMECACHE
-	if (error == 0 && (cnp->cn_flags & MAKEENTRY) != 0)
+	if (zfsvfs->z_use_namecache &&
+	    error == 0 && (cnp->cn_flags & MAKEENTRY) != 0)
 		cache_enter(ap->a_dvp, *ap->a_vpp, cnp);
-#endif
 	return (error);
 }
 
 static int
 zfs_freebsd_remove(ap)
 	struct vop_remove_args /* {
 		struct vnode *a_dvp;
 		struct vnode *a_vp;
 		struct componentname *a_cnp;
 	} */ *ap;
 {
 
 	ASSERT(ap->a_cnp->cn_flags & SAVENAME);
 
-	return (zfs_remove(ap->a_dvp, ap->a_cnp->cn_nameptr,
-	    ap->a_cnp->cn_cred, NULL, 0));
+	return (zfs_remove(ap->a_dvp, ap->a_vp, ap->a_cnp->cn_nameptr,
+	    ap->a_cnp->cn_cred));
 }
 
 static int
 zfs_freebsd_mkdir(ap)
 	struct vop_mkdir_args /* {
 		struct vnode *a_dvp;
 		struct vnode **a_vpp;
 		struct componentname *a_cnp;
 		struct vattr *a_vap;
 	} */ *ap;
 {
 	vattr_t *vap = ap->a_vap;
 
 	ASSERT(ap->a_cnp->cn_flags & SAVENAME);
 
 	vattr_init_mask(vap);
 
 	return (zfs_mkdir(ap->a_dvp, ap->a_cnp->cn_nameptr, vap, ap->a_vpp,
-	    ap->a_cnp->cn_cred, NULL, 0, NULL));
+	    ap->a_cnp->cn_cred));
 }
 
 static int
 zfs_freebsd_rmdir(ap)
 	struct vop_rmdir_args /* {
 		struct vnode *a_dvp;
 		struct vnode *a_vp;
 		struct componentname *a_cnp;
 	} */ *ap;
 {
 	struct componentname *cnp = ap->a_cnp;
 
 	ASSERT(cnp->cn_flags & SAVENAME);
 
-	return (zfs_rmdir(ap->a_dvp, cnp->cn_nameptr, NULL, cnp->cn_cred, NULL, 0));
+	return (zfs_rmdir(ap->a_dvp, ap->a_vp, cnp->cn_nameptr, cnp->cn_cred));
 }
 
 static int
 zfs_freebsd_readdir(ap)
 	struct vop_readdir_args /* {
 		struct vnode *a_vp;
 		struct uio *a_uio;
 		struct ucred *a_cred;
 		int *a_eofflag;
 		int *a_ncookies;
 		u_long **a_cookies;
 	} */ *ap;
 {
 
 	return (zfs_readdir(ap->a_vp, ap->a_uio, ap->a_cred, ap->a_eofflag,
 	    ap->a_ncookies, ap->a_cookies));
 }
 
 static int
 zfs_freebsd_fsync(ap)
 	struct vop_fsync_args /* {
 		struct vnode *a_vp;
 		int a_waitfor;
 		struct thread *a_td;
 	} */ *ap;
 {
 
 	vop_stdfsync(ap);
 	return (zfs_fsync(ap->a_vp, 0, ap->a_td->td_ucred, NULL));
 }
 
 static int
 zfs_freebsd_getattr(ap)
 	struct vop_getattr_args /* {
 		struct vnode *a_vp;
 		struct vattr *a_vap;
 		struct ucred *a_cred;
 	} */ *ap;
 {
 	vattr_t *vap = ap->a_vap;
 	xvattr_t xvap;
 	u_long fflags = 0;
 	int error;
 
 	xva_init(&xvap);
 	xvap.xva_vattr = *vap;
 	xvap.xva_vattr.va_mask |= AT_XVATTR;
 
 	/* Convert chflags into ZFS-type flags. */
 	/* XXX: what about SF_SETTABLE?. */
 	XVA_SET_REQ(&xvap, XAT_IMMUTABLE);
 	XVA_SET_REQ(&xvap, XAT_APPENDONLY);
 	XVA_SET_REQ(&xvap, XAT_NOUNLINK);
 	XVA_SET_REQ(&xvap, XAT_NODUMP);
 	XVA_SET_REQ(&xvap, XAT_READONLY);
 	XVA_SET_REQ(&xvap, XAT_ARCHIVE);
 	XVA_SET_REQ(&xvap, XAT_SYSTEM);
 	XVA_SET_REQ(&xvap, XAT_HIDDEN);
 	XVA_SET_REQ(&xvap, XAT_REPARSE);
 	XVA_SET_REQ(&xvap, XAT_OFFLINE);
 	XVA_SET_REQ(&xvap, XAT_SPARSE);
 
 	error = zfs_getattr(ap->a_vp, (vattr_t *)&xvap, 0, ap->a_cred, NULL);
 	if (error != 0)
 		return (error);
 
 	/* Convert ZFS xattr into chflags. */
 #define	FLAG_CHECK(fflag, xflag, xfield)	do {			\
 	if (XVA_ISSET_RTN(&xvap, (xflag)) && (xfield) != 0)		\
 		fflags |= (fflag);					\
 } while (0)
 	FLAG_CHECK(SF_IMMUTABLE, XAT_IMMUTABLE,
 	    xvap.xva_xoptattrs.xoa_immutable);
 	FLAG_CHECK(SF_APPEND, XAT_APPENDONLY,
 	    xvap.xva_xoptattrs.xoa_appendonly);
 	FLAG_CHECK(SF_NOUNLINK, XAT_NOUNLINK,
 	    xvap.xva_xoptattrs.xoa_nounlink);
 	FLAG_CHECK(UF_ARCHIVE, XAT_ARCHIVE,
 	    xvap.xva_xoptattrs.xoa_archive);
 	FLAG_CHECK(UF_NODUMP, XAT_NODUMP,
 	    xvap.xva_xoptattrs.xoa_nodump);
 	FLAG_CHECK(UF_READONLY, XAT_READONLY,
 	    xvap.xva_xoptattrs.xoa_readonly);
 	FLAG_CHECK(UF_SYSTEM, XAT_SYSTEM,
 	    xvap.xva_xoptattrs.xoa_system);
 	FLAG_CHECK(UF_HIDDEN, XAT_HIDDEN,
 	    xvap.xva_xoptattrs.xoa_hidden);
 	FLAG_CHECK(UF_REPARSE, XAT_REPARSE,
 	    xvap.xva_xoptattrs.xoa_reparse);
 	FLAG_CHECK(UF_OFFLINE, XAT_OFFLINE,
 	    xvap.xva_xoptattrs.xoa_offline);
 	FLAG_CHECK(UF_SPARSE, XAT_SPARSE,
 	    xvap.xva_xoptattrs.xoa_sparse);
 
 #undef	FLAG_CHECK
 	*vap = xvap.xva_vattr;
 	vap->va_flags = fflags;
 	return (0);
 }
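Each `FLAG_CHECK()` sets a chflags bit only when `zfs_getattr()` both returned the attribute and reported it as true. The sketch below shows a table-driven equivalent as an alternative to the macro list. The `XA_*`/`CF_*` values and `struct xattr_reply` are hypothetical, standing in for `XVA_ISSET_RTN()` and the `xoa_*` fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
enum { XA_IMMUTABLE = 1 << 0, XA_NODUMP = 1 << 1, XA_HIDDEN = 1 << 2 };
enum { CF_IMMUTABLE = 0x100, CF_NODUMP = 0x200, CF_HIDDEN = 0x400 };

struct xattr_reply {
	uint32_t returned;	/* attributes the filesystem filled in */
	uint32_t value;		/* their boolean values, one bit each */
};

static const struct {
	uint32_t	xa;	/* ZFS attribute bit */
	unsigned long	cf;	/* matching chflags bit */
} xa_to_chflags[] = {
	{ XA_IMMUTABLE, CF_IMMUTABLE },
	{ XA_NODUMP, CF_NODUMP },
	{ XA_HIDDEN, CF_HIDDEN },
};

/* A chflags bit is set only if the attribute was returned AND is true. */
static unsigned long
to_chflags(const struct xattr_reply *r)
{
	unsigned long fflags = 0;
	size_t i;

	for (i = 0; i < sizeof(xa_to_chflags) / sizeof(xa_to_chflags[0]); i++) {
		uint32_t xa = xa_to_chflags[i].xa;

		if ((r->returned & xa) != 0 && (r->value & xa) != 0)
			fflags |= xa_to_chflags[i].cf;
	}
	return (fflags);
}
```

A table keeps each chflags/attribute pair on a single line, which makes a mismatched pairing easier to spot in review.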
 
 static int
 zfs_freebsd_setattr(ap)
 	struct vop_setattr_args /* {
 		struct vnode *a_vp;
 		struct vattr *a_vap;
 		struct ucred *a_cred;
 	} */ *ap;
 {
 	vnode_t *vp = ap->a_vp;
 	vattr_t *vap = ap->a_vap;
 	cred_t *cred = ap->a_cred;
 	xvattr_t xvap;
 	u_long fflags;
 	uint64_t zflags;
 
 	vattr_init_mask(vap);
 	vap->va_mask &= ~AT_NOSET;
 
 	xva_init(&xvap);
 	xvap.xva_vattr = *vap;
 
 	zflags = VTOZ(vp)->z_pflags;
 
 	if (vap->va_flags != VNOVAL) {
 		zfsvfs_t *zfsvfs = VTOZ(vp)->z_zfsvfs;
 		int error;
 
 		if (zfsvfs->z_use_fuids == B_FALSE)
 			return (EOPNOTSUPP);
 
 		fflags = vap->va_flags;
 		/*
 		 * XXX KDM 
 		 * We need to figure out whether it makes sense to allow
 		 * UF_REPARSE through, since we don't really have other
 		 * facilities to handle reparse points and zfs_setattr()
 		 * doesn't currently allow setting that attribute anyway.
 		 */
 		if ((fflags & ~(SF_IMMUTABLE|SF_APPEND|SF_NOUNLINK|UF_ARCHIVE|
 		     UF_NODUMP|UF_SYSTEM|UF_HIDDEN|UF_READONLY|UF_REPARSE|
 		     UF_OFFLINE|UF_SPARSE)) != 0)
 			return (EOPNOTSUPP);
 		/*
 		 * Unprivileged processes are not permitted to unset system
 		 * flags, or modify flags if any system flags are set.
 		 * Privileged non-jail processes may not modify system flags
 		 * if securelevel > 0 and any existing system flags are set.
 		 * Privileged jail processes behave like privileged non-jail
 		 * processes if the security.jail.chflags_allowed sysctl is
 		 * is non-zero; otherwise, they behave like unprivileged
 		 * processes.
 		 */
 		if (secpolicy_fs_owner(vp->v_mount, cred) == 0 ||
 		    priv_check_cred(cred, PRIV_VFS_SYSFLAGS, 0) == 0) {
 			if (zflags &
 			    (ZFS_IMMUTABLE | ZFS_APPENDONLY | ZFS_NOUNLINK)) {
 				error = securelevel_gt(cred, 0);
 				if (error != 0)
 					return (error);
 			}
 		} else {
 			/*
 			 * Callers may only modify the file flags on objects they
 			 * have VADMIN rights for.
 			 */
 			if ((error = VOP_ACCESS(vp, VADMIN, cred, curthread)) != 0)
 				return (error);
 			if (zflags &
 			    (ZFS_IMMUTABLE | ZFS_APPENDONLY | ZFS_NOUNLINK)) {
 				return (EPERM);
 			}
 			if (fflags &
 			    (SF_IMMUTABLE | SF_APPEND | SF_NOUNLINK)) {
 				return (EPERM);
 			}
 		}
 
 #define	FLAG_CHANGE(fflag, zflag, xflag, xfield)	do {		\
 	if (((fflags & (fflag)) && !(zflags & (zflag))) ||		\
 	    ((zflags & (zflag)) && !(fflags & (fflag)))) {		\
 		XVA_SET_REQ(&xvap, (xflag));				\
 		(xfield) = ((fflags & (fflag)) != 0);			\
 	}								\
 } while (0)
 		/* Convert chflags into ZFS-type flags. */
 		/* XXX: what about SF_SETTABLE?. */
 		FLAG_CHANGE(SF_IMMUTABLE, ZFS_IMMUTABLE, XAT_IMMUTABLE,
 		    xvap.xva_xoptattrs.xoa_immutable);
 		FLAG_CHANGE(SF_APPEND, ZFS_APPENDONLY, XAT_APPENDONLY,
 		    xvap.xva_xoptattrs.xoa_appendonly);
 		FLAG_CHANGE(SF_NOUNLINK, ZFS_NOUNLINK, XAT_NOUNLINK,
 		    xvap.xva_xoptattrs.xoa_nounlink);
 		FLAG_CHANGE(UF_ARCHIVE, ZFS_ARCHIVE, XAT_ARCHIVE,
 		    xvap.xva_xoptattrs.xoa_archive);
 		FLAG_CHANGE(UF_NODUMP, ZFS_NODUMP, XAT_NODUMP,
 		    xvap.xva_xoptattrs.xoa_nodump);
 		FLAG_CHANGE(UF_READONLY, ZFS_READONLY, XAT_READONLY,
 		    xvap.xva_xoptattrs.xoa_readonly);
 		FLAG_CHANGE(UF_SYSTEM, ZFS_SYSTEM, XAT_SYSTEM,
 		    xvap.xva_xoptattrs.xoa_system);
 		FLAG_CHANGE(UF_HIDDEN, ZFS_HIDDEN, XAT_HIDDEN,
 		    xvap.xva_xoptattrs.xoa_hidden);
 		FLAG_CHANGE(UF_REPARSE, ZFS_REPARSE, XAT_REPARSE,
 		    xvap.xva_xoptattrs.xoa_reparse);
 		FLAG_CHANGE(UF_OFFLINE, ZFS_OFFLINE, XAT_OFFLINE,
 		    xvap.xva_xoptattrs.xoa_offline);
 		FLAG_CHANGE(UF_SPARSE, ZFS_SPARSE, XAT_SPARSE,
 		    xvap.xva_xoptattrs.xoa_sparse);
 #undef	FLAG_CHANGE
 	}
 	return (zfs_setattr(vp, (vattr_t *)&xvap, 0, cred, NULL));
 }
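`FLAG_CHANGE()` queues an attribute only when the requested chflags bit and the stored ZFS flag disagree. That condition is an exclusive-or of the two states. Below is a sketch of the test, with hypothetical flag values and a `struct change` standing in for `XVA_SET_REQ()` and the `xoa_*` field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values. */
#define U_IMMUTABLE	0x1UL		/* requested chflags bit */
#define U_NODUMP	0x2UL
#define Z_IMMUTABLE	0x100ULL	/* stored ZFS pflags bit */
#define Z_NODUMP	0x200ULL

struct change {
	int	requested;	/* attribute must be passed to setattr */
	int	value;		/* new value, meaningful if requested */
};

/*
 * Request an update only when the desired state (fflag set in fflags)
 * differs from the current state (zflag set in zflags).
 */
static struct change
flag_change(unsigned long fflags, uint64_t zflags, unsigned long fflag,
    uint64_t zflag)
{
	struct change c = { 0, 0 };
	int want = (fflags & fflag) != 0;
	int have = (zflags & zflag) != 0;

	if (want != have) {
		c.requested = 1;
		c.value = want;
	}
	return (c);
}
```

Flags that are already in the requested state are left out of the request, so `zfs_setattr()` sees only real changes.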
 
 static int
 zfs_freebsd_rename(ap)
 	struct vop_rename_args  /* {
 		struct vnode *a_fdvp;
 		struct vnode *a_fvp;
 		struct componentname *a_fcnp;
 		struct vnode *a_tdvp;
 		struct vnode *a_tvp;
 		struct componentname *a_tcnp;
 	} */ *ap;
 {
 	vnode_t *fdvp = ap->a_fdvp;
 	vnode_t *fvp = ap->a_fvp;
 	vnode_t *tdvp = ap->a_tdvp;
 	vnode_t *tvp = ap->a_tvp;
 	int error;
 
 	ASSERT(ap->a_fcnp->cn_flags & (SAVENAME|SAVESTART));
 	ASSERT(ap->a_tcnp->cn_flags & (SAVENAME|SAVESTART));
 
-	/*
-	 * Check for cross-device rename.
-	 */
-	if ((fdvp->v_mount != tdvp->v_mount) ||
-	    (tvp && (fdvp->v_mount != tvp->v_mount)))
-		error = EXDEV;
-	else
-		error = zfs_rename(fdvp, ap->a_fcnp->cn_nameptr, tdvp,
-		    ap->a_tcnp->cn_nameptr, ap->a_fcnp->cn_cred, NULL, 0);
-	if (tdvp == tvp)
-		VN_RELE(tdvp);
-	else
-		VN_URELE(tdvp);
-	if (tvp)
-		VN_URELE(tvp);
-	VN_RELE(fdvp);
-	VN_RELE(fvp);
+	error = zfs_rename(fdvp, &fvp, ap->a_fcnp, tdvp, &tvp,
+	    ap->a_tcnp, ap->a_fcnp->cn_cred);
 
+	vrele(fdvp);
+	vrele(fvp);
+	vrele(tdvp);
+	if (tvp != NULL)
+		vrele(tvp);
+
 	return (error);
 }
 
 static int
 zfs_freebsd_symlink(ap)
 	struct vop_symlink_args /* {
 		struct vnode *a_dvp;
 		struct vnode **a_vpp;
 		struct componentname *a_cnp;
 		struct vattr *a_vap;
 		char *a_target;
 	} */ *ap;
 {
 	struct componentname *cnp = ap->a_cnp;
 	vattr_t *vap = ap->a_vap;
 
 	ASSERT(cnp->cn_flags & SAVENAME);
 
 	vap->va_type = VLNK;	/* FreeBSD: Syscall only sets va_mode. */
 	vattr_init_mask(vap);
 
 	return (zfs_symlink(ap->a_dvp, ap->a_vpp, cnp->cn_nameptr, vap,
 	    ap->a_target, cnp->cn_cred, cnp->cn_thread));
 }
 
 static int
 zfs_freebsd_readlink(ap)
 	struct vop_readlink_args /* {
 		struct vnode *a_vp;
 		struct uio *a_uio;
 		struct ucred *a_cred;
 	} */ *ap;
 {
 
 	return (zfs_readlink(ap->a_vp, ap->a_uio, ap->a_cred, NULL));
 }
 
 static int
 zfs_freebsd_link(ap)
 	struct vop_link_args /* {
 		struct vnode *a_tdvp;
 		struct vnode *a_vp;
 		struct componentname *a_cnp;
 	} */ *ap;
 {
 	struct componentname *cnp = ap->a_cnp;
 	vnode_t *vp = ap->a_vp;
 	vnode_t *tdvp = ap->a_tdvp;
 
 	if (tdvp->v_mount != vp->v_mount)
 		return (EXDEV);
 
 	ASSERT(cnp->cn_flags & SAVENAME);
 
 	return (zfs_link(tdvp, vp, cnp->cn_nameptr, cnp->cn_cred, NULL, 0));
 }
 
 static int
 zfs_freebsd_inactive(ap)
 	struct vop_inactive_args /* {
 		struct vnode *a_vp;
 		struct thread *a_td;
 	} */ *ap;
 {
 	vnode_t *vp = ap->a_vp;
 
 	zfs_inactive(vp, ap->a_td->td_ucred, NULL);
 	return (0);
 }
 
 static int
 zfs_freebsd_reclaim(ap)
 	struct vop_reclaim_args /* {
 		struct vnode *a_vp;
 		struct thread *a_td;
 	} */ *ap;
 {
 	vnode_t	*vp = ap->a_vp;
 	znode_t	*zp = VTOZ(vp);
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	ASSERT(zp != NULL);
 
 	/* Destroy the vm object and flush associated pages. */
 	vnode_destroy_vobject(vp);
 
 	/*
 	 * z_teardown_inactive_lock protects from a race with
 	 * zfs_znode_dmu_fini in zfsvfs_teardown during
 	 * force unmount.
 	 */
 	rw_enter(&zfsvfs->z_teardown_inactive_lock, RW_READER);
 	if (zp->z_sa_hdl == NULL)
 		zfs_znode_free(zp);
 	else
 		zfs_zinactive(zp);
 	rw_exit(&zfsvfs->z_teardown_inactive_lock);
 
 	vp->v_data = NULL;
 	return (0);
 }
 
 static int
 zfs_freebsd_fid(ap)
 	struct vop_fid_args /* {
 		struct vnode *a_vp;
 		struct fid *a_fid;
 	} */ *ap;
 {
 
 	return (zfs_fid(ap->a_vp, (void *)ap->a_fid, NULL));
 }
 
 static int
 zfs_freebsd_pathconf(ap)
 	struct vop_pathconf_args /* {
 		struct vnode *a_vp;
 		int a_name;
 		register_t *a_retval;
 	} */ *ap;
 {
 	ulong_t val;
 	int error;
 
 	error = zfs_pathconf(ap->a_vp, ap->a_name, &val, curthread->td_ucred, NULL);
 	if (error == 0)
 		*ap->a_retval = val;
 	else if (error == EOPNOTSUPP)
 		error = vop_stdpathconf(ap);
 	return (error);
 }
 
 static int
 zfs_freebsd_fifo_pathconf(ap)
 	struct vop_pathconf_args /* {
 		struct vnode *a_vp;
 		int a_name;
 		register_t *a_retval;
 	} */ *ap;
 {
 
 	switch (ap->a_name) {
 	case _PC_ACL_EXTENDED:
 	case _PC_ACL_NFS4:
 	case _PC_ACL_PATH_MAX:
 	case _PC_MAC_PRESENT:
 		return (zfs_freebsd_pathconf(ap));
 	default:
 		return (fifo_specops.vop_pathconf(ap));
 	}
 }
 
 /*
  * FreeBSD's extended attributes namespace defines file name prefix for ZFS'
  * extended attribute name:
  *
  *	NAMESPACE	PREFIX	
  *	system		freebsd:system:
  *	user		(none, can be used to access ZFS fsattr(5) attributes
  *			created on Solaris)
  */
 static int
 zfs_create_attrname(int attrnamespace, const char *name, char *attrname,
     size_t size)
 {
 	const char *namespace, *prefix, *suffix;
 
 	/* We don't allow '/' character in attribute name. */
 	if (strchr(name, '/') != NULL)
 		return (EINVAL);
 	/* We don't allow attribute names that start with "freebsd:" string. */
 	if (strncmp(name, "freebsd:", 8) == 0)
 		return (EINVAL);
 
 	bzero(attrname, size);
 
 	switch (attrnamespace) {
 	case EXTATTR_NAMESPACE_USER:
 #if 0
 		prefix = "freebsd:";
 		namespace = EXTATTR_NAMESPACE_USER_STRING;
 		suffix = ":";
 #else
 		/*
 		 * This is the default namespace by which we can access all
 		 * attributes created on Solaris.
 		 */
 		prefix = namespace = suffix = "";
 #endif
 		break;
 	case EXTATTR_NAMESPACE_SYSTEM:
 		prefix = "freebsd:";
 		namespace = EXTATTR_NAMESPACE_SYSTEM_STRING;
 		suffix = ":";
 		break;
 	case EXTATTR_NAMESPACE_EMPTY:
 	default:
 		return (EINVAL);
 	}
 	if (snprintf(attrname, size, "%s%s%s%s", prefix, namespace, suffix,
 	    name) >= size) {
 		return (ENAMETOOLONG);
 	}
 	return (0);
 }
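`zfs_create_attrname()` maps a FreeBSD extattr namespace onto a single ZFS attribute name. The user namespace passes names through unchanged for Solaris compatibility, while the system namespace gets a `freebsd:system:` prefix. The userspace sketch below follows the same rules. The `NS_*` identifiers are hypothetical, and the three prefix parts are collapsed into one string.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical namespace identifiers for this sketch. */
enum { NS_EMPTY = 0, NS_USER = 1, NS_SYSTEM = 2 };

static int
make_attrname(int ns, const char *name, char *out, size_t size)
{
	const char *prefix;

	/* No '/' in names, and no names that start with "freebsd:". */
	if (strchr(name, '/') != NULL || strncmp(name, "freebsd:", 8) == 0)
		return (EINVAL);
	switch (ns) {
	case NS_USER:
		prefix = "";			/* Solaris-compatible names */
		break;
	case NS_SYSTEM:
		prefix = "freebsd:system:";
		break;
	default:
		return (EINVAL);
	}
	/* snprintf() returns the length it wanted; >= size means truncated. */
	if ((size_t)snprintf(out, size, "%s%s", prefix, name) >= size)
		return (ENAMETOOLONG);
	return (0);
}
```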
 
 /*
  * Vnode operation to retrieve a named extended attribute.
  */
 static int
 zfs_getextattr(struct vop_getextattr_args *ap)
 /*
 vop_getextattr {
 	IN struct vnode *a_vp;
 	IN int a_attrnamespace;
 	IN const char *a_name;
 	INOUT struct uio *a_uio;
 	OUT size_t *a_size;
 	IN struct ucred *a_cred;
 	IN struct thread *a_td;
 };
 */
 {
 	zfsvfs_t *zfsvfs = VTOZ(ap->a_vp)->z_zfsvfs;
 	struct thread *td = ap->a_td;
 	struct nameidata nd;
 	char attrname[255];
 	struct vattr va;
 	vnode_t *xvp = NULL, *vp;
 	int error, flags;
 
 	error = extattr_check_cred(ap->a_vp, ap->a_attrnamespace,
 	    ap->a_cred, ap->a_td, VREAD);
 	if (error != 0)
 		return (error);
 
 	error = zfs_create_attrname(ap->a_attrnamespace, ap->a_name, attrname,
 	    sizeof(attrname));
 	if (error != 0)
 		return (error);
 
 	ZFS_ENTER(zfsvfs);
 
 	error = zfs_lookup(ap->a_vp, NULL, &xvp, NULL, 0, ap->a_cred, td,
 	    LOOKUP_XATTR);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	flags = FREAD;
 	NDINIT_ATVP(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, attrname,
 	    xvp, td);
 	error = vn_open_cred(&nd, &flags, 0, 0, ap->a_cred, NULL);
 	vp = nd.ni_vp;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		if (error == ENOENT)
 			error = ENOATTR;
 		return (error);
 	}
 
 	if (ap->a_size != NULL) {
 		error = VOP_GETATTR(vp, &va, ap->a_cred);
 		if (error == 0)
 			*ap->a_size = (size_t)va.va_size;
 	} else if (ap->a_uio != NULL)
 		error = VOP_READ(vp, ap->a_uio, IO_UNIT, ap->a_cred);
 
 	VOP_UNLOCK(vp, 0);
 	vn_close(vp, flags, ap->a_cred, td);
 	ZFS_EXIT(zfsvfs);
 
 	return (error);
 }
 
 /*
  * Vnode operation to remove a named attribute.
  */
 int
 zfs_deleteextattr(struct vop_deleteextattr_args *ap)
 /*
 vop_deleteextattr {
 	IN struct vnode *a_vp;
 	IN int a_attrnamespace;
 	IN const char *a_name;
 	IN struct ucred *a_cred;
 	IN struct thread *a_td;
 };
 */
 {
 	zfsvfs_t *zfsvfs = VTOZ(ap->a_vp)->z_zfsvfs;
 	struct thread *td = ap->a_td;
 	struct nameidata nd;
 	char attrname[255];
 	struct vattr va;
 	vnode_t *xvp = NULL, *vp;
 	int error, flags;
 
 	error = extattr_check_cred(ap->a_vp, ap->a_attrnamespace,
 	    ap->a_cred, ap->a_td, VWRITE);
 	if (error != 0)
 		return (error);
 
 	error = zfs_create_attrname(ap->a_attrnamespace, ap->a_name, attrname,
 	    sizeof(attrname));
 	if (error != 0)
 		return (error);
 
 	ZFS_ENTER(zfsvfs);
 
 	error = zfs_lookup(ap->a_vp, NULL, &xvp, NULL, 0, ap->a_cred, td,
 	    LOOKUP_XATTR);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	NDINIT_ATVP(&nd, DELETE, NOFOLLOW | LOCKPARENT | LOCKLEAF,
 	    UIO_SYSSPACE, attrname, xvp, td);
 	error = namei(&nd);
 	vp = nd.ni_vp;
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		NDFREE(&nd, NDF_ONLY_PNBUF);
 		if (error == ENOENT)
 			error = ENOATTR;
 		return (error);
 	}
 
 	error = VOP_REMOVE(nd.ni_dvp, vp, &nd.ni_cnd);
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 
 	vput(nd.ni_dvp);
 	if (vp == nd.ni_dvp)
 		vrele(vp);
 	else
 		vput(vp);
 	ZFS_EXIT(zfsvfs);
 
 	return (error);
 }
 
 /*
  * Vnode operation to set a named attribute.
  */
 static int
 zfs_setextattr(struct vop_setextattr_args *ap)
 /*
 vop_setextattr {
 	IN struct vnode *a_vp;
 	IN int a_attrnamespace;
 	IN const char *a_name;
 	INOUT struct uio *a_uio;
 	IN struct ucred *a_cred;
 	IN struct thread *a_td;
 };
 */
 {
 	zfsvfs_t *zfsvfs = VTOZ(ap->a_vp)->z_zfsvfs;
 	struct thread *td = ap->a_td;
 	struct nameidata nd;
 	char attrname[255];
 	struct vattr va;
 	vnode_t *xvp = NULL, *vp;
 	int error, flags;
 
 	error = extattr_check_cred(ap->a_vp, ap->a_attrnamespace,
 	    ap->a_cred, ap->a_td, VWRITE);
 	if (error != 0)
 		return (error);
 
 	error = zfs_create_attrname(ap->a_attrnamespace, ap->a_name, attrname,
 	    sizeof(attrname));
 	if (error != 0)
 		return (error);
 
 	ZFS_ENTER(zfsvfs);
 
 	error = zfs_lookup(ap->a_vp, NULL, &xvp, NULL, 0, ap->a_cred, td,
 	    LOOKUP_XATTR | CREATE_XATTR_DIR);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	flags = FFLAGS(O_WRONLY | O_CREAT);
 	NDINIT_ATVP(&nd, LOOKUP, NOFOLLOW, UIO_SYSSPACE, attrname,
 	    xvp, td);
 	error = vn_open_cred(&nd, &flags, 0600, 0, ap->a_cred, NULL);
 	vp = nd.ni_vp;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	VATTR_NULL(&va);
 	va.va_size = 0;
 	error = VOP_SETATTR(vp, &va, ap->a_cred);
 	if (error == 0)
 		error = VOP_WRITE(vp, ap->a_uio, IO_UNIT, ap->a_cred);
 
 	VOP_UNLOCK(vp, 0);
 	vn_close(vp, flags, ap->a_cred, td);
 	ZFS_EXIT(zfsvfs);
 
 	return (error);
 }
 
 /*
  * Vnode operation to list the extended attributes on a vnode.
  */
 static int
 zfs_listextattr(struct vop_listextattr_args *ap)
 /*
 vop_listextattr {
 	IN struct vnode *a_vp;
 	IN int a_attrnamespace;
 	INOUT struct uio *a_uio;
 	OUT size_t *a_size;
 	IN struct ucred *a_cred;
 	IN struct thread *a_td;
 };
 */
 {
 	zfsvfs_t *zfsvfs = VTOZ(ap->a_vp)->z_zfsvfs;
 	struct thread *td = ap->a_td;
 	struct nameidata nd;
 	char attrprefix[16];
 	u_char dirbuf[sizeof(struct dirent)];
 	struct dirent *dp;
 	struct iovec aiov;
 	struct uio auio, *uio = ap->a_uio;
 	size_t *sizep = ap->a_size;
 	size_t plen;
 	vnode_t *xvp = NULL, *vp;
 	int done, error, eof, pos;
 
 	error = extattr_check_cred(ap->a_vp, ap->a_attrnamespace,
 	    ap->a_cred, ap->a_td, VREAD);
 	if (error != 0)
 		return (error);
 
 	error = zfs_create_attrname(ap->a_attrnamespace, "", attrprefix,
 	    sizeof(attrprefix));
 	if (error != 0)
 		return (error);
 	plen = strlen(attrprefix);
 
 	ZFS_ENTER(zfsvfs);
 
 	if (sizep != NULL)
 		*sizep = 0;
 
 	error = zfs_lookup(ap->a_vp, NULL, &xvp, NULL, 0, ap->a_cred, td,
 	    LOOKUP_XATTR);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		/*
 		 * ENOATTR means that the EA directory does not yet exist,
 		 * i.e. there are no extended attributes there.
 		 */
 		if (error == ENOATTR)
 			error = 0;
 		return (error);
 	}
 
 	NDINIT_ATVP(&nd, LOOKUP, NOFOLLOW | LOCKLEAF | LOCKSHARED,
 	    UIO_SYSSPACE, ".", xvp, td);
 	error = namei(&nd);
 	vp = nd.ni_vp;
 	NDFREE(&nd, NDF_ONLY_PNBUF);
 	if (error != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	auio.uio_iov = &aiov;
 	auio.uio_iovcnt = 1;
 	auio.uio_segflg = UIO_SYSSPACE;
 	auio.uio_td = td;
 	auio.uio_rw = UIO_READ;
 	auio.uio_offset = 0;
 
 	do {
 		u_char nlen;
 
 		aiov.iov_base = (void *)dirbuf;
 		aiov.iov_len = sizeof(dirbuf);
 		auio.uio_resid = sizeof(dirbuf);
 		error = VOP_READDIR(vp, &auio, ap->a_cred, &eof, NULL, NULL);
 		done = sizeof(dirbuf) - auio.uio_resid;
 		if (error != 0)
 			break;
 		for (pos = 0; pos < done;) {
 			dp = (struct dirent *)(dirbuf + pos);
 			pos += dp->d_reclen;
 			/*
 			 * XXX: Temporarily we also accept DT_UNKNOWN, as this
 			 * is what we get when the attribute was created on Solaris.
 			 */
 			if (dp->d_type != DT_REG && dp->d_type != DT_UNKNOWN)
 				continue;
 			if (plen == 0 && strncmp(dp->d_name, "freebsd:", 8) == 0)
 				continue;
 			else if (strncmp(dp->d_name, attrprefix, plen) != 0)
 				continue;
 			nlen = dp->d_namlen - plen;
 			if (sizep != NULL)
 				*sizep += 1 + nlen;
 			else if (uio != NULL) {
 				/*
 				 * Format of extattr name entry is one byte for
 				 * length and the rest for name.
 				 */
 				error = uiomove(&nlen, 1, uio->uio_rw, uio);
 				if (error == 0) {
 					error = uiomove(dp->d_name + plen, nlen,
 					    uio->uio_rw, uio);
 				}
 				if (error != 0)
 					break;
 			}
 		}
 	} while (!eof && error == 0);
 
 	vput(vp);
 	ZFS_EXIT(zfsvfs);
 
 	return (error);
 }
 
 int
 zfs_freebsd_getacl(ap)
 	struct vop_getacl_args /* {
 		struct vnode *vp;
 		acl_type_t type;
 		struct acl *aclp;
 		struct ucred *cred;
 		struct thread *td;
 	} */ *ap;
 {
 	int		error;
 	vsecattr_t      vsecattr;
 
 	if (ap->a_type != ACL_TYPE_NFS4)
 		return (EINVAL);
 
 	vsecattr.vsa_mask = VSA_ACE | VSA_ACECNT;
 	if (error = zfs_getsecattr(ap->a_vp, &vsecattr, 0, ap->a_cred, NULL))
 		return (error);
 
 	error = acl_from_aces(ap->a_aclp, vsecattr.vsa_aclentp, vsecattr.vsa_aclcnt);
 	if (vsecattr.vsa_aclentp != NULL)
 		kmem_free(vsecattr.vsa_aclentp, vsecattr.vsa_aclentsz);
 
 	return (error);
 }
 
 int
 zfs_freebsd_setacl(ap)
 	struct vop_setacl_args /* {
 		struct vnode *vp;
 		acl_type_t type;
 		struct acl *aclp;
 		struct ucred *cred;
 		struct thread *td;
 	} */ *ap;
 {
 	int		error;
 	vsecattr_t      vsecattr;
 	int		aclbsize;	/* size of acl list in bytes */
 	aclent_t	*aaclp;
 
 	if (ap->a_type != ACL_TYPE_NFS4)
 		return (EINVAL);
 
 	if (ap->a_aclp->acl_cnt < 1 || ap->a_aclp->acl_cnt > MAX_ACL_ENTRIES)
 		return (EINVAL);
 
 	/*
 	 * With NFSv4 ACLs, chmod(2) may need to add additional entries,
 	 * splitting every entry into two and appending "canonical six"
 	 * entries at the end.  Don't allow for setting an ACL that would
 	 * cause chmod(2) to run out of ACL entries.
 	 */
 	if (ap->a_aclp->acl_cnt * 2 + 6 > ACL_MAX_ENTRIES)
 		return (ENOSPC);
 
 	error = acl_nfs4_check(ap->a_aclp, ap->a_vp->v_type == VDIR);
 	if (error != 0)
 		return (error);
 
 	vsecattr.vsa_mask = VSA_ACE;
 	aclbsize = ap->a_aclp->acl_cnt * sizeof(ace_t);
 	vsecattr.vsa_aclentp = kmem_alloc(aclbsize, KM_SLEEP);
 	aaclp = vsecattr.vsa_aclentp;
 	vsecattr.vsa_aclentsz = aclbsize;
 
 	aces_from_acl(vsecattr.vsa_aclentp, &vsecattr.vsa_aclcnt, ap->a_aclp);
 	error = zfs_setsecattr(ap->a_vp, &vsecattr, 0, ap->a_cred, NULL);
 	kmem_free(aaclp, aclbsize);
 
 	return (error);
 }
 
 int
 zfs_freebsd_aclcheck(ap)
 	struct vop_aclcheck_args /* {
 		struct vnode *vp;
 		acl_type_t type;
 		struct acl *aclp;
 		struct ucred *cred;
 		struct thread *td;
 	} */ *ap;
 {
 
 	return (EOPNOTSUPP);
 }
 
 static int
 zfs_vptocnp(struct vop_vptocnp_args *ap)
 {
 	vnode_t *covered_vp;
 	vnode_t *vp = ap->a_vp;
 	zfsvfs_t *zfsvfs = vp->v_vfsp->vfs_data;
 	znode_t *zp = VTOZ(vp);
 	uint64_t parent;
 	int ltype;
 	int error;
 
 	ZFS_ENTER(zfsvfs);
 	ZFS_VERIFY_ZP(zp);
 
 	/*
 	 * If we are a snapshot mounted under .zfs, run the operation
 	 * on the covered vnode.
 	 */
 	if ((error = sa_lookup(zp->z_sa_hdl,
 	    SA_ZPL_PARENT(zfsvfs), &parent, sizeof (parent))) != 0) {
 		ZFS_EXIT(zfsvfs);
 		return (error);
 	}
 
 	if (zp->z_id != parent || zfsvfs->z_parent == zfsvfs) {
 		ZFS_EXIT(zfsvfs);
 		return (vop_stdvptocnp(ap));
 	}
 	ZFS_EXIT(zfsvfs);
 
 	covered_vp = vp->v_mount->mnt_vnodecovered;
 	vhold(covered_vp);
 	ltype = VOP_ISLOCKED(vp);
 	VOP_UNLOCK(vp, 0);
 	error = vget(covered_vp, LK_EXCLUSIVE | LK_VNHELD, curthread);
 	if (error == 0) {
 		error = VOP_VPTOCNP(covered_vp, ap->a_vpp, ap->a_cred,
 		    ap->a_buf, ap->a_buflen);
 		vput(covered_vp);
 	}
 	vn_lock(vp, ltype | LK_RETRY);
 	if ((vp->v_iflag & VI_DOOMED) != 0)
 		error = SET_ERROR(ENOENT);
 	return (error);
 }
 
+#ifdef DIAGNOSTIC
+static int
+zfs_lock(ap)
+	struct vop_lock1_args /* {
+		struct vnode *a_vp;
+		int a_flags;
+		char *file;
+		int line;
+	} */ *ap;
+{
+	zfsvfs_t *zfsvfs;
+	znode_t *zp;
+	vnode_t *vp;
+	int flags;
+	int err;
+
+	vp = ap->a_vp;
+	flags = ap->a_flags;
+	if ((flags & LK_INTERLOCK) == 0 && (flags & LK_NOWAIT) == 0 &&
+	    (vp->v_iflag & VI_DOOMED) == 0 && (zp = vp->v_data) != NULL) {
+		zfsvfs = zp->z_zfsvfs;
+		VERIFY(!RRM_LOCK_HELD(&zfsvfs->z_teardown_lock));
+	}
+	err = vop_stdlock(ap);
+	if ((flags & LK_INTERLOCK) != 0 && (flags & LK_NOWAIT) == 0 &&
+	    (vp->v_iflag & VI_DOOMED) == 0 && (zp = vp->v_data) != NULL) {
+		zfsvfs = zp->z_zfsvfs;
+		VERIFY(!RRM_LOCK_HELD(&zfsvfs->z_teardown_lock));
+	}
+	return (err);
+}
+#endif
+
 struct vop_vector zfs_vnodeops;
 struct vop_vector zfs_fifoops;
 struct vop_vector zfs_shareops;
 
 struct vop_vector zfs_vnodeops = {
 	.vop_default =		&default_vnodeops,
 	.vop_inactive =		zfs_freebsd_inactive,
 	.vop_reclaim =		zfs_freebsd_reclaim,
 	.vop_access =		zfs_freebsd_access,
-#ifdef FREEBSD_NAMECACHE
-	.vop_lookup =		vfs_cache_lookup,
+	.vop_lookup =		zfs_cache_lookup,
 	.vop_cachedlookup =	zfs_freebsd_lookup,
-#else
-	.vop_lookup =		zfs_freebsd_lookup,
-#endif
 	.vop_getattr =		zfs_freebsd_getattr,
 	.vop_setattr =		zfs_freebsd_setattr,
 	.vop_create =		zfs_freebsd_create,
 	.vop_mknod =		zfs_freebsd_create,
 	.vop_mkdir =		zfs_freebsd_mkdir,
 	.vop_readdir =		zfs_freebsd_readdir,
 	.vop_fsync =		zfs_freebsd_fsync,
 	.vop_open =		zfs_freebsd_open,
 	.vop_close =		zfs_freebsd_close,
 	.vop_rmdir =		zfs_freebsd_rmdir,
 	.vop_ioctl =		zfs_freebsd_ioctl,
 	.vop_link =		zfs_freebsd_link,
 	.vop_symlink =		zfs_freebsd_symlink,
 	.vop_readlink =		zfs_freebsd_readlink,
 	.vop_read =		zfs_freebsd_read,
 	.vop_write =		zfs_freebsd_write,
 	.vop_remove =		zfs_freebsd_remove,
 	.vop_rename =		zfs_freebsd_rename,
 	.vop_pathconf =		zfs_freebsd_pathconf,
 	.vop_bmap =		zfs_freebsd_bmap,
 	.vop_fid =		zfs_freebsd_fid,
 	.vop_getextattr =	zfs_getextattr,
 	.vop_deleteextattr =	zfs_deleteextattr,
 	.vop_setextattr =	zfs_setextattr,
 	.vop_listextattr =	zfs_listextattr,
 	.vop_getacl =		zfs_freebsd_getacl,
 	.vop_setacl =		zfs_freebsd_setacl,
 	.vop_aclcheck =		zfs_freebsd_aclcheck,
 	.vop_getpages =		zfs_freebsd_getpages,
 	.vop_putpages =		zfs_freebsd_putpages,
 	.vop_vptocnp =		zfs_vptocnp,
+#ifdef DIAGNOSTIC
+	.vop_lock1 =		zfs_lock,
+#endif
 };
 
 struct vop_vector zfs_fifoops = {
 	.vop_default =		&fifo_specops,
 	.vop_fsync =		zfs_freebsd_fsync,
 	.vop_access =		zfs_freebsd_access,
 	.vop_getattr =		zfs_freebsd_getattr,
 	.vop_inactive =		zfs_freebsd_inactive,
 	.vop_read =		VOP_PANIC,
 	.vop_reclaim =		zfs_freebsd_reclaim,
 	.vop_setattr =		zfs_freebsd_setattr,
 	.vop_write =		VOP_PANIC,
 	.vop_pathconf = 	zfs_freebsd_fifo_pathconf,
 	.vop_fid =		zfs_freebsd_fid,
 	.vop_getacl =		zfs_freebsd_getacl,
 	.vop_setacl =		zfs_freebsd_setacl,
 	.vop_aclcheck =		zfs_freebsd_aclcheck,
 };
 
 /*
  * special share hidden files vnode operations template
  */
 struct vop_vector zfs_shareops = {
 	.vop_default =		&default_vnodeops,
 	.vop_access =		zfs_freebsd_access,
 	.vop_inactive =		zfs_freebsd_inactive,
 	.vop_reclaim =		zfs_freebsd_reclaim,
 	.vop_fid =		zfs_freebsd_fid,
 	.vop_pathconf =		zfs_freebsd_pathconf,
 };
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c	(revision 303775)
@@ -1,2202 +1,2194 @@
 /*
  * CDDL HEADER START
  *
  * The contents of this file are subject to the terms of the
  * Common Development and Distribution License (the "License").
  * You may not use this file except in compliance with the License.
  *
  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
  * or http://www.opensolaris.org/os/licensing.
  * See the License for the specific language governing permissions
  * and limitations under the License.
  *
  * When distributing Covered Code, include this CDDL HEADER in each
  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  * If applicable, add the following below this CDDL HEADER, with the
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2012, 2014 by Delphix. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
 /* Portions Copyright 2007 Jeremy Teo */
 /* Portions Copyright 2011 Martin Matuska <mm@FreeBSD.org> */
 
 #ifdef _KERNEL
 #include <sys/types.h>
 #include <sys/param.h>
 #include <sys/time.h>
 #include <sys/systm.h>
 #include <sys/sysmacros.h>
 #include <sys/resource.h>
 #include <sys/mntent.h>
 #include <sys/u8_textprep.h>
 #include <sys/dsl_dataset.h>
 #include <sys/vfs.h>
 #include <sys/vnode.h>
 #include <sys/file.h>
 #include <sys/kmem.h>
 #include <sys/errno.h>
 #include <sys/unistd.h>
 #include <sys/atomic.h>
 #include <sys/zfs_dir.h>
 #include <sys/zfs_acl.h>
 #include <sys/zfs_ioctl.h>
 #include <sys/zfs_rlock.h>
 #include <sys/zfs_fuid.h>
 #include <sys/dnode.h>
 #include <sys/fs/zfs.h>
 #include <sys/kidmap.h>
 #endif /* _KERNEL */
 
 #include <sys/dmu.h>
 #include <sys/dmu_objset.h>
 #include <sys/refcount.h>
 #include <sys/stat.h>
 #include <sys/zap.h>
 #include <sys/zfs_znode.h>
 #include <sys/sa.h>
 #include <sys/zfs_sa.h>
 #include <sys/zfs_stat.h>
 #include <sys/refcount.h>
 
 #include "zfs_prop.h"
 #include "zfs_comutil.h"
 
 /* Used by fstat(1). */
 SYSCTL_INT(_debug_sizeof, OID_AUTO, znode, CTLFLAG_RD,
     SYSCTL_NULL_INT_PTR, sizeof(znode_t), "sizeof(znode_t)");
 
 /*
  * Define ZNODE_STATS to turn on statistic gathering. By default, it is only
  * turned on when DEBUG is also defined.
  */
 #ifdef	DEBUG
 #define	ZNODE_STATS
 #endif	/* DEBUG */
 
 #ifdef	ZNODE_STATS
 #define	ZNODE_STAT_ADD(stat)			((stat)++)
 #else
 #define	ZNODE_STAT_ADD(stat)			/* nothing */
 #endif	/* ZNODE_STATS */
 
 /*
  * Functions needed for userland (ie: libzpool) are not put under
  * #ifdef _KERNEL; the rest of the functions have dependencies
  * (such as VFS logic) that will not compile easily in userland.
  */
 #ifdef _KERNEL
 /*
  * Needed to close a small window in zfs_znode_move() that allows the zfsvfs to
  * be freed before it can be safely accessed.
  */
 krwlock_t zfsvfs_lock;
 
 static kmem_cache_t *znode_cache = NULL;
 
 /*ARGSUSED*/
 static void
 znode_evict_error(dmu_buf_t *dbuf, void *user_ptr)
 {
 	/*
 	 * We should never drop all dbuf refs without first clearing
 	 * the eviction callback.
 	 */
 	panic("evicting znode %p\n", user_ptr);
 }
 
 extern struct vop_vector zfs_vnodeops;
 extern struct vop_vector zfs_fifoops;
 extern struct vop_vector zfs_shareops;
 
 static int
 zfs_znode_cache_constructor(void *buf, void *arg, int kmflags)
 {
 	znode_t *zp = buf;
 
 	POINTER_INVALIDATE(&zp->z_zfsvfs);
 
 	list_link_init(&zp->z_link_node);
 
-	mutex_init(&zp->z_lock, NULL, MUTEX_DEFAULT, NULL);
-	rw_init(&zp->z_parent_lock, NULL, RW_DEFAULT, NULL);
-	rw_init(&zp->z_name_lock, NULL, RW_DEFAULT, NULL);
 	mutex_init(&zp->z_acl_lock, NULL, MUTEX_DEFAULT, NULL);
 
 	mutex_init(&zp->z_range_lock, NULL, MUTEX_DEFAULT, NULL);
 	avl_create(&zp->z_range_avl, zfs_range_compare,
 	    sizeof (rl_t), offsetof(rl_t, r_node));
 
-	zp->z_dirlocks = NULL;
 	zp->z_acl_cached = NULL;
 	zp->z_vnode = NULL;
 	zp->z_moved = 0;
 	return (0);
 }
 
 /*ARGSUSED*/
 static void
 zfs_znode_cache_destructor(void *buf, void *arg)
 {
 	znode_t *zp = buf;
 
 	ASSERT(!POINTER_IS_VALID(zp->z_zfsvfs));
 	ASSERT(ZTOV(zp) == NULL);
 	vn_free(ZTOV(zp));
 	ASSERT(!list_link_active(&zp->z_link_node));
-	mutex_destroy(&zp->z_lock);
-	rw_destroy(&zp->z_parent_lock);
-	rw_destroy(&zp->z_name_lock);
 	mutex_destroy(&zp->z_acl_lock);
 	avl_destroy(&zp->z_range_avl);
 	mutex_destroy(&zp->z_range_lock);
 
-	ASSERT(zp->z_dirlocks == NULL);
 	ASSERT(zp->z_acl_cached == NULL);
 }
 
 #ifdef	ZNODE_STATS
 static struct {
 	uint64_t zms_zfsvfs_invalid;
 	uint64_t zms_zfsvfs_recheck1;
 	uint64_t zms_zfsvfs_unmounted;
 	uint64_t zms_zfsvfs_recheck2;
 	uint64_t zms_obj_held;
 	uint64_t zms_vnode_locked;
 	uint64_t zms_not_only_dnlc;
 } znode_move_stats;
 #endif	/* ZNODE_STATS */
 
 #ifdef illumos
 static void
 zfs_znode_move_impl(znode_t *ozp, znode_t *nzp)
 {
 	vnode_t *vp;
 
 	/* Copy fields. */
 	nzp->z_zfsvfs = ozp->z_zfsvfs;
 
 	/* Swap vnodes. */
 	vp = nzp->z_vnode;
 	nzp->z_vnode = ozp->z_vnode;
 	ozp->z_vnode = vp; /* let destructor free the overwritten vnode */
 	ZTOV(ozp)->v_data = ozp;
 	ZTOV(nzp)->v_data = nzp;
 
 	nzp->z_id = ozp->z_id;
 	ASSERT(ozp->z_dirlocks == NULL); /* znode not in use */
 	ASSERT(avl_numnodes(&ozp->z_range_avl) == 0);
 	nzp->z_unlinked = ozp->z_unlinked;
 	nzp->z_atime_dirty = ozp->z_atime_dirty;
 	nzp->z_zn_prefetch = ozp->z_zn_prefetch;
 	nzp->z_blksz = ozp->z_blksz;
 	nzp->z_seq = ozp->z_seq;
 	nzp->z_mapcnt = ozp->z_mapcnt;
 	nzp->z_gen = ozp->z_gen;
 	nzp->z_sync_cnt = ozp->z_sync_cnt;
 	nzp->z_is_sa = ozp->z_is_sa;
 	nzp->z_sa_hdl = ozp->z_sa_hdl;
 	bcopy(ozp->z_atime, nzp->z_atime, sizeof (uint64_t) * 2);
 	nzp->z_links = ozp->z_links;
 	nzp->z_size = ozp->z_size;
 	nzp->z_pflags = ozp->z_pflags;
 	nzp->z_uid = ozp->z_uid;
 	nzp->z_gid = ozp->z_gid;
 	nzp->z_mode = ozp->z_mode;
 
 	/*
 	 * Since this is just an idle znode and kmem is already dealing with
 	 * memory pressure, release any cached ACL.
 	 */
 	if (ozp->z_acl_cached) {
 		zfs_acl_free(ozp->z_acl_cached);
 		ozp->z_acl_cached = NULL;
 	}
 
 	sa_set_userp(nzp->z_sa_hdl, nzp);
 
 	/*
 	 * Invalidate the original znode by clearing fields that provide a
 	 * pointer back to the znode. Set the low bit of the vfs pointer to
 	 * ensure that zfs_znode_move() recognizes the znode as invalid in any
 	 * subsequent callback.
 	 */
 	ozp->z_sa_hdl = NULL;
 	POINTER_INVALIDATE(&ozp->z_zfsvfs);
 
 	/*
 	 * Mark the znode.
 	 */
 	nzp->z_moved = 1;
 	ozp->z_moved = (uint8_t)-1;
 }
 
 /*ARGSUSED*/
 static kmem_cbrc_t
 zfs_znode_move(void *buf, void *newbuf, size_t size, void *arg)
 {
 	znode_t *ozp = buf, *nzp = newbuf;
 	zfsvfs_t *zfsvfs;
 	vnode_t *vp;
 
 	/*
 	 * The znode is on the file system's list of known znodes if the vfs
 	 * pointer is valid. We set the low bit of the vfs pointer when freeing
 	 * the znode to invalidate it, and the memory patterns written by kmem
 	 * (baddcafe and deadbeef) set at least one of the two low bits. A newly
 	 * created znode sets the vfs pointer last of all to indicate that the
 	 * znode is known and in a valid state to be moved by this function.
 	 */
 	zfsvfs = ozp->z_zfsvfs;
 	if (!POINTER_IS_VALID(zfsvfs)) {
 		ZNODE_STAT_ADD(znode_move_stats.zms_zfsvfs_invalid);
 		return (KMEM_CBRC_DONT_KNOW);
 	}
 
 	/*
 	 * Close a small window in which it's possible that the filesystem could
 	 * be unmounted and freed, and zfsvfs, though valid in the previous
 	 * statement, could point to unrelated memory by the time we try to
 	 * prevent the filesystem from being unmounted.
 	 */
 	rw_enter(&zfsvfs_lock, RW_WRITER);
 	if (zfsvfs != ozp->z_zfsvfs) {
 		rw_exit(&zfsvfs_lock);
 		ZNODE_STAT_ADD(znode_move_stats.zms_zfsvfs_recheck1);
 		return (KMEM_CBRC_DONT_KNOW);
 	}
 
 	/*
 	 * If the znode is still valid, then so is the file system. We know that
 	 * no valid file system can be freed while we hold zfsvfs_lock, so we
 	 * can safely ensure that the filesystem is not and will not be
 	 * unmounted. The next statement is equivalent to ZFS_ENTER().
 	 */
 	rrm_enter(&zfsvfs->z_teardown_lock, RW_READER, FTAG);
 	if (zfsvfs->z_unmounted) {
 		ZFS_EXIT(zfsvfs);
 		rw_exit(&zfsvfs_lock);
 		ZNODE_STAT_ADD(znode_move_stats.zms_zfsvfs_unmounted);
 		return (KMEM_CBRC_DONT_KNOW);
 	}
 	rw_exit(&zfsvfs_lock);
 
 	mutex_enter(&zfsvfs->z_znodes_lock);
 	/*
 	 * Recheck the vfs pointer in case the znode was removed just before
 	 * acquiring the lock.
 	 */
 	if (zfsvfs != ozp->z_zfsvfs) {
 		mutex_exit(&zfsvfs->z_znodes_lock);
 		ZFS_EXIT(zfsvfs);
 		ZNODE_STAT_ADD(znode_move_stats.zms_zfsvfs_recheck2);
 		return (KMEM_CBRC_DONT_KNOW);
 	}
 
 	/*
 	 * At this point we know that as long as we hold z_znodes_lock, the
 	 * znode cannot be freed and fields within the znode can be safely
 	 * accessed. Now, prevent a race with zfs_zget().
 	 */
 	if (ZFS_OBJ_HOLD_TRYENTER(zfsvfs, ozp->z_id) == 0) {
 		mutex_exit(&zfsvfs->z_znodes_lock);
 		ZFS_EXIT(zfsvfs);
 		ZNODE_STAT_ADD(znode_move_stats.zms_obj_held);
 		return (KMEM_CBRC_LATER);
 	}
 
 	vp = ZTOV(ozp);
 	if (mutex_tryenter(&vp->v_lock) == 0) {
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, ozp->z_id);
 		mutex_exit(&zfsvfs->z_znodes_lock);
 		ZFS_EXIT(zfsvfs);
 		ZNODE_STAT_ADD(znode_move_stats.zms_vnode_locked);
 		return (KMEM_CBRC_LATER);
 	}
 
 	/* Only move znodes that are referenced _only_ by the DNLC. */
 	if (vp->v_count != 1 || !vn_in_dnlc(vp)) {
 		mutex_exit(&vp->v_lock);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, ozp->z_id);
 		mutex_exit(&zfsvfs->z_znodes_lock);
 		ZFS_EXIT(zfsvfs);
 		ZNODE_STAT_ADD(znode_move_stats.zms_not_only_dnlc);
 		return (KMEM_CBRC_LATER);
 	}
 
 	/*
 	 * The znode is known and in a valid state to move. We're holding the
 	 * locks needed to execute the critical section.
 	 */
 	zfs_znode_move_impl(ozp, nzp);
 	mutex_exit(&vp->v_lock);
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, ozp->z_id);
 
 	list_link_replace(&ozp->z_link_node, &nzp->z_link_node);
 	mutex_exit(&zfsvfs->z_znodes_lock);
 	ZFS_EXIT(zfsvfs);
 
 	return (KMEM_CBRC_YES);
 }
 #endif /* illumos */
 
 void
 zfs_znode_init(void)
 {
 	/*
 	 * Initialize zcache
 	 */
 	rw_init(&zfsvfs_lock, NULL, RW_DEFAULT, NULL);
 	ASSERT(znode_cache == NULL);
 	znode_cache = kmem_cache_create("zfs_znode_cache",
 	    sizeof (znode_t), 0, zfs_znode_cache_constructor,
 	    zfs_znode_cache_destructor, NULL, NULL, NULL, 0);
 	kmem_cache_set_move(znode_cache, zfs_znode_move);
 }
 
 void
 zfs_znode_fini(void)
 {
 #ifdef illumos
 	/*
 	 * Cleanup vfs & vnode ops
 	 */
 	zfs_remove_op_tables();
 #endif
 
 	/*
 	 * Cleanup zcache
 	 */
 	if (znode_cache)
 		kmem_cache_destroy(znode_cache);
 	znode_cache = NULL;
 	rw_destroy(&zfsvfs_lock);
 }
 
 #ifdef illumos
 struct vnodeops *zfs_dvnodeops;
 struct vnodeops *zfs_fvnodeops;
 struct vnodeops *zfs_symvnodeops;
 struct vnodeops *zfs_xdvnodeops;
 struct vnodeops *zfs_evnodeops;
 struct vnodeops *zfs_sharevnodeops;
 
 void
 zfs_remove_op_tables()
 {
 	/*
 	 * Remove vfs ops
 	 */
 	ASSERT(zfsfstype);
 	(void) vfs_freevfsops_by_type(zfsfstype);
 	zfsfstype = 0;
 
 	/*
 	 * Remove vnode ops
 	 */
 	if (zfs_dvnodeops)
 		vn_freevnodeops(zfs_dvnodeops);
 	if (zfs_fvnodeops)
 		vn_freevnodeops(zfs_fvnodeops);
 	if (zfs_symvnodeops)
 		vn_freevnodeops(zfs_symvnodeops);
 	if (zfs_xdvnodeops)
 		vn_freevnodeops(zfs_xdvnodeops);
 	if (zfs_evnodeops)
 		vn_freevnodeops(zfs_evnodeops);
 	if (zfs_sharevnodeops)
 		vn_freevnodeops(zfs_sharevnodeops);
 
 	zfs_dvnodeops = NULL;
 	zfs_fvnodeops = NULL;
 	zfs_symvnodeops = NULL;
 	zfs_xdvnodeops = NULL;
 	zfs_evnodeops = NULL;
 	zfs_sharevnodeops = NULL;
 }
 
 extern const fs_operation_def_t zfs_dvnodeops_template[];
 extern const fs_operation_def_t zfs_fvnodeops_template[];
 extern const fs_operation_def_t zfs_xdvnodeops_template[];
 extern const fs_operation_def_t zfs_symvnodeops_template[];
 extern const fs_operation_def_t zfs_evnodeops_template[];
 extern const fs_operation_def_t zfs_sharevnodeops_template[];
 
 int
 zfs_create_op_tables()
 {
 	int error;
 
 	/*
 	 * zfs_dvnodeops can be set if mod_remove() calls mod_installfs()
 	 * due to a failure to remove the 2nd modlinkage (zfs_modldrv).
 	 * In this case we just return as the ops vectors are already set up.
 	 */
 	if (zfs_dvnodeops)
 		return (0);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_dvnodeops_template,
 	    &zfs_dvnodeops);
 	if (error)
 		return (error);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_fvnodeops_template,
 	    &zfs_fvnodeops);
 	if (error)
 		return (error);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_symvnodeops_template,
 	    &zfs_symvnodeops);
 	if (error)
 		return (error);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_xdvnodeops_template,
 	    &zfs_xdvnodeops);
 	if (error)
 		return (error);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_evnodeops_template,
 	    &zfs_evnodeops);
 	if (error)
 		return (error);
 
 	error = vn_make_ops(MNTTYPE_ZFS, zfs_sharevnodeops_template,
 	    &zfs_sharevnodeops);
 
 	return (error);
 }
 #endif	/* illumos */
 
 int
 zfs_create_share_dir(zfsvfs_t *zfsvfs, dmu_tx_t *tx)
 {
 	zfs_acl_ids_t acl_ids;
 	vattr_t vattr;
 	znode_t *sharezp;
 	znode_t *zp;
 	int error;
 
 	vattr.va_mask = AT_MODE|AT_UID|AT_GID|AT_TYPE;
 	vattr.va_type = VDIR;
 	vattr.va_mode = S_IFDIR|0555;
 	vattr.va_uid = crgetuid(kcred);
 	vattr.va_gid = crgetgid(kcred);
 
 	sharezp = kmem_cache_alloc(znode_cache, KM_SLEEP);
 	ASSERT(!POINTER_IS_VALID(sharezp->z_zfsvfs));
 	sharezp->z_moved = 0;
 	sharezp->z_unlinked = 0;
 	sharezp->z_atime_dirty = 0;
 	sharezp->z_zfsvfs = zfsvfs;
 	sharezp->z_is_sa = zfsvfs->z_use_sa;
 
 	VERIFY(0 == zfs_acl_ids_create(sharezp, IS_ROOT_NODE, &vattr,
 	    kcred, NULL, &acl_ids));
 	zfs_mknode(sharezp, &vattr, tx, kcred, IS_ROOT_NODE, &zp, &acl_ids);
 	ASSERT3P(zp, ==, sharezp);
 	POINTER_INVALIDATE(&sharezp->z_zfsvfs);
 	error = zap_add(zfsvfs->z_os, MASTER_NODE_OBJ,
 	    ZFS_SHARES_DIR, 8, 1, &sharezp->z_id, tx);
 	zfsvfs->z_shares_dir = sharezp->z_id;
 
 	zfs_acl_ids_free(&acl_ids);
 	sa_handle_destroy(sharezp->z_sa_hdl);
 	kmem_cache_free(znode_cache, sharezp);
 
 	return (error);
 }
 
 /*
  * define a couple of values we need available
  * for both 64 and 32 bit environments.
  */
 #ifndef NBITSMINOR64
 #define	NBITSMINOR64	32
 #endif
 #ifndef MAXMAJ64
 #define	MAXMAJ64	0xffffffffUL
 #endif
 #ifndef	MAXMIN64
 #define	MAXMIN64	0xffffffffUL
 #endif
 
 /*
  * Create special expldev for ZFS private use.
  * Can't use standard expldev since it doesn't do
  * what we want.  The standard expldev() takes a
  * dev32_t in LP64 and expands it to a long dev_t.
  * We need an interface that takes a dev32_t in ILP32
  * and expands it to a long dev_t.
  */
 static uint64_t
 zfs_expldev(dev_t dev)
 {
 	return (((uint64_t)major(dev) << NBITSMINOR64) | minor(dev));
 }
 /*
  * Special cmpldev for ZFS private use.
  * Can't use standard cmpldev since it takes
  * a long dev_t and compresses it to dev32_t in
  * LP64.  We need to do a compaction of a long dev_t
  * to a dev32_t in ILP32.
  */
 dev_t
 zfs_cmpldev(uint64_t dev)
 {
 	return (makedev((dev >> NBITSMINOR64), (dev & MAXMIN64)));
 }
 
 static void
 zfs_znode_sa_init(zfsvfs_t *zfsvfs, znode_t *zp,
     dmu_buf_t *db, dmu_object_type_t obj_type, sa_handle_t *sa_hdl)
 {
 	ASSERT(!POINTER_IS_VALID(zp->z_zfsvfs) || (zfsvfs == zp->z_zfsvfs));
 	ASSERT(MUTEX_HELD(ZFS_OBJ_MUTEX(zfsvfs, zp->z_id)));
 
-	mutex_enter(&zp->z_lock);
-
 	ASSERT(zp->z_sa_hdl == NULL);
 	ASSERT(zp->z_acl_cached == NULL);
 	if (sa_hdl == NULL) {
 		VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, zp,
 		    SA_HDL_SHARED, &zp->z_sa_hdl));
 	} else {
 		zp->z_sa_hdl = sa_hdl;
 		sa_set_userp(sa_hdl, zp);
 	}
 
 	zp->z_is_sa = (obj_type == DMU_OT_SA) ? B_TRUE : B_FALSE;
 
 	/*
 	 * Slap on VROOT if we are the root znode unless we are the root
 	 * node of a snapshot mounted under .zfs.
 	 */
 	if (zp->z_id == zfsvfs->z_root && zfsvfs->z_parent == zfsvfs)
 		ZTOV(zp)->v_flag |= VROOT;
 
-	mutex_exit(&zp->z_lock);
 	vn_exists(ZTOV(zp));
 }
 
 void
 zfs_znode_dmu_fini(znode_t *zp)
 {
 	ASSERT(MUTEX_HELD(ZFS_OBJ_MUTEX(zp->z_zfsvfs, zp->z_id)) ||
 	    zp->z_unlinked ||
 	    RW_WRITE_HELD(&zp->z_zfsvfs->z_teardown_inactive_lock));
 
 	sa_handle_destroy(zp->z_sa_hdl);
 	zp->z_sa_hdl = NULL;
 }
 
 static void
 zfs_vnode_forget(vnode_t *vp)
 {
 
 	/* copied from insmntque_stddtr */
 	vp->v_data = NULL;
 	vp->v_op = &dead_vnodeops;
 	vgone(vp);
 	vput(vp);
 }
 
 /*
  * Construct a new znode/vnode and initialize it.
  *
  * This does not call dmu_set_user(); that is
  * up to the caller, in case you don't want to
  * return the znode.
  */
 static znode_t *
 zfs_znode_alloc(zfsvfs_t *zfsvfs, dmu_buf_t *db, int blksz,
     dmu_object_type_t obj_type, sa_handle_t *hdl)
 {
 	znode_t	*zp;
 	vnode_t *vp;
 	uint64_t mode;
 	uint64_t parent;
 	sa_bulk_attr_t bulk[9];
 	int count = 0;
 	int error;
 
 	zp = kmem_cache_alloc(znode_cache, KM_SLEEP);
 
 	KASSERT(curthread->td_vp_reserv > 0,
 	    ("zfs_znode_alloc: getnewvnode without any vnodes reserved"));
 	error = getnewvnode("zfs", zfsvfs->z_parent->z_vfs, &zfs_vnodeops, &vp);
 	if (error != 0) {
 		kmem_cache_free(znode_cache, zp);
 		return (NULL);
 	}
 	zp->z_vnode = vp;
 	vp->v_data = zp;
 
-	ASSERT(zp->z_dirlocks == NULL);
 	ASSERT(!POINTER_IS_VALID(zp->z_zfsvfs));
 	zp->z_moved = 0;
 
 	/*
 	 * Defer setting z_zfsvfs until the znode is ready to be a candidate for
 	 * the zfs_znode_move() callback.
 	 */
 	zp->z_sa_hdl = NULL;
 	zp->z_unlinked = 0;
 	zp->z_atime_dirty = 0;
 	zp->z_mapcnt = 0;
 	zp->z_id = db->db_object;
 	zp->z_blksz = blksz;
 	zp->z_seq = 0x7A4653;
 	zp->z_sync_cnt = 0;
 
 	vp = ZTOV(zp);
 
 	zfs_znode_sa_init(zfsvfs, zp, db, obj_type, hdl);
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL, &mode, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zfsvfs), NULL, &zp->z_gen, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL,
 	    &zp->z_size, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL,
 	    &zp->z_links, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_PARENT(zfsvfs), NULL, &parent, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zfsvfs), NULL,
 	    &zp->z_atime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL,
 	    &zp->z_uid, 8);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs), NULL,
 	    &zp->z_gid, 8);
 
 	if (sa_bulk_lookup(zp->z_sa_hdl, bulk, count) != 0 || zp->z_gen == 0) {
 		if (hdl == NULL)
 			sa_handle_destroy(zp->z_sa_hdl);
 		zfs_vnode_forget(vp);
 		zp->z_vnode = NULL;
 		kmem_cache_free(znode_cache, zp);
 		return (NULL);
 	}
 
 	zp->z_mode = mode;
 
 	vp->v_type = IFTOVT((mode_t)mode);
 
 	switch (vp->v_type) {
 	case VDIR:
 		zp->z_zn_prefetch = B_TRUE; /* z_prefetch default is enabled */
 		break;
 #ifdef illumos
 	case VBLK:
 	case VCHR:
 		{
 			uint64_t rdev;
 			VERIFY(sa_lookup(zp->z_sa_hdl, SA_ZPL_RDEV(zfsvfs),
 			    &rdev, sizeof (rdev)) == 0);
 
 			vp->v_rdev = zfs_cmpldev(rdev);
 		}
 		break;
 #endif
 	case VFIFO:
 #ifdef illumos
 	case VSOCK:
 	case VDOOR:
 #endif
 		vp->v_op = &zfs_fifoops;
 		break;
 	case VREG:
 		if (parent == zfsvfs->z_shares_dir) {
 			ASSERT(zp->z_uid == 0 && zp->z_gid == 0);
 			vp->v_op = &zfs_shareops;
 		}
 		break;
 #ifdef illumos
 	case VLNK:
 		vn_setops(vp, zfs_symvnodeops);
 		break;
 	default:
 		vn_setops(vp, zfs_evnodeops);
 		break;
 #endif
 	}
 
 	mutex_enter(&zfsvfs->z_znodes_lock);
 	list_insert_tail(&zfsvfs->z_all_znodes, zp);
 	membar_producer();
 	/*
 	 * Everything else must be valid before assigning z_zfsvfs makes the
 	 * znode eligible for zfs_znode_move().
 	 */
 	zp->z_zfsvfs = zfsvfs;
 	mutex_exit(&zfsvfs->z_znodes_lock);
 
 	/*
 	 * Acquire vnode lock before making it available to the world.
 	 */
+#ifdef DIAGNOSTIC
+	vop_lock1_t *orig_lock = vp->v_op->vop_lock1;
+	vp->v_op->vop_lock1 = vop_stdlock;
 	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
+	vp->v_op->vop_lock1 = orig_lock;
+#else
+	vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
+#endif
 	VN_LOCK_AREC(vp);
 	if (vp->v_type != VFIFO)
 		VN_LOCK_ASHARE(vp);
 
 #ifdef illumos
 	VFS_HOLD(zfsvfs->z_vfs);
 #endif
 	return (zp);
 }
 
 static uint64_t empty_xattr;
 static uint64_t pad[4];
 static zfs_acl_phys_t acl_phys;
 /*
  * Create a new DMU object to hold a zfs znode.
  *
  *	IN:	dzp	- parent directory for new znode
  *		vap	- file attributes for new znode
  *		tx	- dmu transaction id for zap operations
  *		cr	- credentials of caller
  *		flag	- flags:
  *			  IS_ROOT_NODE	- new object will be root
  *			  IS_XATTR	- new object is an attribute
  *		bonuslen - length of bonus buffer
  *		setaclp  - File/Dir initial ACL
  *		fuidp	 - Tracks fuid allocation.
  *
  *	OUT:	zpp	- allocated znode
  *
  */
 void
 zfs_mknode(znode_t *dzp, vattr_t *vap, dmu_tx_t *tx, cred_t *cr,
     uint_t flag, znode_t **zpp, zfs_acl_ids_t *acl_ids)
 {
 	uint64_t	crtime[2], atime[2], mtime[2], ctime[2];
 	uint64_t	mode, size, links, parent, pflags;
 	uint64_t	dzp_pflags = 0;
 	uint64_t	rdev = 0;
 	zfsvfs_t	*zfsvfs = dzp->z_zfsvfs;
 	dmu_buf_t	*db;
 	timestruc_t	now;
 	uint64_t	gen, obj;
 	int		err;
 	int		bonuslen;
 	sa_handle_t	*sa_hdl;
 	dmu_object_type_t obj_type;
 	sa_bulk_attr_t	sa_attrs[ZPL_END];
 	int		cnt = 0;
 	zfs_acl_locator_cb_t locate = { 0 };
 
 	ASSERT(vap && (vap->va_mask & (AT_TYPE|AT_MODE)) == (AT_TYPE|AT_MODE));
 
 	if (zfsvfs->z_replay) {
 		obj = vap->va_nodeid;
 		now = vap->va_ctime;		/* see zfs_replay_create() */
 		gen = vap->va_nblocks;		/* ditto */
 	} else {
 		obj = 0;
 		vfs_timestamp(&now);
 		gen = dmu_tx_get_txg(tx);
 	}
 
 	obj_type = zfsvfs->z_use_sa ? DMU_OT_SA : DMU_OT_ZNODE;
 	bonuslen = (obj_type == DMU_OT_SA) ?
 	    DN_MAX_BONUSLEN : ZFS_OLD_ZNODE_PHYS_SIZE;
 
 	/*
 	 * Create a new DMU object.
 	 */
 	/*
 	 * There's currently no mechanism for pre-reading the blocks that will
 	 * be needed to allocate a new object, so we accept the small chance
 	 * that there will be an i/o error and we will fail one of the
 	 * assertions below.
 	 */
 	if (vap->va_type == VDIR) {
 		if (zfsvfs->z_replay) {
 			VERIFY0(zap_create_claim_norm(zfsvfs->z_os, obj,
 			    zfsvfs->z_norm, DMU_OT_DIRECTORY_CONTENTS,
 			    obj_type, bonuslen, tx));
 		} else {
 			obj = zap_create_norm(zfsvfs->z_os,
 			    zfsvfs->z_norm, DMU_OT_DIRECTORY_CONTENTS,
 			    obj_type, bonuslen, tx);
 		}
 	} else {
 		if (zfsvfs->z_replay) {
 			VERIFY0(dmu_object_claim(zfsvfs->z_os, obj,
 			    DMU_OT_PLAIN_FILE_CONTENTS, 0,
 			    obj_type, bonuslen, tx));
 		} else {
 			obj = dmu_object_alloc(zfsvfs->z_os,
 			    DMU_OT_PLAIN_FILE_CONTENTS, 0,
 			    obj_type, bonuslen, tx);
 		}
 	}
 
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	VERIFY(0 == sa_buf_hold(zfsvfs->z_os, obj, NULL, &db));
 
 	/*
 	 * If this is the root, fix up the half-initialized parent pointer
 	 * to reference the just-allocated physical data area.
 	 */
 	if (flag & IS_ROOT_NODE) {
 		dzp->z_id = obj;
 	} else {
 		dzp_pflags = dzp->z_pflags;
 	}
 
 	/*
 	 * If parent is an xattr, so am I.
 	 */
 	if (dzp_pflags & ZFS_XATTR) {
 		flag |= IS_XATTR;
 	}
 
 	if (zfsvfs->z_use_fuids)
 		pflags = ZFS_ARCHIVE | ZFS_AV_MODIFIED;
 	else
 		pflags = 0;
 
 	if (vap->va_type == VDIR) {
 		size = 2;		/* contents ("." and "..") */
 		links = (flag & (IS_ROOT_NODE | IS_XATTR)) ? 2 : 1;
 	} else {
 		size = links = 0;
 	}
 
 	if (vap->va_type == VBLK || vap->va_type == VCHR) {
 		rdev = zfs_expldev(vap->va_rdev);
 	}
 
 	parent = dzp->z_id;
 	mode = acl_ids->z_mode;
 	if (flag & IS_XATTR)
 		pflags |= ZFS_XATTR;
 
 	/*
 	 * No execs denied will be deterimed when zfs_mode_compute() is called.
 	 */
 	pflags |= acl_ids->z_aclp->z_hints &
 	    (ZFS_ACL_TRIVIAL|ZFS_INHERIT_ACE|ZFS_ACL_AUTO_INHERIT|
 	    ZFS_ACL_DEFAULTED|ZFS_ACL_PROTECTED);
 
 	ZFS_TIME_ENCODE(&now, crtime);
 	ZFS_TIME_ENCODE(&now, ctime);
 
 	if (vap->va_mask & AT_ATIME) {
 		ZFS_TIME_ENCODE(&vap->va_atime, atime);
 	} else {
 		ZFS_TIME_ENCODE(&now, atime);
 	}
 
 	if (vap->va_mask & AT_MTIME) {
 		ZFS_TIME_ENCODE(&vap->va_mtime, mtime);
 	} else {
 		ZFS_TIME_ENCODE(&now, mtime);
 	}
 
 	/* Now add in all of the "SA" attributes */
 	VERIFY(0 == sa_handle_get_from_db(zfsvfs->z_os, db, NULL, SA_HDL_SHARED,
 	    &sa_hdl));
 
 	/*
 	 * Setup the array of attributes to be replaced/set on the new file
 	 *
 	 * order for  DMU_OT_ZNODE is critical since it needs to be constructed
 	 * in the old znode_phys_t format.  Don't change this ordering
 	 */
 
 	if (obj_type == DMU_OT_ZNODE) {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_ATIME(zfsvfs),
 		    NULL, &atime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_MTIME(zfsvfs),
 		    NULL, &mtime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CTIME(zfsvfs),
 		    NULL, &ctime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CRTIME(zfsvfs),
 		    NULL, &crtime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_GEN(zfsvfs),
 		    NULL, &gen, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_MODE(zfsvfs),
 		    NULL, &mode, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_SIZE(zfsvfs),
 		    NULL, &size, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_PARENT(zfsvfs),
 		    NULL, &parent, 8);
 	} else {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_MODE(zfsvfs),
 		    NULL, &mode, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_SIZE(zfsvfs),
 		    NULL, &size, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_GEN(zfsvfs),
 		    NULL, &gen, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_UID(zfsvfs), NULL,
 		    &acl_ids->z_fuid, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_GID(zfsvfs), NULL,
 		    &acl_ids->z_fgid, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_PARENT(zfsvfs),
 		    NULL, &parent, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_FLAGS(zfsvfs),
 		    NULL, &pflags, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_ATIME(zfsvfs),
 		    NULL, &atime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_MTIME(zfsvfs),
 		    NULL, &mtime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CTIME(zfsvfs),
 		    NULL, &ctime, 16);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_CRTIME(zfsvfs),
 		    NULL, &crtime, 16);
 	}
 
 	SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_LINKS(zfsvfs), NULL, &links, 8);
 
 	if (obj_type == DMU_OT_ZNODE) {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_XATTR(zfsvfs), NULL,
 		    &empty_xattr, 8);
 	}
 	if (obj_type == DMU_OT_ZNODE ||
 	    (vap->va_type == VBLK || vap->va_type == VCHR)) {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_RDEV(zfsvfs),
 		    NULL, &rdev, 8);
 
 	}
 	if (obj_type == DMU_OT_ZNODE) {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_FLAGS(zfsvfs),
 		    NULL, &pflags, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_UID(zfsvfs), NULL,
 		    &acl_ids->z_fuid, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_GID(zfsvfs), NULL,
 		    &acl_ids->z_fgid, 8);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_PAD(zfsvfs), NULL, pad,
 		    sizeof (uint64_t) * 4);
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_ZNODE_ACL(zfsvfs), NULL,
 		    &acl_phys, sizeof (zfs_acl_phys_t));
 	} else if (acl_ids->z_aclp->z_version >= ZFS_ACL_VERSION_FUID) {
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_DACL_COUNT(zfsvfs), NULL,
 		    &acl_ids->z_aclp->z_acl_count, 8);
 		locate.cb_aclp = acl_ids->z_aclp;
 		SA_ADD_BULK_ATTR(sa_attrs, cnt, SA_ZPL_DACL_ACES(zfsvfs),
 		    zfs_acl_data_locator, &locate,
 		    acl_ids->z_aclp->z_acl_bytes);
 		mode = zfs_mode_compute(mode, acl_ids->z_aclp, &pflags,
 		    acl_ids->z_fuid, acl_ids->z_fgid);
 	}
 
 	VERIFY(sa_replace_all_by_template(sa_hdl, sa_attrs, cnt, tx) == 0);
 
 	if (!(flag & IS_ROOT_NODE)) {
 		*zpp = zfs_znode_alloc(zfsvfs, db, 0, obj_type, sa_hdl);
 		ASSERT(*zpp != NULL);
 	} else {
 		/*
 		 * If we are creating the root node, the "parent" we
 		 * passed in is the znode for the root.
 		 */
 		*zpp = dzp;
 
 		(*zpp)->z_sa_hdl = sa_hdl;
 	}
 
 	(*zpp)->z_pflags = pflags;
 	(*zpp)->z_mode = mode;
 
 	if (vap->va_mask & AT_XVATTR)
 		zfs_xvattr_set(*zpp, (xvattr_t *)vap, tx);
 
 	if (obj_type == DMU_OT_ZNODE ||
 	    acl_ids->z_aclp->z_version < ZFS_ACL_VERSION_FUID) {
 		VERIFY0(zfs_aclset_common(*zpp, acl_ids->z_aclp, cr, tx));
 	}
 	if (!(flag & IS_ROOT_NODE)) {
 		vnode_t *vp;
 
 		vp = ZTOV(*zpp);
 		vp->v_vflag |= VV_FORCEINSMQ;
 		err = insmntque(vp, zfsvfs->z_vfs);
 		vp->v_vflag &= ~VV_FORCEINSMQ;
 		KASSERT(err == 0, ("insmntque() failed: error %d", err));
 	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 }
 
 /*
  * Update in-core attributes.  It is assumed the caller will be doing an
  * sa_bulk_update to push the changes out.
  */
 void
 zfs_xvattr_set(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx)
 {
 	xoptattr_t *xoap;
 
 	xoap = xva_getxoptattr(xvap);
 	ASSERT(xoap);
 
 	if (XVA_ISSET_REQ(xvap, XAT_CREATETIME)) {
 		uint64_t times[2];
 		ZFS_TIME_ENCODE(&xoap->xoa_createtime, times);
 		(void) sa_update(zp->z_sa_hdl, SA_ZPL_CRTIME(zp->z_zfsvfs),
 		    &times, sizeof (times), tx);
 		XVA_SET_RTN(xvap, XAT_CREATETIME);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_READONLY)) {
 		ZFS_ATTR_SET(zp, ZFS_READONLY, xoap->xoa_readonly,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_READONLY);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_HIDDEN)) {
 		ZFS_ATTR_SET(zp, ZFS_HIDDEN, xoap->xoa_hidden,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_HIDDEN);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_SYSTEM)) {
 		ZFS_ATTR_SET(zp, ZFS_SYSTEM, xoap->xoa_system,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_SYSTEM);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_ARCHIVE)) {
 		ZFS_ATTR_SET(zp, ZFS_ARCHIVE, xoap->xoa_archive,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_ARCHIVE);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_IMMUTABLE)) {
 		ZFS_ATTR_SET(zp, ZFS_IMMUTABLE, xoap->xoa_immutable,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_IMMUTABLE);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_NOUNLINK)) {
 		ZFS_ATTR_SET(zp, ZFS_NOUNLINK, xoap->xoa_nounlink,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_NOUNLINK);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_APPENDONLY)) {
 		ZFS_ATTR_SET(zp, ZFS_APPENDONLY, xoap->xoa_appendonly,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_APPENDONLY);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_NODUMP)) {
 		ZFS_ATTR_SET(zp, ZFS_NODUMP, xoap->xoa_nodump,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_NODUMP);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_OPAQUE)) {
 		ZFS_ATTR_SET(zp, ZFS_OPAQUE, xoap->xoa_opaque,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_OPAQUE);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_AV_QUARANTINED)) {
 		ZFS_ATTR_SET(zp, ZFS_AV_QUARANTINED,
 		    xoap->xoa_av_quarantined, zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_AV_QUARANTINED);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_AV_MODIFIED)) {
 		ZFS_ATTR_SET(zp, ZFS_AV_MODIFIED, xoap->xoa_av_modified,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_AV_MODIFIED);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_AV_SCANSTAMP)) {
 		zfs_sa_set_scanstamp(zp, xvap, tx);
 		XVA_SET_RTN(xvap, XAT_AV_SCANSTAMP);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_REPARSE)) {
 		ZFS_ATTR_SET(zp, ZFS_REPARSE, xoap->xoa_reparse,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_REPARSE);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_OFFLINE)) {
 		ZFS_ATTR_SET(zp, ZFS_OFFLINE, xoap->xoa_offline,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_OFFLINE);
 	}
 	if (XVA_ISSET_REQ(xvap, XAT_SPARSE)) {
 		ZFS_ATTR_SET(zp, ZFS_SPARSE, xoap->xoa_sparse,
 		    zp->z_pflags, tx);
 		XVA_SET_RTN(xvap, XAT_SPARSE);
 	}
 }
 
 int
 zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
 {
 	dmu_object_info_t doi;
 	dmu_buf_t	*db;
 	znode_t		*zp;
 	vnode_t		*vp;
 	sa_handle_t	*hdl;
 	struct thread	*td;
 	int locked;
 	int err;
 
 	td = curthread;
 	getnewvnode_reserve(1);
 again:
 	*zpp = NULL;
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num);
 
 	err = sa_buf_hold(zfsvfs->z_os, obj_num, NULL, &db);
 	if (err) {
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		getnewvnode_drop_reserve();
 		return (err);
 	}
 
 	dmu_object_info_from_db(db, &doi);
 	if (doi.doi_bonus_type != DMU_OT_SA &&
 	    (doi.doi_bonus_type != DMU_OT_ZNODE ||
 	    (doi.doi_bonus_type == DMU_OT_ZNODE &&
 	    doi.doi_bonus_size < sizeof (znode_phys_t)))) {
 		sa_buf_rele(db, NULL);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 #ifdef __FreeBSD__
 		getnewvnode_drop_reserve();
 #endif
 		return (SET_ERROR(EINVAL));
 	}
 
 	hdl = dmu_buf_get_user(db);
 	if (hdl != NULL) {
 		zp  = sa_get_userdata(hdl);
 
-
 		/*
 		 * Since "SA" does immediate eviction we
 		 * should never find a sa handle that doesn't
 		 * know about the znode.
 		 */
-
 		ASSERT3P(zp, !=, NULL);
-
-		mutex_enter(&zp->z_lock);
 		ASSERT3U(zp->z_id, ==, obj_num);
-		if (zp->z_unlinked) {
-			err = SET_ERROR(ENOENT);
-		} else {
-			vp = ZTOV(zp);
-			*zpp = zp;
-			err = 0;
-		}
+		*zpp = zp;
+		vp = ZTOV(zp);
 
 		/* Don't let the vnode disappear after ZFS_OBJ_HOLD_EXIT. */
-		if (err == 0)
-			VN_HOLD(vp);
+		VN_HOLD(vp);
 
-		mutex_exit(&zp->z_lock);
 		sa_buf_rele(db, NULL);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 
-		if (err == 0) {
-			locked = VOP_ISLOCKED(vp);
-			VI_LOCK(vp);
-			if ((vp->v_iflag & VI_DOOMED) != 0 &&
-			    locked != LK_EXCLUSIVE) {
-				/*
-				 * The vnode is doomed and this thread doesn't
-				 * hold the exclusive lock on it, so the vnode
-				 * must be being reclaimed by another thread.
-				 * Otherwise the doomed vnode is being reclaimed
-				 * by this thread and zfs_zget is called from
-				 * ZIL internals.
-				 */
-				VI_UNLOCK(vp);
-				VN_RELE(vp);
-				goto again;
-			}
+		locked = VOP_ISLOCKED(vp);
+		VI_LOCK(vp);
+		if ((vp->v_iflag & VI_DOOMED) != 0 &&
+		    locked != LK_EXCLUSIVE) {
+			/*
+			 * The vnode is doomed and this thread doesn't
+			 * hold the exclusive lock on it, so the vnode
+			 * must be being reclaimed by another thread.
+			 * Otherwise the doomed vnode is being reclaimed
+			 * by this thread and zfs_zget is called from
+			 * ZIL internals.
+			 */
 			VI_UNLOCK(vp);
+
+			/*
+			 * XXX vrele() locks the vnode when the last reference
+			 * is dropped.  Although in this case the vnode is
+			 * doomed / dead and so no inactivation is required,
+			 * the vnode lock is still acquired.  That could result
+			 * in a LOR with z_teardown_lock if another thread holds
+			 * the vnode's lock and tries to take z_teardown_lock.
+			 * But that is only possible if the other thread performs
+			 * a ZFS vnode operation on the vnode.  That either
+			 * should not happen if the vnode is dead or the thread
+			 * should also have a reference to the vnode and thus
+			 * our reference is not last.
+			 */
+			VN_RELE(vp);
+			goto again;
 		}
+		VI_UNLOCK(vp);
 		getnewvnode_drop_reserve();
-		return (err);
+		return (0);
 	}
 
 	/*
 	 * Not found create new znode/vnode
 	 * but only if file exists.
 	 *
 	 * There is a small window where zfs_vget() could
 	 * find this object while a file create is still in
 	 * progress.  This is checked for in zfs_znode_alloc()
 	 *
 	 * if zfs_znode_alloc() fails it will drop the hold on the
 	 * bonus buffer.
 	 */
 	zp = zfs_znode_alloc(zfsvfs, db, doi.doi_data_block_size,
 	    doi.doi_bonus_type, NULL);
 	if (zp == NULL) {
 		err = SET_ERROR(ENOENT);
 	} else {
 		*zpp = zp;
 	}
 	if (err == 0) {
 		vnode_t *vp = ZTOV(zp);
 
 		err = insmntque(vp, zfsvfs->z_vfs);
 		if (err == 0) {
 			vp->v_hash = obj_num;
 			VOP_UNLOCK(vp, 0);
 		} else {
 			zp->z_vnode = NULL;
 			zfs_znode_dmu_fini(zp);
 			zfs_znode_free(zp);
 			*zpp = NULL;
 		}
 	}
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 	getnewvnode_drop_reserve();
 	return (err);
 }
 
 int
 zfs_rezget(znode_t *zp)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	dmu_object_info_t doi;
 	dmu_buf_t *db;
 	vnode_t *vp;
 	uint64_t obj_num = zp->z_id;
 	uint64_t mode, size;
 	sa_bulk_attr_t bulk[8];
 	int err;
 	int count = 0;
 	uint64_t gen;
 
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj_num);
 
 	mutex_enter(&zp->z_acl_lock);
 	if (zp->z_acl_cached) {
 		zfs_acl_free(zp->z_acl_cached);
 		zp->z_acl_cached = NULL;
 	}
 
 	mutex_exit(&zp->z_acl_lock);
 	ASSERT(zp->z_sa_hdl == NULL);
 	err = sa_buf_hold(zfsvfs->z_os, obj_num, NULL, &db);
 	if (err) {
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		return (err);
 	}
 
 	dmu_object_info_from_db(db, &doi);
 	if (doi.doi_bonus_type != DMU_OT_SA &&
 	    (doi.doi_bonus_type != DMU_OT_ZNODE ||
 	    (doi.doi_bonus_type == DMU_OT_ZNODE &&
 	    doi.doi_bonus_size < sizeof (znode_phys_t)))) {
 		sa_buf_rele(db, NULL);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		return (SET_ERROR(EINVAL));
 	}
 
 	zfs_znode_sa_init(zfsvfs, zp, db, doi.doi_bonus_type, NULL);
 	size = zp->z_size;
 
 	/* reload cached values */
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GEN(zfsvfs), NULL,
 	    &gen, sizeof (gen));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs), NULL,
 	    &zp->z_size, sizeof (zp->z_size));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_LINKS(zfsvfs), NULL,
 	    &zp->z_links, sizeof (zp->z_links));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs), NULL,
 	    &zp->z_pflags, sizeof (zp->z_pflags));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_ATIME(zfsvfs), NULL,
 	    &zp->z_atime, sizeof (zp->z_atime));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_UID(zfsvfs), NULL,
 	    &zp->z_uid, sizeof (zp->z_uid));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_GID(zfsvfs), NULL,
 	    &zp->z_gid, sizeof (zp->z_gid));
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MODE(zfsvfs), NULL,
 	    &mode, sizeof (mode));
 
 	if (sa_bulk_lookup(zp->z_sa_hdl, bulk, count)) {
 		zfs_znode_dmu_fini(zp);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		return (SET_ERROR(EIO));
 	}
 
 	zp->z_mode = mode;
 
 	if (gen != zp->z_gen) {
 		zfs_znode_dmu_fini(zp);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		return (SET_ERROR(EIO));
 	}
 
 	/*
 	 * It is highly improbable but still quite possible that two
 	 * objects in different datasets are created with the same
 	 * object numbers and in transaction groups with the same
 	 * numbers.  znodes corresponding to those objects would
 	 * have the same z_id and z_gen, but their other attributes
 	 * may be different.
 	 * zfs recv -F may replace one of such objects with the other.
 	 * As a result file properties recorded in the replaced
 	 * object's vnode may no longer match the received object's
 	 * properties.  At present the only cached property is the
 	 * files type recorded in v_type.
 	 * So, handle this case by leaving the old vnode and znode
 	 * disassociated from the actual object.  A new vnode and a
 	 * znode will be created if the object is accessed
 	 * (e.g. via a look-up).  The old vnode and znode will be
 	 * recycled when the last vnode reference is dropped.
 	 */
 	vp = ZTOV(zp);
 	if (vp->v_type != IFTOVT((mode_t)zp->z_mode)) {
 		zfs_znode_dmu_fini(zp);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 		return (EIO);
 	}
 
 	zp->z_unlinked = (zp->z_links == 0);
 	zp->z_blksz = doi.doi_data_block_size;
 	vn_pages_remove(vp, 0, 0);
 	if (zp->z_size != size)
 		vnode_pager_setsize(vp, zp->z_size);
 
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj_num);
 
 	return (0);
 }
 
 void
 zfs_znode_delete(znode_t *zp, dmu_tx_t *tx)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	objset_t *os = zfsvfs->z_os;
 	uint64_t obj = zp->z_id;
 	uint64_t acl_obj = zfs_external_acl(zp);
 
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, obj);
 	if (acl_obj) {
 		VERIFY(!zp->z_is_sa);
 		VERIFY(0 == dmu_object_free(os, acl_obj, tx));
 	}
 	VERIFY(0 == dmu_object_free(os, obj, tx));
 	zfs_znode_dmu_fini(zp);
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, obj);
 	zfs_znode_free(zp);
 }
 
 void
 zfs_zinactive(znode_t *zp)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	uint64_t z_id = zp->z_id;
 
 	ASSERT(zp->z_sa_hdl);
 
 	/*
 	 * Don't allow a zfs_zget() while were trying to release this znode
 	 */
 	ZFS_OBJ_HOLD_ENTER(zfsvfs, z_id);
 
-	mutex_enter(&zp->z_lock);
-
 	/*
 	 * If this was the last reference to a file with no links,
 	 * remove the file from the file system.
 	 */
 	if (zp->z_unlinked) {
-		mutex_exit(&zp->z_lock);
 		ZFS_OBJ_HOLD_EXIT(zfsvfs, z_id);
 		zfs_rmnode(zp);
 		return;
 	}
 
-	mutex_exit(&zp->z_lock);
 	zfs_znode_dmu_fini(zp);
 	ZFS_OBJ_HOLD_EXIT(zfsvfs, z_id);
 	zfs_znode_free(zp);
 }
 
 void
 zfs_znode_free(znode_t *zp)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 
 	ASSERT(zp->z_sa_hdl == NULL);
 	zp->z_vnode = NULL;
 	mutex_enter(&zfsvfs->z_znodes_lock);
 	POINTER_INVALIDATE(&zp->z_zfsvfs);
 	list_remove(&zfsvfs->z_all_znodes, zp);
 	mutex_exit(&zfsvfs->z_znodes_lock);
 
 	if (zp->z_acl_cached) {
 		zfs_acl_free(zp->z_acl_cached);
 		zp->z_acl_cached = NULL;
 	}
 
 	kmem_cache_free(znode_cache, zp);
 
 #ifdef illumos
 	VFS_RELE(zfsvfs->z_vfs);
 #endif
 }
 
 void
 zfs_tstamp_update_setup(znode_t *zp, uint_t flag, uint64_t mtime[2],
     uint64_t ctime[2], boolean_t have_tx)
 {
 	timestruc_t	now;
 
 	vfs_timestamp(&now);
 
 	if (have_tx) {	/* will sa_bulk_update happen really soon? */
 		zp->z_atime_dirty = 0;
 		zp->z_seq++;
 	} else {
 		zp->z_atime_dirty = 1;
 	}
 
 	if (flag & AT_ATIME) {
 		ZFS_TIME_ENCODE(&now, zp->z_atime);
 	}
 
 	if (flag & AT_MTIME) {
 		ZFS_TIME_ENCODE(&now, mtime);
 		if (zp->z_zfsvfs->z_use_fuids) {
 			zp->z_pflags |= (ZFS_ARCHIVE |
 			    ZFS_AV_MODIFIED);
 		}
 	}
 
 	if (flag & AT_CTIME) {
 		ZFS_TIME_ENCODE(&now, ctime);
 		if (zp->z_zfsvfs->z_use_fuids)
 			zp->z_pflags |= ZFS_ARCHIVE;
 	}
 }
 
 /*
  * Grow the block size for a file.
  *
  *	IN:	zp	- znode of file to free data in.
  *		size	- requested block size
  *		tx	- open transaction.
  *
  * NOTE: this function assumes that the znode is write locked.
  */
 void
 zfs_grow_blocksize(znode_t *zp, uint64_t size, dmu_tx_t *tx)
 {
 	int		error;
 	u_longlong_t	dummy;
 
 	if (size <= zp->z_blksz)
 		return;
 	/*
 	 * If the file size is already greater than the current blocksize,
 	 * we will not grow.  If there is more than one block in a file,
 	 * the blocksize cannot change.
 	 */
 	if (zp->z_blksz && zp->z_size > zp->z_blksz)
 		return;
 
 	error = dmu_object_set_blocksize(zp->z_zfsvfs->z_os, zp->z_id,
 	    size, 0, tx);
 
 	if (error == ENOTSUP)
 		return;
 	ASSERT0(error);
 
 	/* What blocksize did we actually get? */
 	dmu_object_size_from_db(sa_get_db(zp->z_sa_hdl), &zp->z_blksz, &dummy);
 }
 
 #ifdef illumos
 /*
  * This is a dummy interface used when pvn_vplist_dirty() should *not*
  * be calling back into the fs for a putpage().  E.g.: when truncating
  * a file, the pages being "thrown away* don't need to be written out.
  */
 /* ARGSUSED */
 static int
 zfs_no_putpage(vnode_t *vp, page_t *pp, u_offset_t *offp, size_t *lenp,
     int flags, cred_t *cr)
 {
 	ASSERT(0);
 	return (0);
 }
 #endif
 
 /*
  * Increase the file length
  *
  *	IN:	zp	- znode of file to free data in.
  *		end	- new end-of-file
  *
  *	RETURN:	0 on success, error code on failure
  */
 static int
 zfs_extend(znode_t *zp, uint64_t end)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	dmu_tx_t *tx;
 	rl_t *rl;
 	uint64_t newblksz;
 	int error;
 
 	/*
 	 * We will change zp_size, lock the whole file.
 	 */
 	rl = zfs_range_lock(zp, 0, UINT64_MAX, RL_WRITER);
 
 	/*
 	 * Nothing to do if file already at desired length.
 	 */
 	if (end <= zp->z_size) {
 		zfs_range_unlock(rl);
 		return (0);
 	}
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
 	if (end > zp->z_blksz &&
 	    (!ISP2(zp->z_blksz) || zp->z_blksz < zfsvfs->z_max_blksz)) {
 		/*
 		 * We are growing the file past the current block size.
 		 */
 		if (zp->z_blksz > zp->z_zfsvfs->z_max_blksz) {
 			/*
 			 * File's blocksize is already larger than the
 			 * "recordsize" property.  Only let it grow to
 			 * the next power of 2.
 			 */
 			ASSERT(!ISP2(zp->z_blksz));
 			newblksz = MIN(end, 1 << highbit64(zp->z_blksz));
 		} else {
 			newblksz = MIN(end, zp->z_zfsvfs->z_max_blksz);
 		}
 		dmu_tx_hold_write(tx, zp->z_id, 0, newblksz);
 	} else {
 		newblksz = 0;
 	}
 
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		dmu_tx_abort(tx);
 		zfs_range_unlock(rl);
 		return (error);
 	}
 
 	if (newblksz)
 		zfs_grow_blocksize(zp, newblksz, tx);
 
 	zp->z_size = end;
 
 	VERIFY(0 == sa_update(zp->z_sa_hdl, SA_ZPL_SIZE(zp->z_zfsvfs),
 	    &zp->z_size, sizeof (zp->z_size), tx));
 
 	vnode_pager_setsize(ZTOV(zp), end);
 
 	zfs_range_unlock(rl);
 
 	dmu_tx_commit(tx);
 
 	return (0);
 }
 
 /*
  * Free space in a file.
  *
  *	IN:	zp	- znode of file to free data in.
  *		off	- start of section to free.
  *		len	- length of section to free.
  *
  *	RETURN:	0 on success, error code on failure
  */
 static int
 zfs_free_range(znode_t *zp, uint64_t off, uint64_t len)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	rl_t *rl;
 	int error;
 
 	/*
 	 * Lock the range being freed.
 	 */
 	rl = zfs_range_lock(zp, off, len, RL_WRITER);
 
 	/*
 	 * Nothing to do if file already at desired length.
 	 */
 	if (off >= zp->z_size) {
 		zfs_range_unlock(rl);
 		return (0);
 	}
 
 	if (off + len > zp->z_size)
 		len = zp->z_size - off;
 
 	error = dmu_free_long_range(zfsvfs->z_os, zp->z_id, off, len);
 
 	if (error == 0) {
 		/*
 		 * In FreeBSD we cannot free block in the middle of a file,
 		 * but only at the end of a file, so this code path should
 		 * never happen.
 		 */
 		vnode_pager_setsize(ZTOV(zp), off);
 	}
 
 	zfs_range_unlock(rl);
 
 	return (error);
 }
 
 /*
  * Truncate a file
  *
  *	IN:	zp	- znode of file to free data in.
  *		end	- new end-of-file.
  *
  *	RETURN:	0 on success, error code on failure
  */
 static int
 zfs_trunc(znode_t *zp, uint64_t end)
 {
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	vnode_t *vp = ZTOV(zp);
 	dmu_tx_t *tx;
 	rl_t *rl;
 	int error;
 	sa_bulk_attr_t bulk[2];
 	int count = 0;
 
 	/*
 	 * We will change zp_size, lock the whole file.
 	 */
 	rl = zfs_range_lock(zp, 0, UINT64_MAX, RL_WRITER);
 
 	/*
 	 * Nothing to do if file already at desired length.
 	 */
 	if (end >= zp->z_size) {
 		zfs_range_unlock(rl);
 		return (0);
 	}
 
 	error = dmu_free_long_range(zfsvfs->z_os, zp->z_id, end,  -1);
 	if (error) {
 		zfs_range_unlock(rl);
 		return (error);
 	}
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
 	dmu_tx_mark_netfree(tx);
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		dmu_tx_abort(tx);
 		zfs_range_unlock(rl);
 		return (error);
 	}
 
 	zp->z_size = end;
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_SIZE(zfsvfs),
 	    NULL, &zp->z_size, sizeof (zp->z_size));
 
 	if (end == 0) {
 		zp->z_pflags &= ~ZFS_SPARSE;
 		SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs),
 		    NULL, &zp->z_pflags, 8);
 	}
 	VERIFY(sa_bulk_update(zp->z_sa_hdl, bulk, count, tx) == 0);
 
 	dmu_tx_commit(tx);
 
 	/*
 	 * Clear any mapped pages in the truncated region.  This has to
 	 * happen outside of the transaction to avoid the possibility of
 	 * a deadlock with someone trying to push a page that we are
 	 * about to invalidate.
 	 */
 	vnode_pager_setsize(vp, end);
 
 	zfs_range_unlock(rl);
 
 	return (0);
 }
 
 /*
  * Free space in a file
  *
  *	IN:	zp	- znode of file to free data in.
  *		off	- start of range
  *		len	- end of range (0 => EOF)
  *		flag	- current file open mode flags.
  *		log	- TRUE if this action should be logged
  *
  *	RETURN:	0 on success, error code on failure
  */
 int
 zfs_freesp(znode_t *zp, uint64_t off, uint64_t len, int flag, boolean_t log)
 {
 	vnode_t *vp = ZTOV(zp);
 	dmu_tx_t *tx;
 	zfsvfs_t *zfsvfs = zp->z_zfsvfs;
 	zilog_t *zilog = zfsvfs->z_log;
 	uint64_t mode;
 	uint64_t mtime[2], ctime[2];
 	sa_bulk_attr_t bulk[3];
 	int count = 0;
 	int error;
 
 	if ((error = sa_lookup(zp->z_sa_hdl, SA_ZPL_MODE(zfsvfs), &mode,
 	    sizeof (mode))) != 0)
 		return (error);
 
 	if (off > zp->z_size) {
 		error =  zfs_extend(zp, off+len);
 		if (error == 0 && log)
 			goto log;
 		else
 			return (error);
 	}
 
 	/*
 	 * Check for any locks in the region to be freed.
 	 */
 
 	if (MANDLOCK(vp, (mode_t)mode)) {
 		uint64_t length = (len ? len : zp->z_size - off);
 		if (error = chklock(vp, FWRITE, off, length, flag, NULL))
 			return (error);
 	}
 
 	if (len == 0) {
 		error = zfs_trunc(zp, off);
 	} else {
 		if ((error = zfs_free_range(zp, off, len)) == 0 &&
 		    off + len > zp->z_size)
 			error = zfs_extend(zp, off+len);
 	}
 	if (error || !log)
 		return (error);
 log:
 	tx = dmu_tx_create(zfsvfs->z_os);
 	dmu_tx_hold_sa(tx, zp->z_sa_hdl, B_FALSE);
 	zfs_sa_upgrade_txholds(tx, zp);
 	error = dmu_tx_assign(tx, TXG_WAIT);
 	if (error) {
 		dmu_tx_abort(tx);
 		return (error);
 	}
 
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_MTIME(zfsvfs), NULL, mtime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_CTIME(zfsvfs), NULL, ctime, 16);
 	SA_ADD_BULK_ATTR(bulk, count, SA_ZPL_FLAGS(zfsvfs),
 	    NULL, &zp->z_pflags, 8);
 	zfs_tstamp_update_setup(zp, CONTENT_MODIFIED, mtime, ctime, B_TRUE);
 	error = sa_bulk_update(zp->z_sa_hdl, bulk, count, tx);
 	ASSERT(error == 0);
 
 	zfs_log_truncate(zilog, tx, TX_TRUNCATE, zp, off, len);
 
 	dmu_tx_commit(tx);
 	return (0);
 }
 
 void
 zfs_create_fs(objset_t *os, cred_t *cr, nvlist_t *zplprops, dmu_tx_t *tx)
 {
 	uint64_t	moid, obj, sa_obj, version;
 	uint64_t	sense = ZFS_CASE_SENSITIVE;
 	uint64_t	norm = 0;
 	nvpair_t	*elem;
 	int		error;
 	int		i;
 	znode_t		*rootzp = NULL;
 	zfsvfs_t	*zfsvfs;
 	vattr_t		vattr;
 	znode_t		*zp;
 	zfs_acl_ids_t	acl_ids;
 
 	/*
 	 * First attempt to create master node.
 	 */
 	/*
 	 * In an empty objset, there are no blocks to read and thus
 	 * there can be no i/o errors (which we assert below).
 	 */
 	moid = MASTER_NODE_OBJ;
 	error = zap_create_claim(os, moid, DMU_OT_MASTER_NODE,
 	    DMU_OT_NONE, 0, tx);
 	ASSERT(error == 0);
 
 	/*
 	 * Set starting attributes.
 	 */
 	version = zfs_zpl_version_map(spa_version(dmu_objset_spa(os)));
 	elem = NULL;
 	while ((elem = nvlist_next_nvpair(zplprops, elem)) != NULL) {
 		/* For the moment we expect all zpl props to be uint64_ts */
 		uint64_t val;
 		char *name;
 
 		ASSERT(nvpair_type(elem) == DATA_TYPE_UINT64);
 		VERIFY(nvpair_value_uint64(elem, &val) == 0);
 		name = nvpair_name(elem);
 		if (strcmp(name, zfs_prop_to_name(ZFS_PROP_VERSION)) == 0) {
 			if (val < version)
 				version = val;
 		} else {
 			error = zap_update(os, moid, name, 8, 1, &val, tx);
 		}
 		ASSERT(error == 0);
 		if (strcmp(name, zfs_prop_to_name(ZFS_PROP_NORMALIZE)) == 0)
 			norm = val;
 		else if (strcmp(name, zfs_prop_to_name(ZFS_PROP_CASE)) == 0)
 			sense = val;
 	}
 	ASSERT(version != 0);
 	error = zap_update(os, moid, ZPL_VERSION_STR, 8, 1, &version, tx);
 
 	/*
 	 * Create zap object used for SA attribute registration
 	 */
 
 	if (version >= ZPL_VERSION_SA) {
 		sa_obj = zap_create(os, DMU_OT_SA_MASTER_NODE,
 		    DMU_OT_NONE, 0, tx);
 		error = zap_add(os, moid, ZFS_SA_ATTRS, 8, 1, &sa_obj, tx);
 		ASSERT(error == 0);
 	} else {
 		sa_obj = 0;
 	}
 	/*
 	 * Create a delete queue.
 	 */
 	obj = zap_create(os, DMU_OT_UNLINKED_SET, DMU_OT_NONE, 0, tx);
 
 	error = zap_add(os, moid, ZFS_UNLINKED_SET, 8, 1, &obj, tx);
 	ASSERT(error == 0);
 
 	/*
 	 * Create root znode.  Create minimal znode/vnode/zfsvfs
 	 * to allow zfs_mknode to work.
 	 */
 	VATTR_NULL(&vattr);
 	vattr.va_mask = AT_MODE|AT_UID|AT_GID|AT_TYPE;
 	vattr.va_type = VDIR;
 	vattr.va_mode = S_IFDIR|0755;
 	vattr.va_uid = crgetuid(cr);
 	vattr.va_gid = crgetgid(cr);
 
 	zfsvfs = kmem_zalloc(sizeof (zfsvfs_t), KM_SLEEP);
 
 	rootzp = kmem_cache_alloc(znode_cache, KM_SLEEP);
 	ASSERT(!POINTER_IS_VALID(rootzp->z_zfsvfs));
 	rootzp->z_moved = 0;
 	rootzp->z_unlinked = 0;
 	rootzp->z_atime_dirty = 0;
 	rootzp->z_is_sa = USE_SA(version, os);
 
 	zfsvfs->z_os = os;
 	zfsvfs->z_parent = zfsvfs;
 	zfsvfs->z_version = version;
 	zfsvfs->z_use_fuids = USE_FUIDS(version, os);
 	zfsvfs->z_use_sa = USE_SA(version, os);
 	zfsvfs->z_norm = norm;
 
 	error = sa_setup(os, sa_obj, zfs_attr_table, ZPL_END,
 	    &zfsvfs->z_attr_table);
 
 	ASSERT(error == 0);
 
 	/*
 	 * Fold case on file systems that are always or sometimes case
 	 * insensitive.
 	 */
 	if (sense == ZFS_CASE_INSENSITIVE || sense == ZFS_CASE_MIXED)
 		zfsvfs->z_norm |= U8_TEXTPREP_TOUPPER;
 
 	mutex_init(&zfsvfs->z_znodes_lock, NULL, MUTEX_DEFAULT, NULL);
 	list_create(&zfsvfs->z_all_znodes, sizeof (znode_t),
 	    offsetof(znode_t, z_link_node));
 
 	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
 		mutex_init(&zfsvfs->z_hold_mtx[i], NULL, MUTEX_DEFAULT, NULL);
 
 	rootzp->z_zfsvfs = zfsvfs;
 	VERIFY(0 == zfs_acl_ids_create(rootzp, IS_ROOT_NODE, &vattr,
 	    cr, NULL, &acl_ids));
 	zfs_mknode(rootzp, &vattr, tx, cr, IS_ROOT_NODE, &zp, &acl_ids);
 	ASSERT3P(zp, ==, rootzp);
 	error = zap_add(os, moid, ZFS_ROOT_OBJ, 8, 1, &rootzp->z_id, tx);
 	ASSERT(error == 0);
 	zfs_acl_ids_free(&acl_ids);
 	POINTER_INVALIDATE(&rootzp->z_zfsvfs);
 
 	sa_handle_destroy(rootzp->z_sa_hdl);
 	kmem_cache_free(znode_cache, rootzp);
 
 	/*
 	 * Create shares directory
 	 */
 
 	error = zfs_create_share_dir(zfsvfs, tx);
 
 	ASSERT(error == 0);
 
 	for (i = 0; i != ZFS_OBJ_MTX_SZ; i++)
 		mutex_destroy(&zfsvfs->z_hold_mtx[i]);
 	kmem_free(zfsvfs, sizeof (zfsvfs_t));
 }
 
 #endif /* _KERNEL */
 
 static int
 zfs_sa_setup(objset_t *osp, sa_attr_type_t **sa_table)
 {
 	uint64_t sa_obj = 0;
 	int error;
 
 	error = zap_lookup(osp, MASTER_NODE_OBJ, ZFS_SA_ATTRS, 8, 1, &sa_obj);
 	if (error != 0 && error != ENOENT)
 		return (error);
 
 	error = sa_setup(osp, sa_obj, zfs_attr_table, ZPL_END, sa_table);
 	return (error);
 }
 
 static int
 zfs_grab_sa_handle(objset_t *osp, uint64_t obj, sa_handle_t **hdlp,
     dmu_buf_t **db, void *tag)
 {
 	dmu_object_info_t doi;
 	int error;
 
 	if ((error = sa_buf_hold(osp, obj, tag, db)) != 0)
 		return (error);
 
 	dmu_object_info_from_db(*db, &doi);
 	if ((doi.doi_bonus_type != DMU_OT_SA &&
 	    doi.doi_bonus_type != DMU_OT_ZNODE) ||
 	    doi.doi_bonus_type == DMU_OT_ZNODE &&
 	    doi.doi_bonus_size < sizeof (znode_phys_t)) {
 		sa_buf_rele(*db, tag);
 		return (SET_ERROR(ENOTSUP));
 	}
 
 	error = sa_handle_get(osp, obj, NULL, SA_HDL_PRIVATE, hdlp);
 	if (error != 0) {
 		sa_buf_rele(*db, tag);
 		return (error);
 	}
 
 	return (0);
 }
 
 void
 zfs_release_sa_handle(sa_handle_t *hdl, dmu_buf_t *db, void *tag)
 {
 	sa_handle_destroy(hdl);
 	sa_buf_rele(db, tag);
 }
 
 /*
  * Given an object number, return its parent object number and whether
  * or not the object is an extended attribute directory.
  */
 static int
 zfs_obj_to_pobj(objset_t *osp, sa_handle_t *hdl, sa_attr_type_t *sa_table,
     uint64_t *pobjp, int *is_xattrdir)
 {
 	uint64_t parent;
 	uint64_t pflags;
 	uint64_t mode;
 	uint64_t parent_mode;
 	sa_bulk_attr_t bulk[3];
 	sa_handle_t *sa_hdl;
 	dmu_buf_t *sa_db;
 	int count = 0;
 	int error;
 
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_PARENT], NULL,
 	    &parent, sizeof (parent));
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_FLAGS], NULL,
 	    &pflags, sizeof (pflags));
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_MODE], NULL,
 	    &mode, sizeof (mode));
 
 	if ((error = sa_bulk_lookup(hdl, bulk, count)) != 0)
 		return (error);
 
 	/*
 	 * When a link is removed its parent pointer is not changed and will
 	 * be invalid.  There are two cases where a link is removed but the
 	 * file stays around, when it goes to the delete queue and when there
 	 * are additional links.
 	 */
 	error = zfs_grab_sa_handle(osp, parent, &sa_hdl, &sa_db, FTAG);
 	if (error != 0)
 		return (error);
 
 	error = sa_lookup(sa_hdl, ZPL_MODE, &parent_mode, sizeof (parent_mode));
 	zfs_release_sa_handle(sa_hdl, sa_db, FTAG);
 	if (error != 0)
 		return (error);
 
 	*is_xattrdir = ((pflags & ZFS_XATTR) != 0) && S_ISDIR(mode);
 
 	/*
 	 * Extended attributes can be applied to files, directories, etc.
 	 * Otherwise the parent must be a directory.
 	 */
 	if (!*is_xattrdir && !S_ISDIR(parent_mode))
 		return (SET_ERROR(EINVAL));
 
 	*pobjp = parent;
 
 	return (0);
 }
 
 /*
  * Given an object number, return some zpl level statistics
  */
 static int
 zfs_obj_to_stats_impl(sa_handle_t *hdl, sa_attr_type_t *sa_table,
     zfs_stat_t *sb)
 {
 	sa_bulk_attr_t bulk[4];
 	int count = 0;
 
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_MODE], NULL,
 	    &sb->zs_mode, sizeof (sb->zs_mode));
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_GEN], NULL,
 	    &sb->zs_gen, sizeof (sb->zs_gen));
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_LINKS], NULL,
 	    &sb->zs_links, sizeof (sb->zs_links));
 	SA_ADD_BULK_ATTR(bulk, count, sa_table[ZPL_CTIME], NULL,
 	    &sb->zs_ctime, sizeof (sb->zs_ctime));
 
 	return (sa_bulk_lookup(hdl, bulk, count));
 }
 
 static int
 zfs_obj_to_path_impl(objset_t *osp, uint64_t obj, sa_handle_t *hdl,
     sa_attr_type_t *sa_table, char *buf, int len)
 {
 	sa_handle_t *sa_hdl;
 	sa_handle_t *prevhdl = NULL;
 	dmu_buf_t *prevdb = NULL;
 	dmu_buf_t *sa_db = NULL;
 	char *path = buf + len - 1;
 	int error;
 
 	*path = '\0';
 	sa_hdl = hdl;
 
 	for (;;) {
 		uint64_t pobj;
 		char component[MAXNAMELEN + 2];
 		size_t complen;
 		int is_xattrdir;
 
 		if (prevdb)
 			zfs_release_sa_handle(prevhdl, prevdb, FTAG);
 
 		if ((error = zfs_obj_to_pobj(osp, sa_hdl, sa_table, &pobj,
 		    &is_xattrdir)) != 0)
 			break;
 
 		if (pobj == obj) {
 			if (path[0] != '/')
 				*--path = '/';
 			break;
 		}
 
 		component[0] = '/';
 		if (is_xattrdir) {
 			(void) sprintf(component + 1, "<xattrdir>");
 		} else {
 			error = zap_value_search(osp, pobj, obj,
 			    ZFS_DIRENT_OBJ(-1ULL), component + 1);
 			if (error != 0)
 				break;
 		}
 
 		complen = strlen(component);
 		path -= complen;
 		ASSERT(path >= buf);
 		bcopy(component, path, complen);
 		obj = pobj;
 
 		if (sa_hdl != hdl) {
 			prevhdl = sa_hdl;
 			prevdb = sa_db;
 		}
 		error = zfs_grab_sa_handle(osp, obj, &sa_hdl, &sa_db, FTAG);
 		if (error != 0) {
 			sa_hdl = prevhdl;
 			sa_db = prevdb;
 			break;
 		}
 	}
 
 	if (sa_hdl != NULL && sa_hdl != hdl) {
 		ASSERT(sa_db != NULL);
 		zfs_release_sa_handle(sa_hdl, sa_db, FTAG);
 	}
 
 	if (error == 0)
 		(void) memmove(buf, path, buf + len - path);
 
 	return (error);
 }
 
 int
 zfs_obj_to_path(objset_t *osp, uint64_t obj, char *buf, int len)
 {
 	sa_attr_type_t *sa_table;
 	sa_handle_t *hdl;
 	dmu_buf_t *db;
 	int error;
 
 	error = zfs_sa_setup(osp, &sa_table);
 	if (error != 0)
 		return (error);
 
 	error = zfs_grab_sa_handle(osp, obj, &hdl, &db, FTAG);
 	if (error != 0)
 		return (error);
 
 	error = zfs_obj_to_path_impl(osp, obj, hdl, sa_table, buf, len);
 
 	zfs_release_sa_handle(hdl, db, FTAG);
 	return (error);
 }
 
 int
 zfs_obj_to_stats(objset_t *osp, uint64_t obj, zfs_stat_t *sb,
     char *buf, int len)
 {
 	char *path = buf + len - 1;
 	sa_attr_type_t *sa_table;
 	sa_handle_t *hdl;
 	dmu_buf_t *db;
 	int error;
 
 	*path = '\0';
 
 	error = zfs_sa_setup(osp, &sa_table);
 	if (error != 0)
 		return (error);
 
 	error = zfs_grab_sa_handle(osp, obj, &hdl, &db, FTAG);
 	if (error != 0)
 		return (error);
 
 	error = zfs_obj_to_stats_impl(hdl, sa_table, sb);
 	if (error != 0) {
 		zfs_release_sa_handle(hdl, db, FTAG);
 		return (error);
 	}
 
 	error = zfs_obj_to_path_impl(osp, obj, hdl, sa_table, buf, len);
 
 	zfs_release_sa_handle(hdl, db, FTAG);
 	return (error);
 }
Index: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris
===================================================================
--- user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris	(revision 303775)

Property changes on: user/alc/PQ_LAUNDRY/sys/cddl/contrib/opensolaris
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head/sys/cddl/contrib/opensolaris:r303667-303774
Index: user/alc/PQ_LAUNDRY/sys/compat/freebsd32/freebsd32_syscall.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/compat/freebsd32/freebsd32_syscall.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/compat/freebsd32/freebsd32_syscall.h	(revision 303775)
@@ -1,460 +1,460 @@
 /*
  * System call numbers.
  *
  * DO NOT EDIT-- this file is automatically generated.
  * $FreeBSD$
  * created from FreeBSD: head/sys/compat/freebsd32/syscalls.master 303699 2016-08-03 06:33:04Z ed 
  */
 
 #define	FREEBSD32_SYS_syscall	0
 #define	FREEBSD32_SYS_exit	1
 #define	FREEBSD32_SYS_fork	2
 #define	FREEBSD32_SYS_read	3
 #define	FREEBSD32_SYS_write	4
 #define	FREEBSD32_SYS_open	5
 #define	FREEBSD32_SYS_close	6
 #define	FREEBSD32_SYS_freebsd32_wait4	7
 				/* 8 is obsolete old creat */
 #define	FREEBSD32_SYS_link	9
 #define	FREEBSD32_SYS_unlink	10
 				/* 11 is obsolete execv */
 #define	FREEBSD32_SYS_chdir	12
 #define	FREEBSD32_SYS_fchdir	13
 #define	FREEBSD32_SYS_mknod	14
 #define	FREEBSD32_SYS_chmod	15
 #define	FREEBSD32_SYS_chown	16
 #define	FREEBSD32_SYS_break	17
 				/* 18 is freebsd4 freebsd32_getfsstat */
 				/* 19 is old freebsd32_lseek */
 #define	FREEBSD32_SYS_getpid	20
 #define	FREEBSD32_SYS_mount	21
 #define	FREEBSD32_SYS_unmount	22
 #define	FREEBSD32_SYS_setuid	23
 #define	FREEBSD32_SYS_getuid	24
 #define	FREEBSD32_SYS_geteuid	25
 #define	FREEBSD32_SYS_ptrace	26
 #define	FREEBSD32_SYS_freebsd32_recvmsg	27
 #define	FREEBSD32_SYS_freebsd32_sendmsg	28
 #define	FREEBSD32_SYS_freebsd32_recvfrom	29
 #define	FREEBSD32_SYS_accept	30
 #define	FREEBSD32_SYS_getpeername	31
 #define	FREEBSD32_SYS_getsockname	32
 #define	FREEBSD32_SYS_access	33
 #define	FREEBSD32_SYS_chflags	34
 #define	FREEBSD32_SYS_fchflags	35
 #define	FREEBSD32_SYS_sync	36
 #define	FREEBSD32_SYS_kill	37
 				/* 38 is old freebsd32_stat */
 #define	FREEBSD32_SYS_getppid	39
 				/* 40 is old freebsd32_lstat */
 #define	FREEBSD32_SYS_dup	41
-				/* 42 is freebsd10 freebsd32_pipe */
+#define	FREEBSD32_SYS_freebsd10_freebsd32_pipe	42
 #define	FREEBSD32_SYS_getegid	43
 #define	FREEBSD32_SYS_profil	44
 #define	FREEBSD32_SYS_ktrace	45
 				/* 46 is old freebsd32_sigaction */
 #define	FREEBSD32_SYS_getgid	47
 				/* 48 is old freebsd32_sigprocmask */
 #define	FREEBSD32_SYS_getlogin	49
 #define	FREEBSD32_SYS_setlogin	50
 #define	FREEBSD32_SYS_acct	51
 				/* 52 is old freebsd32_sigpending */
 #define	FREEBSD32_SYS_freebsd32_sigaltstack	53
 #define	FREEBSD32_SYS_freebsd32_ioctl	54
 #define	FREEBSD32_SYS_reboot	55
 #define	FREEBSD32_SYS_revoke	56
 #define	FREEBSD32_SYS_symlink	57
 #define	FREEBSD32_SYS_readlink	58
 #define	FREEBSD32_SYS_freebsd32_execve	59
 #define	FREEBSD32_SYS_umask	60
 #define	FREEBSD32_SYS_chroot	61
 				/* 62 is old freebsd32_fstat */
 				/* 63 is obsolete ogetkerninfo */
 				/* 64 is old freebsd32_getpagesize */
 #define	FREEBSD32_SYS_msync	65
 #define	FREEBSD32_SYS_vfork	66
 				/* 67 is obsolete vread */
 				/* 68 is obsolete vwrite */
 #define	FREEBSD32_SYS_sbrk	69
 #define	FREEBSD32_SYS_sstk	70
 				/* 71 is old mmap */
 #define	FREEBSD32_SYS_vadvise	72
 #define	FREEBSD32_SYS_munmap	73
 #define	FREEBSD32_SYS_freebsd32_mprotect	74
 #define	FREEBSD32_SYS_madvise	75
 				/* 76 is obsolete vhangup */
 				/* 77 is obsolete vlimit */
 #define	FREEBSD32_SYS_mincore	78
 #define	FREEBSD32_SYS_getgroups	79
 #define	FREEBSD32_SYS_setgroups	80
 #define	FREEBSD32_SYS_getpgrp	81
 #define	FREEBSD32_SYS_setpgid	82
 #define	FREEBSD32_SYS_freebsd32_setitimer	83
 				/* 84 is obsolete owait */
 #define	FREEBSD32_SYS_swapon	85
 #define	FREEBSD32_SYS_freebsd32_getitimer	86
 				/* 87 is obsolete ogethostname */
 				/* 88 is obsolete osethostname */
 #define	FREEBSD32_SYS_getdtablesize	89
 #define	FREEBSD32_SYS_dup2	90
 #define	FREEBSD32_SYS_freebsd32_fcntl	92
 #define	FREEBSD32_SYS_freebsd32_select	93
 #define	FREEBSD32_SYS_fsync	95
 #define	FREEBSD32_SYS_setpriority	96
 #define	FREEBSD32_SYS_socket	97
 #define	FREEBSD32_SYS_connect	98
 				/* 99 is obsolete oaccept */
 #define	FREEBSD32_SYS_getpriority	100
 				/* 101 is obsolete osend */
 				/* 102 is obsolete orecv */
 				/* 103 is old freebsd32_sigreturn */
 #define	FREEBSD32_SYS_bind	104
 #define	FREEBSD32_SYS_setsockopt	105
 #define	FREEBSD32_SYS_listen	106
 				/* 107 is obsolete vtimes */
 				/* 108 is old freebsd32_sigvec */
 				/* 109 is old freebsd32_sigblock */
 				/* 110 is old freebsd32_sigsetmask */
 				/* 111 is old freebsd32_sigsuspend */
 				/* 112 is old freebsd32_sigstack */
 				/* 113 is obsolete orecvmsg */
 				/* 114 is obsolete osendmsg */
 				/* 115 is obsolete vtrace */
 #define	FREEBSD32_SYS_freebsd32_gettimeofday	116
 #define	FREEBSD32_SYS_freebsd32_getrusage	117
 #define	FREEBSD32_SYS_getsockopt	118
 #define	FREEBSD32_SYS_freebsd32_readv	120
 #define	FREEBSD32_SYS_freebsd32_writev	121
 #define	FREEBSD32_SYS_freebsd32_settimeofday	122
 #define	FREEBSD32_SYS_fchown	123
 #define	FREEBSD32_SYS_fchmod	124
 				/* 125 is obsolete orecvfrom */
 #define	FREEBSD32_SYS_setreuid	126
 #define	FREEBSD32_SYS_setregid	127
 #define	FREEBSD32_SYS_rename	128
 				/* 129 is old truncate */
 				/* 130 is old ftruncate */
 #define	FREEBSD32_SYS_flock	131
 #define	FREEBSD32_SYS_mkfifo	132
 #define	FREEBSD32_SYS_sendto	133
 #define	FREEBSD32_SYS_shutdown	134
 #define	FREEBSD32_SYS_socketpair	135
 #define	FREEBSD32_SYS_mkdir	136
 #define	FREEBSD32_SYS_rmdir	137
 #define	FREEBSD32_SYS_freebsd32_utimes	138
 				/* 139 is obsolete 4.2 sigreturn */
 #define	FREEBSD32_SYS_freebsd32_adjtime	140
 				/* 141 is obsolete ogetpeername */
 				/* 142 is obsolete ogethostid */
 				/* 143 is obsolete sethostid */
 				/* 144 is obsolete getrlimit */
 				/* 145 is obsolete setrlimit */
 				/* 146 is obsolete killpg */
 #define	FREEBSD32_SYS_setsid	147
 #define	FREEBSD32_SYS_quotactl	148
 				/* 149 is obsolete oquota */
 				/* 150 is obsolete ogetsockname */
 				/* 156 is old freebsd32_getdirentries */
 				/* 157 is freebsd4 freebsd32_statfs */
 				/* 158 is freebsd4 freebsd32_fstatfs */
 #define	FREEBSD32_SYS_getfh	161
 				/* 162 is obsolete getdomainname */
 				/* 163 is obsolete setdomainname */
 				/* 164 is obsolete uname */
 #define	FREEBSD32_SYS_freebsd32_sysarch	165
 #define	FREEBSD32_SYS_rtprio	166
 #define	FREEBSD32_SYS_freebsd32_semsys	169
 #define	FREEBSD32_SYS_freebsd32_msgsys	170
 #define	FREEBSD32_SYS_freebsd32_shmsys	171
 				/* 173 is freebsd6 freebsd32_pread */
 				/* 174 is freebsd6 freebsd32_pwrite */
 #define	FREEBSD32_SYS_ntp_adjtime	176
 #define	FREEBSD32_SYS_setgid	181
 #define	FREEBSD32_SYS_setegid	182
 #define	FREEBSD32_SYS_seteuid	183
 #define	FREEBSD32_SYS_freebsd32_stat	188
 #define	FREEBSD32_SYS_freebsd32_fstat	189
 #define	FREEBSD32_SYS_freebsd32_lstat	190
 #define	FREEBSD32_SYS_pathconf	191
 #define	FREEBSD32_SYS_fpathconf	192
 #define	FREEBSD32_SYS_getrlimit	194
 #define	FREEBSD32_SYS_setrlimit	195
 #define	FREEBSD32_SYS_freebsd32_getdirentries	196
 				/* 197 is freebsd6 freebsd32_mmap */
 #define	FREEBSD32_SYS___syscall	198
 				/* 199 is freebsd6 freebsd32_lseek */
 				/* 200 is freebsd6 freebsd32_truncate */
 				/* 201 is freebsd6 freebsd32_ftruncate */
 #define	FREEBSD32_SYS_freebsd32_sysctl	202
 #define	FREEBSD32_SYS_mlock	203
 #define	FREEBSD32_SYS_munlock	204
 #define	FREEBSD32_SYS_undelete	205
 #define	FREEBSD32_SYS_freebsd32_futimes	206
 #define	FREEBSD32_SYS_getpgid	207
 #define	FREEBSD32_SYS_poll	209
 #define	FREEBSD32_SYS_freebsd7_freebsd32_semctl	220
 #define	FREEBSD32_SYS_semget	221
 #define	FREEBSD32_SYS_semop	222
 #define	FREEBSD32_SYS_freebsd7_freebsd32_msgctl	224
 #define	FREEBSD32_SYS_msgget	225
 #define	FREEBSD32_SYS_freebsd32_msgsnd	226
 #define	FREEBSD32_SYS_freebsd32_msgrcv	227
 #define	FREEBSD32_SYS_shmat	228
 #define	FREEBSD32_SYS_freebsd7_freebsd32_shmctl	229
 #define	FREEBSD32_SYS_shmdt	230
 #define	FREEBSD32_SYS_shmget	231
 #define	FREEBSD32_SYS_freebsd32_clock_gettime	232
 #define	FREEBSD32_SYS_freebsd32_clock_settime	233
 #define	FREEBSD32_SYS_freebsd32_clock_getres	234
 #define	FREEBSD32_SYS_freebsd32_ktimer_create	235
 #define	FREEBSD32_SYS_ktimer_delete	236
 #define	FREEBSD32_SYS_freebsd32_ktimer_settime	237
 #define	FREEBSD32_SYS_freebsd32_ktimer_gettime	238
 #define	FREEBSD32_SYS_ktimer_getoverrun	239
 #define	FREEBSD32_SYS_freebsd32_nanosleep	240
 #define	FREEBSD32_SYS_ffclock_getcounter	241
 #define	FREEBSD32_SYS_ffclock_setestimate	242
 #define	FREEBSD32_SYS_ffclock_getestimate	243
 #define	FREEBSD32_SYS_freebsd32_clock_getcpuclockid2	247
 #define	FREEBSD32_SYS_minherit	250
 #define	FREEBSD32_SYS_rfork	251
 #define	FREEBSD32_SYS_openbsd_poll	252
 #define	FREEBSD32_SYS_issetugid	253
 #define	FREEBSD32_SYS_lchown	254
 #define	FREEBSD32_SYS_freebsd32_aio_read	255
 #define	FREEBSD32_SYS_freebsd32_aio_write	256
 #define	FREEBSD32_SYS_freebsd32_lio_listio	257
 #define	FREEBSD32_SYS_getdents	272
 #define	FREEBSD32_SYS_lchmod	274
 #define	FREEBSD32_SYS_netbsd_lchown	275
 #define	FREEBSD32_SYS_freebsd32_lutimes	276
 #define	FREEBSD32_SYS_netbsd_msync	277
 #define	FREEBSD32_SYS_nstat	278
 #define	FREEBSD32_SYS_nfstat	279
 #define	FREEBSD32_SYS_nlstat	280
 #define	FREEBSD32_SYS_freebsd32_preadv	289
 #define	FREEBSD32_SYS_freebsd32_pwritev	290
 				/* 297 is freebsd4 freebsd32_fhstatfs */
 #define	FREEBSD32_SYS_fhopen	298
 #define	FREEBSD32_SYS_fhstat	299
 #define	FREEBSD32_SYS_modnext	300
 #define	FREEBSD32_SYS_freebsd32_modstat	301
 #define	FREEBSD32_SYS_modfnext	302
 #define	FREEBSD32_SYS_modfind	303
 #define	FREEBSD32_SYS_kldload	304
 #define	FREEBSD32_SYS_kldunload	305
 #define	FREEBSD32_SYS_kldfind	306
 #define	FREEBSD32_SYS_kldnext	307
 #define	FREEBSD32_SYS_freebsd32_kldstat	308
 #define	FREEBSD32_SYS_kldfirstmod	309
 #define	FREEBSD32_SYS_getsid	310
 #define	FREEBSD32_SYS_setresuid	311
 #define	FREEBSD32_SYS_setresgid	312
 				/* 313 is obsolete signanosleep */
 #define	FREEBSD32_SYS_freebsd32_aio_return	314
 #define	FREEBSD32_SYS_freebsd32_aio_suspend	315
 #define	FREEBSD32_SYS_aio_cancel	316
 #define	FREEBSD32_SYS_freebsd32_aio_error	317
 				/* 318 is freebsd6 freebsd32_aio_read */
 				/* 319 is freebsd6 freebsd32_aio_write */
 				/* 320 is freebsd6 freebsd32_lio_listio */
 #define	FREEBSD32_SYS_yield	321
 				/* 322 is obsolete thr_sleep */
 				/* 323 is obsolete thr_wakeup */
 #define	FREEBSD32_SYS_mlockall	324
 #define	FREEBSD32_SYS_munlockall	325
 #define	FREEBSD32_SYS___getcwd	326
 #define	FREEBSD32_SYS_sched_setparam	327
 #define	FREEBSD32_SYS_sched_getparam	328
 #define	FREEBSD32_SYS_sched_setscheduler	329
 #define	FREEBSD32_SYS_sched_getscheduler	330
 #define	FREEBSD32_SYS_sched_yield	331
 #define	FREEBSD32_SYS_sched_get_priority_max	332
 #define	FREEBSD32_SYS_sched_get_priority_min	333
 #define	FREEBSD32_SYS_sched_rr_get_interval	334
 #define	FREEBSD32_SYS_utrace	335
 				/* 336 is freebsd4 freebsd32_sendfile */
 #define	FREEBSD32_SYS_kldsym	337
 #define	FREEBSD32_SYS_freebsd32_jail	338
 #define	FREEBSD32_SYS_sigprocmask	340
 #define	FREEBSD32_SYS_sigsuspend	341
 				/* 342 is freebsd4 freebsd32_sigaction */
 #define	FREEBSD32_SYS_sigpending	343
 				/* 344 is freebsd4 freebsd32_sigreturn */
 #define	FREEBSD32_SYS_freebsd32_sigtimedwait	345
 #define	FREEBSD32_SYS_freebsd32_sigwaitinfo	346
 #define	FREEBSD32_SYS___acl_get_file	347
 #define	FREEBSD32_SYS___acl_set_file	348
 #define	FREEBSD32_SYS___acl_get_fd	349
 #define	FREEBSD32_SYS___acl_set_fd	350
 #define	FREEBSD32_SYS___acl_delete_file	351
 #define	FREEBSD32_SYS___acl_delete_fd	352
 #define	FREEBSD32_SYS___acl_aclcheck_file	353
 #define	FREEBSD32_SYS___acl_aclcheck_fd	354
 #define	FREEBSD32_SYS_extattrctl	355
 #define	FREEBSD32_SYS_extattr_set_file	356
 #define	FREEBSD32_SYS_extattr_get_file	357
 #define	FREEBSD32_SYS_extattr_delete_file	358
 #define	FREEBSD32_SYS_freebsd32_aio_waitcomplete	359
 #define	FREEBSD32_SYS_getresuid	360
 #define	FREEBSD32_SYS_getresgid	361
 #define	FREEBSD32_SYS_kqueue	362
 #define	FREEBSD32_SYS_freebsd32_kevent	363
 #define	FREEBSD32_SYS_extattr_set_fd	371
 #define	FREEBSD32_SYS_extattr_get_fd	372
 #define	FREEBSD32_SYS_extattr_delete_fd	373
 #define	FREEBSD32_SYS___setugid	374
 #define	FREEBSD32_SYS_eaccess	376
 #define	FREEBSD32_SYS_freebsd32_nmount	378
 #define	FREEBSD32_SYS_kenv	390
 #define	FREEBSD32_SYS_lchflags	391
 #define	FREEBSD32_SYS_uuidgen	392
 #define	FREEBSD32_SYS_freebsd32_sendfile	393
 #define	FREEBSD32_SYS_getfsstat	395
 #define	FREEBSD32_SYS_statfs	396
 #define	FREEBSD32_SYS_fstatfs	397
 #define	FREEBSD32_SYS_fhstatfs	398
 #define	FREEBSD32_SYS_ksem_close	400
 #define	FREEBSD32_SYS_ksem_post	401
 #define	FREEBSD32_SYS_ksem_wait	402
 #define	FREEBSD32_SYS_ksem_trywait	403
 #define	FREEBSD32_SYS_freebsd32_ksem_init	404
 #define	FREEBSD32_SYS_freebsd32_ksem_open	405
 #define	FREEBSD32_SYS_ksem_unlink	406
 #define	FREEBSD32_SYS_ksem_getvalue	407
 #define	FREEBSD32_SYS_ksem_destroy	408
 #define	FREEBSD32_SYS_extattr_set_link	412
 #define	FREEBSD32_SYS_extattr_get_link	413
 #define	FREEBSD32_SYS_extattr_delete_link	414
 #define	FREEBSD32_SYS_freebsd32_sigaction	416
 #define	FREEBSD32_SYS_freebsd32_sigreturn	417
 #define	FREEBSD32_SYS_freebsd32_getcontext	421
 #define	FREEBSD32_SYS_freebsd32_setcontext	422
 #define	FREEBSD32_SYS_freebsd32_swapcontext	423
 #define	FREEBSD32_SYS___acl_get_link	425
 #define	FREEBSD32_SYS___acl_set_link	426
 #define	FREEBSD32_SYS___acl_delete_link	427
 #define	FREEBSD32_SYS___acl_aclcheck_link	428
 #define	FREEBSD32_SYS_sigwait	429
 #define	FREEBSD32_SYS_thr_exit	431
 #define	FREEBSD32_SYS_thr_self	432
 #define	FREEBSD32_SYS_thr_kill	433
 #define	FREEBSD32_SYS_jail_attach	436
 #define	FREEBSD32_SYS_extattr_list_fd	437
 #define	FREEBSD32_SYS_extattr_list_file	438
 #define	FREEBSD32_SYS_extattr_list_link	439
 #define	FREEBSD32_SYS_freebsd32_ksem_timedwait	441
 #define	FREEBSD32_SYS_freebsd32_thr_suspend	442
 #define	FREEBSD32_SYS_thr_wake	443
 #define	FREEBSD32_SYS_kldunloadf	444
 #define	FREEBSD32_SYS_audit	445
 #define	FREEBSD32_SYS_auditon	446
 #define	FREEBSD32_SYS_getauid	447
 #define	FREEBSD32_SYS_setauid	448
 #define	FREEBSD32_SYS_getaudit	449
 #define	FREEBSD32_SYS_setaudit	450
 #define	FREEBSD32_SYS_getaudit_addr	451
 #define	FREEBSD32_SYS_setaudit_addr	452
 #define	FREEBSD32_SYS_auditctl	453
 #define	FREEBSD32_SYS_freebsd32_umtx_op	454
 #define	FREEBSD32_SYS_freebsd32_thr_new	455
 #define	FREEBSD32_SYS_sigqueue	456
 #define	FREEBSD32_SYS_freebsd32_kmq_open	457
 #define	FREEBSD32_SYS_freebsd32_kmq_setattr	458
 #define	FREEBSD32_SYS_freebsd32_kmq_timedreceive	459
 #define	FREEBSD32_SYS_freebsd32_kmq_timedsend	460
 #define	FREEBSD32_SYS_freebsd32_kmq_notify	461
 #define	FREEBSD32_SYS_kmq_unlink	462
 #define	FREEBSD32_SYS_abort2	463
 #define	FREEBSD32_SYS_thr_set_name	464
 #define	FREEBSD32_SYS_freebsd32_aio_fsync	465
 #define	FREEBSD32_SYS_rtprio_thread	466
 #define	FREEBSD32_SYS_sctp_peeloff	471
 #define	FREEBSD32_SYS_sctp_generic_sendmsg	472
 #define	FREEBSD32_SYS_sctp_generic_sendmsg_iov	473
 #define	FREEBSD32_SYS_sctp_generic_recvmsg	474
 #define	FREEBSD32_SYS_freebsd32_pread	475
 #define	FREEBSD32_SYS_freebsd32_pwrite	476
 #define	FREEBSD32_SYS_freebsd32_mmap	477
 #define	FREEBSD32_SYS_freebsd32_lseek	478
 #define	FREEBSD32_SYS_freebsd32_truncate	479
 #define	FREEBSD32_SYS_freebsd32_ftruncate	480
 #define	FREEBSD32_SYS_freebsd32_pread	475
 #define	FREEBSD32_SYS_freebsd32_pwrite	476
 #define	FREEBSD32_SYS_freebsd32_mmap	477
 #define	FREEBSD32_SYS_freebsd32_lseek	478
 #define	FREEBSD32_SYS_freebsd32_truncate	479
 #define	FREEBSD32_SYS_freebsd32_ftruncate	480
 #define	FREEBSD32_SYS_thr_kill2	481
 #define	FREEBSD32_SYS_shm_open	482
 #define	FREEBSD32_SYS_shm_unlink	483
 #define	FREEBSD32_SYS_cpuset	484
 #define	FREEBSD32_SYS_freebsd32_cpuset_setid	485
 #define	FREEBSD32_SYS_freebsd32_cpuset_setid	485
 #define	FREEBSD32_SYS_freebsd32_cpuset_getid	486
 #define	FREEBSD32_SYS_freebsd32_cpuset_getaffinity	487
 #define	FREEBSD32_SYS_freebsd32_cpuset_setaffinity	488
 #define	FREEBSD32_SYS_faccessat	489
 #define	FREEBSD32_SYS_fchmodat	490
 #define	FREEBSD32_SYS_fchownat	491
 #define	FREEBSD32_SYS_freebsd32_fexecve	492
 #define	FREEBSD32_SYS_freebsd32_fstatat	493
 #define	FREEBSD32_SYS_freebsd32_futimesat	494
 #define	FREEBSD32_SYS_linkat	495
 #define	FREEBSD32_SYS_mkdirat	496
 #define	FREEBSD32_SYS_mkfifoat	497
 #define	FREEBSD32_SYS_mknodat	498
 #define	FREEBSD32_SYS_openat	499
 #define	FREEBSD32_SYS_readlinkat	500
 #define	FREEBSD32_SYS_renameat	501
 #define	FREEBSD32_SYS_symlinkat	502
 #define	FREEBSD32_SYS_unlinkat	503
 #define	FREEBSD32_SYS_posix_openpt	504
 #define	FREEBSD32_SYS_freebsd32_jail_get	506
 #define	FREEBSD32_SYS_freebsd32_jail_set	507
 #define	FREEBSD32_SYS_jail_remove	508
 #define	FREEBSD32_SYS_closefrom	509
 #define	FREEBSD32_SYS_freebsd32_semctl	510
 #define	FREEBSD32_SYS_freebsd32_msgctl	511
 #define	FREEBSD32_SYS_freebsd32_shmctl	512
 #define	FREEBSD32_SYS_lpathconf	513
 				/* 514 is obsolete cap_new */
 #define	FREEBSD32_SYS___cap_rights_get	515
 #define	FREEBSD32_SYS_freebsd32_cap_enter	516
 #define	FREEBSD32_SYS_cap_getmode	517
 #define	FREEBSD32_SYS_pdfork	518
 #define	FREEBSD32_SYS_pdkill	519
 #define	FREEBSD32_SYS_pdgetpid	520
 #define	FREEBSD32_SYS_freebsd32_pselect	522
 #define	FREEBSD32_SYS_getloginclass	523
 #define	FREEBSD32_SYS_setloginclass	524
 #define	FREEBSD32_SYS_rctl_get_racct	525
 #define	FREEBSD32_SYS_rctl_get_rules	526
 #define	FREEBSD32_SYS_rctl_get_limits	527
 #define	FREEBSD32_SYS_rctl_add_rule	528
 #define	FREEBSD32_SYS_rctl_remove_rule	529
 #define	FREEBSD32_SYS_freebsd32_posix_fallocate	530
 #define	FREEBSD32_SYS_freebsd32_posix_fadvise	531
 #define	FREEBSD32_SYS_freebsd32_wait6	532
 #define	FREEBSD32_SYS_freebsd32_posix_fallocate	530
 #define	FREEBSD32_SYS_freebsd32_posix_fadvise	531
 #define	FREEBSD32_SYS_freebsd32_wait6	532
 #define	FREEBSD32_SYS_cap_rights_limit	533
 #define	FREEBSD32_SYS_freebsd32_cap_ioctls_limit	534
 #define	FREEBSD32_SYS_freebsd32_cap_ioctls_get	535
 #define	FREEBSD32_SYS_cap_fcntls_limit	536
 #define	FREEBSD32_SYS_cap_fcntls_get	537
 #define	FREEBSD32_SYS_bindat	538
 #define	FREEBSD32_SYS_connectat	539
 #define	FREEBSD32_SYS_chflagsat	540
 #define	FREEBSD32_SYS_accept4	541
 #define	FREEBSD32_SYS_pipe2	542
 #define	FREEBSD32_SYS_freebsd32_aio_mlock	543
 #define	FREEBSD32_SYS_freebsd32_procctl	544
 #define	FREEBSD32_SYS_freebsd32_procctl	544
 #define	FREEBSD32_SYS_freebsd32_ppoll	545
 #define	FREEBSD32_SYS_freebsd32_futimens	546
 #define	FREEBSD32_SYS_freebsd32_utimensat	547
 #define	FREEBSD32_SYS_numa_getaffinity	548
 #define	FREEBSD32_SYS_numa_setaffinity	549
 #define	FREEBSD32_SYS_MAXSYSCALL	550
Index: user/alc/PQ_LAUNDRY/sys/dev/cxgbe/t4_main.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/cxgbe/t4_main.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/cxgbe/t4_main.c	(revision 303775)
@@ -1,9532 +1,9532 @@
 /*-
  * Copyright (c) 2011 Chelsio Communications, Inc.
  * All rights reserved.
  * Written by: Navdeep Parhar <np@FreeBSD.org>
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_ddb.h"
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_rss.h"
 
 #include <sys/param.h>
 #include <sys/conf.h>
 #include <sys/priv.h>
 #include <sys/kernel.h>
 #include <sys/bus.h>
 #include <sys/module.h>
 #include <sys/malloc.h>
 #include <sys/queue.h>
 #include <sys/taskqueue.h>
 #include <sys/pciio.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pci_private.h>
 #include <sys/firmware.h>
 #include <sys/sbuf.h>
 #include <sys/smp.h>
 #include <sys/socket.h>
 #include <sys/sockio.h>
 #include <sys/sysctl.h>
 #include <net/ethernet.h>
 #include <net/if.h>
 #include <net/if_types.h>
 #include <net/if_dl.h>
 #include <net/if_vlan_var.h>
 #ifdef RSS
 #include <net/rss_config.h>
 #endif
 #if defined(__i386__) || defined(__amd64__)
 #include <vm/vm.h>
 #include <vm/pmap.h>
 #endif
 #ifdef DDB
 #include <ddb/ddb.h>
 #include <ddb/db_lex.h>
 #endif
 
 #include "common/common.h"
 #include "common/t4_msg.h"
 #include "common/t4_regs.h"
 #include "common/t4_regs_values.h"
 #include "t4_ioctl.h"
 #include "t4_l2t.h"
 #include "t4_mp_ring.h"
 #include "t4_if.h"
 
 /* T4 bus driver interface */
 static int t4_probe(device_t);
 static int t4_attach(device_t);
 static int t4_detach(device_t);
 static int t4_ready(device_t);
 static int t4_read_port_device(device_t, int, device_t *);
 static device_method_t t4_methods[] = {
 	DEVMETHOD(device_probe,		t4_probe),
 	DEVMETHOD(device_attach,	t4_attach),
 	DEVMETHOD(device_detach,	t4_detach),
 
 	DEVMETHOD(t4_is_main_ready,	t4_ready),
 	DEVMETHOD(t4_read_port_device,	t4_read_port_device),
 
 	DEVMETHOD_END
 };
 static driver_t t4_driver = {
 	"t4nex",
 	t4_methods,
 	sizeof(struct adapter)
 };
 
 
 /* T4 port (cxgbe) interface */
 static int cxgbe_probe(device_t);
 static int cxgbe_attach(device_t);
 static int cxgbe_detach(device_t);
 static device_method_t cxgbe_methods[] = {
 	DEVMETHOD(device_probe,		cxgbe_probe),
 	DEVMETHOD(device_attach,	cxgbe_attach),
 	DEVMETHOD(device_detach,	cxgbe_detach),
 	{ 0, 0 }
 };
 static driver_t cxgbe_driver = {
 	"cxgbe",
 	cxgbe_methods,
 	sizeof(struct port_info)
 };
 
 /* T4 VI (vcxgbe) interface */
 static int vcxgbe_probe(device_t);
 static int vcxgbe_attach(device_t);
 static int vcxgbe_detach(device_t);
 static device_method_t vcxgbe_methods[] = {
 	DEVMETHOD(device_probe,		vcxgbe_probe),
 	DEVMETHOD(device_attach,	vcxgbe_attach),
 	DEVMETHOD(device_detach,	vcxgbe_detach),
 	{ 0, 0 }
 };
 static driver_t vcxgbe_driver = {
 	"vcxgbe",
 	vcxgbe_methods,
 	sizeof(struct vi_info)
 };
 
 static d_ioctl_t t4_ioctl;
 
 static struct cdevsw t4_cdevsw = {
        .d_version = D_VERSION,
        .d_ioctl = t4_ioctl,
        .d_name = "t4nex",
 };
 
 /* T5 bus driver interface */
 static int t5_probe(device_t);
 static device_method_t t5_methods[] = {
 	DEVMETHOD(device_probe,		t5_probe),
 	DEVMETHOD(device_attach,	t4_attach),
 	DEVMETHOD(device_detach,	t4_detach),
 
 	DEVMETHOD(t4_is_main_ready,	t4_ready),
 	DEVMETHOD(t4_read_port_device,	t4_read_port_device),
 
 	DEVMETHOD_END
 };
 static driver_t t5_driver = {
 	"t5nex",
 	t5_methods,
 	sizeof(struct adapter)
 };
 
 
 /* T5 port (cxl) interface */
 static driver_t cxl_driver = {
 	"cxl",
 	cxgbe_methods,
 	sizeof(struct port_info)
 };
 
 /* T5 VI (vcxl) interface */
 static driver_t vcxl_driver = {
 	"vcxl",
 	vcxgbe_methods,
 	sizeof(struct vi_info)
 };
 
 /* ifnet + media interface */
 static void cxgbe_init(void *);
 static int cxgbe_ioctl(struct ifnet *, unsigned long, caddr_t);
 static int cxgbe_transmit(struct ifnet *, struct mbuf *);
 static void cxgbe_qflush(struct ifnet *);
 static int cxgbe_media_change(struct ifnet *);
 static void cxgbe_media_status(struct ifnet *, struct ifmediareq *);
 
 MALLOC_DEFINE(M_CXGBE, "cxgbe", "Chelsio T4/T5 Ethernet driver and services");
 
 /*
 * When more than one of these locks must be held, acquire them in this order:
 * t4_list_lock, then ADAPTER_LOCK, then t4_uld_list_lock.
  */
 static struct sx t4_list_lock;
 SLIST_HEAD(, adapter) t4_list;
 #ifdef TCP_OFFLOAD
 static struct sx t4_uld_list_lock;
 SLIST_HEAD(, uld_info) t4_uld_list;
 #endif
 
 /*
  * Tunables.  See tweak_tunables() too.
  *
  * Each tunable is set to a default value here if it's known at compile-time.
  * Otherwise it is set to -1 as an indication to tweak_tunables() that it should
  * provide a reasonable default when the driver is loaded.
  *
  * Tunables applicable to both T4 and T5 are under hw.cxgbe.  Those specific to
  * T5 are under hw.cxl.
  */
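
/*
 * Illustrative sketch only, not part of the driver: resolving a -1 sentinel
 * into a load-time default.  tweak_tunables() is not shown in this file, so
 * the fallback rule used here is an assumption: the smaller of the CPU count
 * and a compile-time cap such as NTXQ_10G.  example_default_nqueues() is a
 * hypothetical name.  The block is compiled only outside the kernel.
 */
#ifndef _KERNEL
/* Return the tunable if the user set it (>= 1), else min(ncpus, cap). */
static int
example_default_nqueues(int tunable, int ncpus, int cap)
{
	if (tunable >= 1)
		return (tunable);	/* explicitly set by the user */
	return (ncpus < cap ? ncpus : cap);
}
#endif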
 
 /*
  * Number of queues for tx and rx, 10G and 1G, NIC and offload.
  */
 #define NTXQ_10G 16
 static int t4_ntxq10g = -1;
 TUNABLE_INT("hw.cxgbe.ntxq10g", &t4_ntxq10g);
 
 #define NRXQ_10G 8
 static int t4_nrxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nrxq10g", &t4_nrxq10g);
 
 #define NTXQ_1G 4
 static int t4_ntxq1g = -1;
 TUNABLE_INT("hw.cxgbe.ntxq1g", &t4_ntxq1g);
 
 #define NRXQ_1G 2
 static int t4_nrxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nrxq1g", &t4_nrxq1g);
 
 #define NTXQ_VI 1
 static int t4_ntxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.ntxq_vi", &t4_ntxq_vi);
 
 #define NRXQ_VI 1
 static int t4_nrxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.nrxq_vi", &t4_nrxq_vi);
 
 static int t4_rsrv_noflowq = 0;
 TUNABLE_INT("hw.cxgbe.rsrv_noflowq", &t4_rsrv_noflowq);
 
 #ifdef TCP_OFFLOAD
 #define NOFLDTXQ_10G 8
 static int t4_nofldtxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nofldtxq10g", &t4_nofldtxq10g);
 
 #define NOFLDRXQ_10G 2
 static int t4_nofldrxq10g = -1;
 TUNABLE_INT("hw.cxgbe.nofldrxq10g", &t4_nofldrxq10g);
 
 #define NOFLDTXQ_1G 2
 static int t4_nofldtxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nofldtxq1g", &t4_nofldtxq1g);
 
 #define NOFLDRXQ_1G 1
 static int t4_nofldrxq1g = -1;
 TUNABLE_INT("hw.cxgbe.nofldrxq1g", &t4_nofldrxq1g);
 
 #define NOFLDTXQ_VI 1
 static int t4_nofldtxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.nofldtxq_vi", &t4_nofldtxq_vi);
 
 #define NOFLDRXQ_VI 1
 static int t4_nofldrxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.nofldrxq_vi", &t4_nofldrxq_vi);
 #endif
 
 #ifdef DEV_NETMAP
 #define NNMTXQ_VI 2
 static int t4_nnmtxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.nnmtxq_vi", &t4_nnmtxq_vi);
 
 #define NNMRXQ_VI 2
 static int t4_nnmrxq_vi = -1;
 TUNABLE_INT("hw.cxgbe.nnmrxq_vi", &t4_nnmrxq_vi);
 #endif
 
 /*
  * Holdoff parameters for 10G and 1G ports.
  */
 #define TMR_IDX_10G 1
 static int t4_tmr_idx_10g = TMR_IDX_10G;
 TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_10G", &t4_tmr_idx_10g);
 
 #define PKTC_IDX_10G (-1)
 static int t4_pktc_idx_10g = PKTC_IDX_10G;
 TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_10G", &t4_pktc_idx_10g);
 
 #define TMR_IDX_1G 1
 static int t4_tmr_idx_1g = TMR_IDX_1G;
 TUNABLE_INT("hw.cxgbe.holdoff_timer_idx_1G", &t4_tmr_idx_1g);
 
 #define PKTC_IDX_1G (-1)
 static int t4_pktc_idx_1g = PKTC_IDX_1G;
 TUNABLE_INT("hw.cxgbe.holdoff_pktc_idx_1G", &t4_pktc_idx_1g);
 
 /*
  * Size (# of entries) of each tx and rx queue.
  */
 static unsigned int t4_qsize_txq = TX_EQ_QSIZE;
 TUNABLE_INT("hw.cxgbe.qsize_txq", &t4_qsize_txq);
 
 static unsigned int t4_qsize_rxq = RX_IQ_QSIZE;
 TUNABLE_INT("hw.cxgbe.qsize_rxq", &t4_qsize_rxq);
 
 /*
  * Interrupt types allowed (bits 0, 1, 2 = INTx, MSI, MSI-X respectively).
  */
 static int t4_intr_types = INTR_MSIX | INTR_MSI | INTR_INTX;
 TUNABLE_INT("hw.cxgbe.interrupt_types", &t4_intr_types);
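
/*
 * Illustrative sketch only: choosing the preferred interrupt type from a
 * hw.cxgbe.interrupt_types-style mask by trying MSI-X, then MSI, then INTx.
 * The EX_INTR_* values follow the bit layout documented above.  The function
 * is hypothetical.  The driver's own selection in cfg_itype_and_nqueues()
 * presumably also depends on how many vectors can actually be allocated.
 * Compiled only outside the kernel.
 */
#ifndef _KERNEL
#define EX_INTR_INTX	(1 << 0)
#define EX_INTR_MSI	(1 << 1)
#define EX_INTR_MSIX	(1 << 2)

/* Return the most capable allowed type, or 0 if the mask allows none. */
static int
example_pick_intr_type(int allowed)
{
	int itype;

	for (itype = EX_INTR_MSIX; itype != 0; itype >>= 1) {
		if (allowed & itype)
			return (itype);
	}
	return (0);
}
#endif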
 
 /*
  * Configuration file.
  */
 #define DEFAULT_CF	"default"
 #define FLASH_CF	"flash"
 #define UWIRE_CF	"uwire"
 #define FPGA_CF		"fpga"
 static char t4_cfg_file[32] = DEFAULT_CF;
 TUNABLE_STR("hw.cxgbe.config_file", t4_cfg_file, sizeof(t4_cfg_file));
 
 /*
  * PAUSE settings (bit 0, 1 = rx_pause, tx_pause respectively).
  * rx_pause = 1 to heed incoming PAUSE frames, 0 to ignore them.
  * tx_pause = 1 to emit PAUSE frames when the rx FIFO reaches its high water
  *            mark or when signalled to do so, 0 to never emit PAUSE.
  */
 static int t4_pause_settings = PAUSE_TX | PAUSE_RX;
 TUNABLE_INT("hw.cxgbe.pause_settings", &t4_pause_settings);
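
/*
 * Illustrative sketch only: merging a pause_settings value into a link's
 * flow-control word with the clear-then-set pattern that t4_attach() applies
 * to link_cfg.requested_fc and link_cfg.fc.  EX_PAUSE_RX and EX_PAUSE_TX
 * follow the bit layout documented above (bit 0 = rx_pause, bit 1 =
 * tx_pause).  The function is hypothetical.  Unlike the driver code, it also
 * masks the new settings so that stray bits cannot leak into fc.  Compiled
 * only outside the kernel.
 */
#ifndef _KERNEL
#define EX_PAUSE_RX	(1 << 0)
#define EX_PAUSE_TX	(1 << 1)

/* Replace only the PAUSE bits of fc, preserving any other flags. */
static int
example_apply_pause(int fc, int settings)
{
	fc &= ~(EX_PAUSE_TX | EX_PAUSE_RX);
	fc |= settings & (EX_PAUSE_TX | EX_PAUSE_RX);
	return (fc);
}
#endif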
 
 /*
  * Firmware auto-install by driver during attach (0, 1, 2 = prohibited, allowed,
  * encouraged respectively).
  */
 static unsigned int t4_fw_install = 1;
 TUNABLE_INT("hw.cxgbe.fw_install", &t4_fw_install);
 
 /*
 * ASIC features that will be used.  Disable any you don't need so that chip
 * resources are not wasted on them.
  */
 static int t4_nbmcaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.nbmcaps_allowed", &t4_nbmcaps_allowed);
 
 static int t4_linkcaps_allowed = 0;	/* No DCBX, PPP, etc. by default */
 TUNABLE_INT("hw.cxgbe.linkcaps_allowed", &t4_linkcaps_allowed);
 
 static int t4_switchcaps_allowed = FW_CAPS_CONFIG_SWITCH_INGRESS |
     FW_CAPS_CONFIG_SWITCH_EGRESS;
 TUNABLE_INT("hw.cxgbe.switchcaps_allowed", &t4_switchcaps_allowed);
 
 static int t4_niccaps_allowed = FW_CAPS_CONFIG_NIC;
 TUNABLE_INT("hw.cxgbe.niccaps_allowed", &t4_niccaps_allowed);
 
 static int t4_toecaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.toecaps_allowed", &t4_toecaps_allowed);
 
 static int t4_rdmacaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.rdmacaps_allowed", &t4_rdmacaps_allowed);
 
 static int t4_tlscaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.tlscaps_allowed", &t4_tlscaps_allowed);
 
 static int t4_iscsicaps_allowed = -1;
 TUNABLE_INT("hw.cxgbe.iscsicaps_allowed", &t4_iscsicaps_allowed);
 
 static int t4_fcoecaps_allowed = 0;
 TUNABLE_INT("hw.cxgbe.fcoecaps_allowed", &t4_fcoecaps_allowed);
 
 static int t5_write_combine = 0;
 TUNABLE_INT("hw.cxl.write_combine", &t5_write_combine);
 
 static int t4_num_vis = 1;
 TUNABLE_INT("hw.cxgbe.num_vis", &t4_num_vis);
 
 /* Functions used by extra VIs to obtain unique MAC addresses for each VI. */
 static int vi_mac_funcs[] = {
 	FW_VI_FUNC_OFLD,
 	FW_VI_FUNC_IWARP,
 	FW_VI_FUNC_OPENISCSI,
 	FW_VI_FUNC_OPENFCOE,
 	FW_VI_FUNC_FOISCSI,
 	FW_VI_FUNC_FOFCOE,
 };
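
/*
 * Illustrative sketch only: how t4_attach() turns hw.cxgbe.num_vis into the
 * per-port VI count.  A value below 1 means a single VI, and the count is
 * capped at the number of entries in vi_mac_funcs[] (6 above), matching the
 * comment in t4_attach() about available MAC addresses.  The function name
 * is hypothetical.  Compiled only outside the kernel.
 */
#ifndef _KERNEL
static int
example_clamp_num_vis(int tunable, int nfuncs)
{
	int num_vis;

	num_vis = tunable >= 1 ? tunable : 1;	/* at least the main VI */
	if (num_vis > nfuncs)
		num_vis = nfuncs;		/* one MAC function per VI */
	return (num_vis);
}
#endif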
 
 struct intrs_and_queues {
 	uint16_t intr_type;	/* INTx, MSI, or MSI-X */
 	uint16_t nirq;		/* Total # of vectors */
	uint16_t intr_flags_10g; /* Interrupt flags for each 10G port */
 	uint16_t intr_flags_1g;	/* Interrupt flags for each 1G port */
 	uint16_t ntxq10g;	/* # of NIC txq's for each 10G port */
 	uint16_t nrxq10g;	/* # of NIC rxq's for each 10G port */
 	uint16_t ntxq1g;	/* # of NIC txq's for each 1G port */
 	uint16_t nrxq1g;	/* # of NIC rxq's for each 1G port */
 	uint16_t rsrv_noflowq;	/* Flag whether to reserve queue 0 */
 	uint16_t nofldtxq10g;	/* # of TOE txq's for each 10G port */
 	uint16_t nofldrxq10g;	/* # of TOE rxq's for each 10G port */
 	uint16_t nofldtxq1g;	/* # of TOE txq's for each 1G port */
 	uint16_t nofldrxq1g;	/* # of TOE rxq's for each 1G port */
 
 	/* The vcxgbe/vcxl interfaces use these and not the ones above. */
 	uint16_t ntxq_vi;	/* # of NIC txq's */
 	uint16_t nrxq_vi;	/* # of NIC rxq's */
 	uint16_t nofldtxq_vi;	/* # of TOE txq's */
 	uint16_t nofldrxq_vi;	/* # of TOE rxq's */
 	uint16_t nnmtxq_vi;	/* # of netmap txq's */
 	uint16_t nnmrxq_vi;	/* # of netmap rxq's */
 };
 
 struct filter_entry {
	uint32_t valid:1;	/* filter allocated and valid */
	uint32_t locked:1;	/* filter is administratively locked */
	uint32_t pending:1;	/* filter action is pending firmware reply */
	uint32_t smtidx:8;	/* Source MAC Table index for smac */
	struct l2t_entry *l2t;	/* Layer Two Table entry for dmac */

	struct t4_filter_specification fs;
 };
 
 static int map_bars_0_and_4(struct adapter *);
 static int map_bar_2(struct adapter *);
 static void setup_memwin(struct adapter *);
 static void position_memwin(struct adapter *, int, uint32_t);
 static int rw_via_memwin(struct adapter *, int, uint32_t, uint32_t *, int, int);
 static inline int read_via_memwin(struct adapter *, int, uint32_t, uint32_t *,
     int);
 static inline int write_via_memwin(struct adapter *, int, uint32_t,
     const uint32_t *, int);
 static int validate_mem_range(struct adapter *, uint32_t, int);
 static int fwmtype_to_hwmtype(int);
 static int validate_mt_off_len(struct adapter *, int, uint32_t, int,
     uint32_t *);
 static int fixup_devlog_params(struct adapter *);
 static int cfg_itype_and_nqueues(struct adapter *, int, int, int,
     struct intrs_and_queues *);
 static int prep_firmware(struct adapter *);
 static int partition_resources(struct adapter *, const struct firmware *,
     const char *);
 static int get_params__pre_init(struct adapter *);
 static int get_params__post_init(struct adapter *);
 static int set_params__post_init(struct adapter *);
 static void t4_set_desc(struct adapter *);
 static void build_medialist(struct port_info *, struct ifmedia *);
 static int cxgbe_init_synchronized(struct vi_info *);
 static int cxgbe_uninit_synchronized(struct vi_info *);
 static int setup_intr_handlers(struct adapter *);
 static void quiesce_txq(struct adapter *, struct sge_txq *);
 static void quiesce_wrq(struct adapter *, struct sge_wrq *);
 static void quiesce_iq(struct adapter *, struct sge_iq *);
 static void quiesce_fl(struct adapter *, struct sge_fl *);
 static int t4_alloc_irq(struct adapter *, struct irq *, int rid,
     driver_intr_t *, void *, char *);
 static int t4_free_irq(struct adapter *, struct irq *);
 static void get_regs(struct adapter *, struct t4_regdump *, uint8_t *);
 static void vi_refresh_stats(struct adapter *, struct vi_info *);
 static void cxgbe_refresh_stats(struct adapter *, struct port_info *);
 static void cxgbe_tick(void *);
 static void cxgbe_vlan_config(void *, struct ifnet *, uint16_t);
 static void t4_sysctls(struct adapter *);
 static void cxgbe_sysctls(struct port_info *);
 static int sysctl_int_array(SYSCTL_HANDLER_ARGS);
 static int sysctl_bitfield(SYSCTL_HANDLER_ARGS);
 static int sysctl_btphy(SYSCTL_HANDLER_ARGS);
 static int sysctl_noflowq(SYSCTL_HANDLER_ARGS);
 static int sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS);
 static int sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS);
 static int sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS);
 static int sysctl_qsize_txq(SYSCTL_HANDLER_ARGS);
 static int sysctl_pause_settings(SYSCTL_HANDLER_ARGS);
 static int sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS);
 static int sysctl_temperature(SYSCTL_HANDLER_ARGS);
 #ifdef SBUF_DRAIN
 static int sysctl_cctrl(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS);
 static int sysctl_cpl_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_ddp_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_devlog(SYSCTL_HANDLER_ARGS);
 static int sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_hw_sched(SYSCTL_HANDLER_ARGS);
 static int sysctl_lb_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_linkdnrc(SYSCTL_HANDLER_ARGS);
 static int sysctl_meminfo(SYSCTL_HANDLER_ARGS);
 static int sysctl_mps_tcam(SYSCTL_HANDLER_ARGS);
 static int sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS);
 static int sysctl_path_mtus(SYSCTL_HANDLER_ARGS);
 static int sysctl_pm_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_rdma_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tcp_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tids(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_tx_rate(SYSCTL_HANDLER_ARGS);
 static int sysctl_ulprx_la(SYSCTL_HANDLER_ARGS);
 static int sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS);
 static int sysctl_tc_params(SYSCTL_HANDLER_ARGS);
 #endif
 #ifdef TCP_OFFLOAD
 static int sysctl_tp_tick(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS);
 static int sysctl_tp_timer(SYSCTL_HANDLER_ARGS);
 #endif
 static uint32_t fconf_iconf_to_mode(uint32_t, uint32_t);
 static uint32_t mode_to_fconf(uint32_t);
 static uint32_t mode_to_iconf(uint32_t);
 static int check_fspec_against_fconf_iconf(struct adapter *,
     struct t4_filter_specification *);
 static int get_filter_mode(struct adapter *, uint32_t *);
 static int set_filter_mode(struct adapter *, uint32_t);
 static inline uint64_t get_filter_hits(struct adapter *, uint32_t);
 static int get_filter(struct adapter *, struct t4_filter *);
 static int set_filter(struct adapter *, struct t4_filter *);
 static int del_filter(struct adapter *, struct t4_filter *);
 static void clear_filter(struct filter_entry *);
 static int set_filter_wr(struct adapter *, int);
 static int del_filter_wr(struct adapter *, int);
 static int set_tcb_rpl(struct sge_iq *, const struct rss_header *,
     struct mbuf *);
 static int get_sge_context(struct adapter *, struct t4_sge_context *);
 static int load_fw(struct adapter *, struct t4_data *);
 static int read_card_mem(struct adapter *, int, struct t4_mem_range *);
 static int read_i2c(struct adapter *, struct t4_i2c_data *);
 static int set_sched_class(struct adapter *, struct t4_sched_params *);
 static int set_sched_queue(struct adapter *, struct t4_sched_queue *);
 #ifdef TCP_OFFLOAD
 static int toe_capability(struct vi_info *, int);
 #endif
 static int mod_event(module_t, int, void *);
 static int notify_siblings(device_t, int);
 
 struct {
 	uint16_t device;
 	char *desc;
 } t4_pciids[] = {
 	{0xa000, "Chelsio Terminator 4 FPGA"},
 	{0x4400, "Chelsio T440-dbg"},
 	{0x4401, "Chelsio T420-CR"},
 	{0x4402, "Chelsio T422-CR"},
 	{0x4403, "Chelsio T440-CR"},
 	{0x4404, "Chelsio T420-BCH"},
 	{0x4405, "Chelsio T440-BCH"},
 	{0x4406, "Chelsio T440-CH"},
 	{0x4407, "Chelsio T420-SO"},
 	{0x4408, "Chelsio T420-CX"},
 	{0x4409, "Chelsio T420-BT"},
 	{0x440a, "Chelsio T404-BT"},
 	{0x440e, "Chelsio T440-LP-CR"},
 }, t5_pciids[] = {
 	{0xb000, "Chelsio Terminator 5 FPGA"},
 	{0x5400, "Chelsio T580-dbg"},
 	{0x5401,  "Chelsio T520-CR"},		/* 2 x 10G */
	{0x5402,  "Chelsio T522-CR"},		/* 2 x 10G, 2 x 1G */
 	{0x5403,  "Chelsio T540-CR"},		/* 4 x 10G */
 	{0x5407,  "Chelsio T520-SO"},		/* 2 x 10G, nomem */
 	{0x5409,  "Chelsio T520-BT"},		/* 2 x 10GBaseT */
 	{0x540a,  "Chelsio T504-BT"},		/* 4 x 1G */
 	{0x540d,  "Chelsio T580-CR"},		/* 2 x 40G */
 	{0x540e,  "Chelsio T540-LP-CR"},	/* 4 x 10G */
 	{0x5410,  "Chelsio T580-LP-CR"},	/* 2 x 40G */
 	{0x5411,  "Chelsio T520-LL-CR"},	/* 2 x 10G */
 	{0x5412,  "Chelsio T560-CR"},		/* 1 x 40G, 2 x 10G */
 	{0x5414,  "Chelsio T580-LP-SO-CR"},	/* 2 x 40G, nomem */
 	{0x5415,  "Chelsio T502-BT"},		/* 2 x 1G */
 #ifdef notyet
 	{0x5404,  "Chelsio T520-BCH"},
 	{0x5405,  "Chelsio T540-BCH"},
 	{0x5406,  "Chelsio T540-CH"},
 	{0x5408,  "Chelsio T520-CX"},
 	{0x540b,  "Chelsio B520-SR"},
 	{0x540c,  "Chelsio B504-BT"},
 	{0x540f,  "Chelsio Amsterdam"},
 	{0x5413,  "Chelsio T580-CHR"},
 #endif
 };
 
 #ifdef TCP_OFFLOAD
 /*
  * service_iq() has an iq and needs the fl.  Offset of fl from the iq should be
  * exactly the same for both rxq and ofld_rxq.
  */
 CTASSERT(offsetof(struct sge_ofld_rxq, iq) == offsetof(struct sge_rxq, iq));
 CTASSERT(offsetof(struct sge_ofld_rxq, fl) == offsetof(struct sge_rxq, fl));
 #endif
 CTASSERT(sizeof(struct cluster_metadata) <= CL_METADATA_SIZE);
 
 static int
 t4_probe(device_t dev)
 {
 	int i;
 	uint16_t v = pci_get_vendor(dev);
 	uint16_t d = pci_get_device(dev);
 	uint8_t f = pci_get_function(dev);
 
 	if (v != PCI_VENDOR_ID_CHELSIO)
 		return (ENXIO);
 
 	/* Attach only to PF0 of the FPGA */
 	if (d == 0xa000 && f != 0)
 		return (ENXIO);
 
 	for (i = 0; i < nitems(t4_pciids); i++) {
 		if (d == t4_pciids[i].device) {
 			device_set_desc(dev, t4_pciids[i].desc);
 			return (BUS_PROBE_DEFAULT);
 		}
 	}
 
 	return (ENXIO);
 }
 
 static int
 t5_probe(device_t dev)
 {
 	int i;
 	uint16_t v = pci_get_vendor(dev);
 	uint16_t d = pci_get_device(dev);
 	uint8_t f = pci_get_function(dev);
 
 	if (v != PCI_VENDOR_ID_CHELSIO)
 		return (ENXIO);
 
 	/* Attach only to PF0 of the FPGA */
 	if (d == 0xb000 && f != 0)
 		return (ENXIO);
 
 	for (i = 0; i < nitems(t5_pciids); i++) {
 		if (d == t5_pciids[i].device) {
 			device_set_desc(dev, t5_pciids[i].desc);
 			return (BUS_PROBE_DEFAULT);
 		}
 	}
 
 	return (ENXIO);
 }
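
/*
 * Illustrative sketch only: the probe logic above reduced to a userland table
 * lookup.  The two entries are copied from t4_pciids[].  The struct and
 * function names are hypothetical, and the vendor check is omitted.  As in
 * t4_probe(), the FPGA (0xa000) is claimed only on PCI function 0.  Compiled
 * only outside the kernel.
 */
#ifndef _KERNEL
#include <stddef.h>
#include <stdint.h>

struct example_pciid {
	uint16_t device;
	const char *desc;
};

static const struct example_pciid example_ids[] = {
	{0xa000, "Chelsio Terminator 4 FPGA"},
	{0x4401, "Chelsio T420-CR"},
};

/* Return the description on a match, or NULL where probe returns ENXIO. */
static const char *
example_probe(uint16_t device, uint8_t function)
{
	size_t i;

	if (device == 0xa000 && function != 0)
		return (NULL);
	for (i = 0; i < sizeof(example_ids) / sizeof(example_ids[0]); i++) {
		if (example_ids[i].device == device)
			return (example_ids[i].desc);
	}
	return (NULL);
}
#endif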
 
 static void
 t5_attribute_workaround(device_t dev)
 {
 	device_t root_port;
 	uint32_t v;
 
 	/*
 	 * The T5 chips do not properly echo the No Snoop and Relaxed
 	 * Ordering attributes when replying to a TLP from a Root
 	 * Port.  As a workaround, find the parent Root Port and
 	 * disable No Snoop and Relaxed Ordering.  Note that this
 	 * affects all devices under this root port.
 	 */
 	root_port = pci_find_pcie_root_port(dev);
 	if (root_port == NULL) {
 		device_printf(dev, "Unable to find parent root port\n");
 		return;
 	}
 
 	v = pcie_adjust_config(root_port, PCIER_DEVICE_CTL,
 	    PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE, 0, 2);
 	if ((v & (PCIEM_CTL_RELAXED_ORD_ENABLE | PCIEM_CTL_NOSNOOP_ENABLE)) !=
 	    0)
 		device_printf(dev, "Disabled No Snoop/Relaxed Ordering on %s\n",
 		    device_get_nameunit(root_port));
 }
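
/*
 * Illustrative sketch only: the read-modify-write requested from
 * pcie_adjust_config() above, modelled on a plain 16-bit variable instead of
 * PCI config space.  This assumes pcie_adjust_config() returns the
 * register's contents from before the write.  That is why the caller tests
 * the returned value to decide whether anything was actually disabled.  Bit 4
 * (Relaxed Ordering) and bit 11 (No Snoop) are the PCIe Device Control
 * enable bits.  Names are hypothetical.  Compiled only outside the kernel.
 */
#ifndef _KERNEL
#include <stdint.h>

#define EX_CTL_RELAXED_ORD	0x0010
#define EX_CTL_NOSNOOP		0x0800

/* Make the bits in mask match value; return the previous contents. */
static uint16_t
example_adjust(uint16_t *reg, uint16_t mask, uint16_t value)
{
	uint16_t old;

	old = *reg;
	*reg = (uint16_t)((old & ~mask) | (value & mask));
	return (old);
}
#endif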
 
 static int
 t4_attach(device_t dev)
 {
 	struct adapter *sc;
 	int rc = 0, i, j, n10g, n1g, rqidx, tqidx;
 	struct make_dev_args mda;
 	struct intrs_and_queues iaq;
 	struct sge *s;
 	uint8_t *buf;
 #ifdef TCP_OFFLOAD
 	int ofld_rqidx, ofld_tqidx;
 #endif
 #ifdef DEV_NETMAP
 	int nm_rqidx, nm_tqidx;
 #endif
 	int num_vis;
 
 	sc = device_get_softc(dev);
 	sc->dev = dev;
 	TUNABLE_INT_FETCH("hw.cxgbe.debug_flags", &sc->debug_flags);
 
 	if ((pci_get_device(dev) & 0xff00) == 0x5400)
 		t5_attribute_workaround(dev);
 	pci_enable_busmaster(dev);
 	if (pci_find_cap(dev, PCIY_EXPRESS, &i) == 0) {
 		uint32_t v;
 
 		pci_set_max_read_req(dev, 4096);
 		v = pci_read_config(dev, i + PCIER_DEVICE_CTL, 2);
 		v |= PCIEM_CTL_RELAXED_ORD_ENABLE;
 		pci_write_config(dev, i + PCIER_DEVICE_CTL, v, 2);
 
 		sc->params.pci.mps = 128 << ((v & PCIEM_CTL_MAX_PAYLOAD) >> 5);
 	}
 
 	sc->sge_gts_reg = MYPF_REG(A_SGE_PF_GTS);
 	sc->sge_kdoorbell_reg = MYPF_REG(A_SGE_PF_KDOORBELL);
 	sc->traceq = -1;
	/* Name the lock before mtx_init() records the name. */
	snprintf(sc->ifp_lockname, sizeof(sc->ifp_lockname), "%s tracer",
	    device_get_nameunit(dev));
	mtx_init(&sc->ifp_lock, sc->ifp_lockname, 0, MTX_DEF);
 
 	snprintf(sc->lockname, sizeof(sc->lockname), "%s",
 	    device_get_nameunit(dev));
 	mtx_init(&sc->sc_lock, sc->lockname, 0, MTX_DEF);
 	sx_xlock(&t4_list_lock);
 	SLIST_INSERT_HEAD(&t4_list, sc, link);
 	sx_xunlock(&t4_list_lock);
 
 	mtx_init(&sc->sfl_lock, "starving freelists", 0, MTX_DEF);
 	TAILQ_INIT(&sc->sfl);
 	callout_init_mtx(&sc->sfl_callout, &sc->sfl_lock, 0);
 
 	mtx_init(&sc->reg_lock, "indirect register access", 0, MTX_DEF);
 
 	rc = map_bars_0_and_4(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/*
 	 * This is the real PF# to which we're attaching.  Works from within PCI
 	 * passthrough environments too, where pci_get_function() could return a
 	 * different PF# depending on the passthrough configuration.  We need to
 	 * use the real PF# in all our communication with the firmware.
 	 */
 	sc->pf = G_SOURCEPF(t4_read_reg(sc, A_PL_WHOAMI));
 	sc->mbox = sc->pf;
 
 	memset(sc->chan_map, 0xff, sizeof(sc->chan_map));
 
 	/* Prepare the adapter for operation. */
 	buf = malloc(PAGE_SIZE, M_CXGBE, M_ZERO | M_WAITOK);
 	rc = -t4_prep_adapter(sc, buf);
 	free(buf, M_CXGBE);
 	if (rc != 0) {
 		device_printf(dev, "failed to prepare adapter: %d.\n", rc);
 		goto done;
 	}
 
	/*
	 * Do this really early: set up the memory windows even before the
	 * character device, so that the userland tool's (cxgbetool) register
	 * I/O and memory reads work even in "recovery mode".
	 */
 	setup_memwin(sc);
 	if (t4_init_devlog_params(sc, 0) == 0)
 		fixup_devlog_params(sc);
 	make_dev_args_init(&mda);
 	mda.mda_devsw = &t4_cdevsw;
 	mda.mda_uid = UID_ROOT;
 	mda.mda_gid = GID_WHEEL;
 	mda.mda_mode = 0600;
 	mda.mda_si_drv1 = sc;
 	rc = make_dev_s(&mda, &sc->cdev, "%s", device_get_nameunit(dev));
 	if (rc != 0)
 		device_printf(dev, "failed to create nexus char device: %d.\n",
 		    rc);
 
 	/* Go no further if recovery mode has been requested. */
 	if (TUNABLE_INT_FETCH("hw.cxgbe.sos", &i) && i != 0) {
 		device_printf(dev, "recovery mode.\n");
 		goto done;
 	}
 
 #if defined(__i386__)
 	if ((cpu_feature & CPUID_CX8) == 0) {
 		device_printf(dev, "64 bit atomics not available.\n");
 		rc = ENOTSUP;
 		goto done;
 	}
 #endif
 
 	/* Prepare the firmware for operation */
 	rc = prep_firmware(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = get_params__post_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = set_params__post_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = map_bar_2(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	rc = t4_create_dma_tag(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/*
 	 * Number of VIs to create per-port.  The first VI is the "main" regular
 	 * VI for the port.  The rest are additional virtual interfaces on the
 	 * same physical port.  Note that the main VI does not have native
 	 * netmap support but the extra VIs do.
 	 *
 	 * Limit the number of VIs per port to the number of available
 	 * MAC addresses per port.
 	 */
 	if (t4_num_vis >= 1)
 		num_vis = t4_num_vis;
 	else
 		num_vis = 1;
 	if (num_vis > nitems(vi_mac_funcs)) {
 		num_vis = nitems(vi_mac_funcs);
 		device_printf(dev, "Number of VIs limited to %d\n", num_vis);
 	}
 
 	/*
 	 * First pass over all the ports - allocate VIs and initialize some
 	 * basic parameters like mac address, port type, etc.  We also figure
 	 * out whether a port is 10G or 1G and use that information when
 	 * calculating how many interrupts to attempt to allocate.
 	 */
 	n10g = n1g = 0;
 	for_each_port(sc, i) {
 		struct port_info *pi;
 
 		pi = malloc(sizeof(*pi), M_CXGBE, M_ZERO | M_WAITOK);
 		sc->port[i] = pi;
 
 		/* These must be set before t4_port_init */
 		pi->adapter = sc;
 		pi->port_id = i;
 		/*
 		 * XXX: vi[0] is special so we can't delay this allocation until
 		 * pi->nvi's final value is known.
 		 */
 		pi->vi = malloc(sizeof(struct vi_info) * num_vis, M_CXGBE,
 		    M_ZERO | M_WAITOK);
 
 		/*
 		 * Allocate the "main" VI and initialize parameters
 		 * like mac addr.
 		 */
 		rc = -t4_port_init(sc, sc->mbox, sc->pf, 0, i);
 		if (rc != 0) {
 			device_printf(dev, "unable to initialize port %d: %d\n",
 			    i, rc);
 			free(pi->vi, M_CXGBE);
 			free(pi, M_CXGBE);
 			sc->port[i] = NULL;
 			goto done;
 		}
 
 		pi->link_cfg.requested_fc &= ~(PAUSE_TX | PAUSE_RX);
 		pi->link_cfg.requested_fc |= t4_pause_settings;
 		pi->link_cfg.fc &= ~(PAUSE_TX | PAUSE_RX);
 		pi->link_cfg.fc |= t4_pause_settings;
 
 		rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, &pi->link_cfg);
 		if (rc != 0) {
 			device_printf(dev, "port %d l1cfg failed: %d\n", i, rc);
 			free(pi->vi, M_CXGBE);
 			free(pi, M_CXGBE);
 			sc->port[i] = NULL;
 			goto done;
 		}
 
 		snprintf(pi->lockname, sizeof(pi->lockname), "%sp%d",
 		    device_get_nameunit(dev), i);
 		mtx_init(&pi->pi_lock, pi->lockname, 0, MTX_DEF);
 		sc->chan_map[pi->tx_chan] = i;
 
 		pi->tc = malloc(sizeof(struct tx_sched_class) *
 		    sc->chip_params->nsched_cls, M_CXGBE, M_ZERO | M_WAITOK);
 
 		if (is_10G_port(pi) || is_40G_port(pi)) {
 			n10g++;
 		} else {
 			n1g++;
 		}
 
 		pi->linkdnrc = -1;
 
 		pi->dev = device_add_child(dev, is_t4(sc) ? "cxgbe" : "cxl", -1);
 		if (pi->dev == NULL) {
 			device_printf(dev,
 			    "failed to add device for port %d.\n", i);
 			rc = ENXIO;
 			goto done;
 		}
 		pi->vi[0].dev = pi->dev;
 		device_set_softc(pi->dev, pi);
 	}
 
 	/*
 	 * Interrupt type, # of interrupts, # of rx/tx queues, etc.
 	 */
 	rc = cfg_itype_and_nqueues(sc, n10g, n1g, num_vis, &iaq);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 	if (iaq.nrxq_vi + iaq.nofldrxq_vi + iaq.nnmrxq_vi == 0)
 		num_vis = 1;
 
 	sc->intr_type = iaq.intr_type;
 	sc->intr_count = iaq.nirq;
 
 	s = &sc->sge;
 	s->nrxq = n10g * iaq.nrxq10g + n1g * iaq.nrxq1g;
 	s->ntxq = n10g * iaq.ntxq10g + n1g * iaq.ntxq1g;
 	if (num_vis > 1) {
 		s->nrxq += (n10g + n1g) * (num_vis - 1) * iaq.nrxq_vi;
 		s->ntxq += (n10g + n1g) * (num_vis - 1) * iaq.ntxq_vi;
 	}
 	s->neq = s->ntxq + s->nrxq;	/* the free list in an rxq is an eq */
	s->neq += sc->params.nports + 1; /* ctrl queues: 1 per port + 1 mgmt */
 	s->niq = s->nrxq + 1;		/* 1 extra for firmware event queue */
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		s->nofldrxq = n10g * iaq.nofldrxq10g + n1g * iaq.nofldrxq1g;
 		s->nofldtxq = n10g * iaq.nofldtxq10g + n1g * iaq.nofldtxq1g;
 		if (num_vis > 1) {
 			s->nofldrxq += (n10g + n1g) * (num_vis - 1) *
 			    iaq.nofldrxq_vi;
 			s->nofldtxq += (n10g + n1g) * (num_vis - 1) *
 			    iaq.nofldtxq_vi;
 		}
 		s->neq += s->nofldtxq + s->nofldrxq;
 		s->niq += s->nofldrxq;
 
 		s->ofld_rxq = malloc(s->nofldrxq * sizeof(struct sge_ofld_rxq),
 		    M_CXGBE, M_ZERO | M_WAITOK);
 		s->ofld_txq = malloc(s->nofldtxq * sizeof(struct sge_wrq),
 		    M_CXGBE, M_ZERO | M_WAITOK);
 	}
 #endif
 #ifdef DEV_NETMAP
 	if (num_vis > 1) {
 		s->nnmrxq = (n10g + n1g) * (num_vis - 1) * iaq.nnmrxq_vi;
 		s->nnmtxq = (n10g + n1g) * (num_vis - 1) * iaq.nnmtxq_vi;
 	}
 	s->neq += s->nnmtxq + s->nnmrxq;
 	s->niq += s->nnmrxq;
 
 	s->nm_rxq = malloc(s->nnmrxq * sizeof(struct sge_nm_rxq),
 	    M_CXGBE, M_ZERO | M_WAITOK);
 	s->nm_txq = malloc(s->nnmtxq * sizeof(struct sge_nm_txq),
 	    M_CXGBE, M_ZERO | M_WAITOK);
 #endif
 
 	s->ctrlq = malloc(sc->params.nports * sizeof(struct sge_wrq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->rxq = malloc(s->nrxq * sizeof(struct sge_rxq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->txq = malloc(s->ntxq * sizeof(struct sge_txq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->iqmap = malloc(s->niq * sizeof(struct sge_iq *), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 	s->eqmap = malloc(s->neq * sizeof(struct sge_eq *), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	sc->irq = malloc(sc->intr_count * sizeof(struct irq), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_init_l2t(sc, M_WAITOK);
 
 	/*
 	 * Second pass over the ports.  This time we know the number of rx and
 	 * tx queues that each port should get.
 	 */
 	rqidx = tqidx = 0;
 #ifdef TCP_OFFLOAD
 	ofld_rqidx = ofld_tqidx = 0;
 #endif
 #ifdef DEV_NETMAP
 	nm_rqidx = nm_tqidx = 0;
 #endif
 	for_each_port(sc, i) {
 		struct port_info *pi = sc->port[i];
 		struct vi_info *vi;
 
 		if (pi == NULL)
 			continue;
 
 		pi->nvi = num_vis;
 		for_each_vi(pi, j, vi) {
 			vi->pi = pi;
 			vi->qsize_rxq = t4_qsize_rxq;
 			vi->qsize_txq = t4_qsize_txq;
 
 			vi->first_rxq = rqidx;
 			vi->first_txq = tqidx;
 			if (is_10G_port(pi) || is_40G_port(pi)) {
 				vi->tmr_idx = t4_tmr_idx_10g;
 				vi->pktc_idx = t4_pktc_idx_10g;
 				vi->flags |= iaq.intr_flags_10g & INTR_RXQ;
 				vi->nrxq = j == 0 ? iaq.nrxq10g : iaq.nrxq_vi;
 				vi->ntxq = j == 0 ? iaq.ntxq10g : iaq.ntxq_vi;
 			} else {
 				vi->tmr_idx = t4_tmr_idx_1g;
 				vi->pktc_idx = t4_pktc_idx_1g;
 				vi->flags |= iaq.intr_flags_1g & INTR_RXQ;
 				vi->nrxq = j == 0 ? iaq.nrxq1g : iaq.nrxq_vi;
 				vi->ntxq = j == 0 ? iaq.ntxq1g : iaq.ntxq_vi;
 			}
 			rqidx += vi->nrxq;
 			tqidx += vi->ntxq;
 
 			if (j == 0 && vi->ntxq > 1)
 				vi->rsrv_noflowq = iaq.rsrv_noflowq ? 1 : 0;
 			else
 				vi->rsrv_noflowq = 0;
 
 #ifdef TCP_OFFLOAD
 			vi->first_ofld_rxq = ofld_rqidx;
 			vi->first_ofld_txq = ofld_tqidx;
 			if (is_10G_port(pi) || is_40G_port(pi)) {
 				vi->flags |= iaq.intr_flags_10g & INTR_OFLD_RXQ;
 				vi->nofldrxq = j == 0 ? iaq.nofldrxq10g :
 				    iaq.nofldrxq_vi;
 				vi->nofldtxq = j == 0 ? iaq.nofldtxq10g :
 				    iaq.nofldtxq_vi;
 			} else {
 				vi->flags |= iaq.intr_flags_1g & INTR_OFLD_RXQ;
 				vi->nofldrxq = j == 0 ? iaq.nofldrxq1g :
 				    iaq.nofldrxq_vi;
 				vi->nofldtxq = j == 0 ? iaq.nofldtxq1g :
 				    iaq.nofldtxq_vi;
 			}
 			ofld_rqidx += vi->nofldrxq;
 			ofld_tqidx += vi->nofldtxq;
 #endif
 #ifdef DEV_NETMAP
 			if (j > 0) {
 				vi->first_nm_rxq = nm_rqidx;
 				vi->first_nm_txq = nm_tqidx;
 				vi->nnmrxq = iaq.nnmrxq_vi;
 				vi->nnmtxq = iaq.nnmtxq_vi;
 				nm_rqidx += vi->nnmrxq;
 				nm_tqidx += vi->nnmtxq;
 			}
 #endif
 		}
 	}
 
 	rc = setup_intr_handlers(sc);
 	if (rc != 0) {
 		device_printf(dev,
 		    "failed to setup interrupt handlers: %d\n", rc);
 		goto done;
 	}
 
 	rc = bus_generic_attach(dev);
 	if (rc != 0) {
 		device_printf(dev,
 		    "failed to attach all child ports: %d\n", rc);
 		goto done;
 	}
 
 	device_printf(dev,
 	    "PCIe gen%d x%d, %d ports, %d %s interrupt%s, %d eq, %d iq\n",
 	    sc->params.pci.speed, sc->params.pci.width, sc->params.nports,
 	    sc->intr_count, sc->intr_type == INTR_MSIX ? "MSI-X" :
 	    (sc->intr_type == INTR_MSI ? "MSI" : "INTx"),
 	    sc->intr_count > 1 ? "s" : "", sc->sge.neq, sc->sge.niq);
 
 	t4_set_desc(sc);
 
 	notify_siblings(dev, 0);
 
 done:
 	if (rc != 0 && sc->cdev) {
 		/* cdev was created and so cxgbetool works; recover that way. */
 		device_printf(dev,
 		    "error during attach, adapter is now in recovery mode.\n");
 		rc = 0;
 	}
 
 	if (rc != 0)
 		t4_detach(dev);
 	else
 		t4_sysctls(sc);
 
 	return (rc);
 }
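
/*
 * Illustrative sketch only: two pieces of arithmetic from t4_attach() pulled
 * out so they can be checked in userland.  The struct and function names are
 * hypothetical.  Compiled only outside the kernel.
 *
 * 1. Max Payload Size: the Device Control field in bits 7:5 (mask 0x00e0)
 *    encodes 128 << n bytes.  This is what the params.pci.mps assignment
 *    computes.
 * 2. NIC queue accounting.  TOE and netmap queues are omitted.  Each port's
 *    main VI gets the 10G or 1G queue counts, and every extra VI gets the
 *    *_vi counts.  Each rxq's free list is also an eq.  There is one control
 *    eq per port, plus one management eq and one firmware event iq.
 */
#ifndef _KERNEL
#include <stdint.h>

static int
example_pcie_mps(uint32_t devctl)
{
	return (128 << ((devctl & 0x00e0) >> 5));
}

struct example_qcounts {
	int nrxq, ntxq, neq, niq;
};

static struct example_qcounts
example_count_queues(int n10g, int n1g, int num_vis, int nrxq10g,
    int ntxq10g, int nrxq1g, int ntxq1g, int nrxq_vi, int ntxq_vi, int nports)
{
	struct example_qcounts q;

	q.nrxq = n10g * nrxq10g + n1g * nrxq1g;	/* main VIs */
	q.ntxq = n10g * ntxq10g + n1g * ntxq1g;
	if (num_vis > 1) {			/* extra VIs on every port */
		q.nrxq += (n10g + n1g) * (num_vis - 1) * nrxq_vi;
		q.ntxq += (n10g + n1g) * (num_vis - 1) * ntxq_vi;
	}
	q.neq = q.ntxq + q.nrxq + nports + 1;	/* fl per rxq, ctrl, mgmt */
	q.niq = q.nrxq + 1;			/* plus firmware event queue */
	return (q);
}
#endif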
 
 static int
 t4_ready(device_t dev)
 {
 	struct adapter *sc;
 
 	sc = device_get_softc(dev);
 	if (sc->flags & FW_OK)
 		return (0);
 	return (ENXIO);
 }
 
 static int
 t4_read_port_device(device_t dev, int port, device_t *child)
 {
 	struct adapter *sc;
 	struct port_info *pi;
 
 	sc = device_get_softc(dev);
 	if (port < 0 || port >= MAX_NPORTS)
 		return (EINVAL);
 	pi = sc->port[port];
 	if (pi == NULL || pi->dev == NULL)
 		return (ENXIO);
 	*child = pi->dev;
 	return (0);
 }
 
 static int
 notify_siblings(device_t dev, int detaching)
 {
 	device_t sibling;
 	int error, i;
 
 	error = 0;
	/* PCI_FUNCMAX is the highest function number, so include it. */
	for (i = 0; i <= PCI_FUNCMAX; i++) {
 		if (i == pci_get_function(dev))
 			continue;
 		sibling = pci_find_dbsf(pci_get_domain(dev), pci_get_bus(dev),
 		    pci_get_slot(dev), i);
 		if (sibling == NULL || !device_is_attached(sibling))
 			continue;
 		if (detaching)
 			error = T4_DETACH_CHILD(sibling);
 		else
 			(void)T4_ATTACH_CHILD(sibling);
 		if (error)
 			break;
 	}
 	return (error);
 }
 
/*
 * Idempotent: safe to call on a partially attached adapter, as t4_attach()
 * does on its error path.
 */
 static int
 t4_detach(device_t dev)
 {
 	struct adapter *sc;
 	struct port_info *pi;
 	int i, rc;
 
 	sc = device_get_softc(dev);
 
 	rc = notify_siblings(dev, 1);
 	if (rc) {
 		device_printf(dev,
 		    "failed to detach sibling devices: %d\n", rc);
 		return (rc);
 	}
 
 	if (sc->flags & FULL_INIT_DONE)
 		t4_intr_disable(sc);
 
 	if (sc->cdev) {
 		destroy_dev(sc->cdev);
 		sc->cdev = NULL;
 	}
 
 	rc = bus_generic_detach(dev);
 	if (rc) {
 		device_printf(dev,
 		    "failed to detach child devices: %d\n", rc);
 		return (rc);
 	}
 
 	for (i = 0; i < sc->intr_count; i++)
 		t4_free_irq(sc, &sc->irq[i]);
 
 	for (i = 0; i < MAX_NPORTS; i++) {
 		pi = sc->port[i];
 		if (pi) {
 			t4_free_vi(sc, sc->mbox, sc->pf, 0, pi->vi[0].viid);
 			if (pi->dev)
 				device_delete_child(dev, pi->dev);
 
 			mtx_destroy(&pi->pi_lock);
 			free(pi->vi, M_CXGBE);
 			free(pi->tc, M_CXGBE);
 			free(pi, M_CXGBE);
 		}
 	}
 
 	if (sc->flags & FULL_INIT_DONE)
 		adapter_full_uninit(sc);
 
 	if (sc->flags & FW_OK)
 		t4_fw_bye(sc, sc->mbox);
 
 	if (sc->intr_type == INTR_MSI || sc->intr_type == INTR_MSIX)
 		pci_release_msi(dev);
 
 	if (sc->regs_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->regs_rid,
 		    sc->regs_res);
 
 	if (sc->udbs_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->udbs_rid,
 		    sc->udbs_res);
 
 	if (sc->msix_res)
 		bus_release_resource(dev, SYS_RES_MEMORY, sc->msix_rid,
 		    sc->msix_res);
 
 	if (sc->l2t)
 		t4_free_l2t(sc->l2t);
 
 #ifdef TCP_OFFLOAD
 	free(sc->sge.ofld_rxq, M_CXGBE);
 	free(sc->sge.ofld_txq, M_CXGBE);
 #endif
 #ifdef DEV_NETMAP
 	free(sc->sge.nm_rxq, M_CXGBE);
 	free(sc->sge.nm_txq, M_CXGBE);
 #endif
 	free(sc->irq, M_CXGBE);
 	free(sc->sge.rxq, M_CXGBE);
 	free(sc->sge.txq, M_CXGBE);
 	free(sc->sge.ctrlq, M_CXGBE);
 	free(sc->sge.iqmap, M_CXGBE);
 	free(sc->sge.eqmap, M_CXGBE);
 	free(sc->tids.ftid_tab, M_CXGBE);
 	t4_destroy_dma_tag(sc);
 	if (mtx_initialized(&sc->sc_lock)) {
 		sx_xlock(&t4_list_lock);
 		SLIST_REMOVE(&t4_list, sc, adapter, link);
 		sx_xunlock(&t4_list_lock);
 		mtx_destroy(&sc->sc_lock);
 	}
 
 	callout_drain(&sc->sfl_callout);
 	if (mtx_initialized(&sc->tids.ftid_lock))
 		mtx_destroy(&sc->tids.ftid_lock);
 	if (mtx_initialized(&sc->sfl_lock))
 		mtx_destroy(&sc->sfl_lock);
 	if (mtx_initialized(&sc->ifp_lock))
 		mtx_destroy(&sc->ifp_lock);
 	if (mtx_initialized(&sc->reg_lock))
 		mtx_destroy(&sc->reg_lock);
 
 	for (i = 0; i < NUM_MEMWIN; i++) {
 		struct memwin *mw = &sc->memwin[i];
 
 		if (rw_initialized(&mw->mw_lock))
 			rw_destroy(&mw->mw_lock);
 	}
 
 	bzero(sc, sizeof(*sc));
 
 	return (0);
 }
 
 static int
 cxgbe_probe(device_t dev)
 {
 	char buf[128];
 	struct port_info *pi = device_get_softc(dev);
 
 	snprintf(buf, sizeof(buf), "port %d", pi->port_id);
 	device_set_desc_copy(dev, buf);
 
 	return (BUS_PROBE_DEFAULT);
 }
 
 #define T4_CAP (IFCAP_VLAN_HWTAGGING | IFCAP_VLAN_MTU | IFCAP_HWCSUM | \
     IFCAP_VLAN_HWCSUM | IFCAP_TSO | IFCAP_JUMBO_MTU | IFCAP_LRO | \
     IFCAP_VLAN_HWTSO | IFCAP_LINKSTATE | IFCAP_HWCSUM_IPV6 | IFCAP_HWSTATS)
 #define T4_CAP_ENABLE (T4_CAP)
 
 static int
 cxgbe_vi_attach(device_t dev, struct vi_info *vi)
 {
 	struct ifnet *ifp;
 	struct sbuf *sb;
 
 	vi->xact_addr_filt = -1;
 	callout_init(&vi->tick, 1);
 
 	/* Allocate an ifnet and set it up */
 	ifp = if_alloc(IFT_ETHER);
 	if (ifp == NULL) {
 		device_printf(dev, "Cannot allocate ifnet\n");
 		return (ENOMEM);
 	}
 	vi->ifp = ifp;
 	ifp->if_softc = vi;
 
 	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
 	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
 
 	ifp->if_init = cxgbe_init;
 	ifp->if_ioctl = cxgbe_ioctl;
 	ifp->if_transmit = cxgbe_transmit;
 	ifp->if_qflush = cxgbe_qflush;
 	ifp->if_get_counter = cxgbe_get_counter;
 
 	ifp->if_capabilities = T4_CAP;
 #ifdef TCP_OFFLOAD
 	if (vi->nofldrxq != 0)
 		ifp->if_capabilities |= IFCAP_TOE;
 #endif
 	ifp->if_capenable = T4_CAP_ENABLE;
 	ifp->if_hwassist = CSUM_TCP | CSUM_UDP | CSUM_IP | CSUM_TSO |
 	    CSUM_UDP_IPV6 | CSUM_TCP_IPV6;
 
 	ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 	ifp->if_hw_tsomaxsegcount = TX_SGL_SEGS;
 	ifp->if_hw_tsomaxsegsize = 65536;
 
 	/* Initialize ifmedia for this VI */
 	ifmedia_init(&vi->media, IFM_IMASK, cxgbe_media_change,
 	    cxgbe_media_status);
 	build_medialist(vi->pi, &vi->media);
 
 	vi->vlan_c = EVENTHANDLER_REGISTER(vlan_config, cxgbe_vlan_config, ifp,
 	    EVENTHANDLER_PRI_ANY);
 
 	ether_ifattach(ifp, vi->hw_addr);
 #ifdef DEV_NETMAP
 	if (vi->nnmrxq != 0)
 		cxgbe_nm_attach(vi);
 #endif
 	sb = sbuf_new_auto();
 	sbuf_printf(sb, "%d txq, %d rxq (NIC)", vi->ntxq, vi->nrxq);
 #ifdef TCP_OFFLOAD
 	if (ifp->if_capabilities & IFCAP_TOE)
 		sbuf_printf(sb, "; %d txq, %d rxq (TOE)",
 		    vi->nofldtxq, vi->nofldrxq);
 #endif
 #ifdef DEV_NETMAP
 	if (ifp->if_capabilities & IFCAP_NETMAP)
 		sbuf_printf(sb, "; %d txq, %d rxq (netmap)",
 		    vi->nnmtxq, vi->nnmrxq);
 #endif
 	sbuf_finish(sb);
 	device_printf(dev, "%s\n", sbuf_data(sb));
 	sbuf_delete(sb);
 
 	vi_sysctls(vi);
 
 	return (0);
 }
 
 static int
 cxgbe_attach(device_t dev)
 {
 	struct port_info *pi = device_get_softc(dev);
 	struct vi_info *vi;
 	int i, rc;
 
 	callout_init_mtx(&pi->tick, &pi->pi_lock, 0);
 
 	rc = cxgbe_vi_attach(dev, &pi->vi[0]);
 	if (rc)
 		return (rc);
 
 	for_each_vi(pi, i, vi) {
 		if (i == 0)
 			continue;
 		vi->dev = device_add_child(dev, is_t4(pi->adapter) ?
 		    "vcxgbe" : "vcxl", -1);
 		if (vi->dev == NULL) {
 			device_printf(dev, "failed to add VI %d\n", i);
 			continue;
 		}
 		device_set_softc(vi->dev, vi);
 	}
 
 	cxgbe_sysctls(pi);
 
 	bus_generic_attach(dev);
 
 	return (0);
 }
 
 static void
 cxgbe_vi_detach(struct vi_info *vi)
 {
 	struct ifnet *ifp = vi->ifp;
 
 	ether_ifdetach(ifp);
 
 	if (vi->vlan_c)
 		EVENTHANDLER_DEREGISTER(vlan_config, vi->vlan_c);
 
 	/* Let detach proceed even if these fail. */
 #ifdef DEV_NETMAP
 	if (ifp->if_capabilities & IFCAP_NETMAP)
 		cxgbe_nm_detach(vi);
 #endif
 	cxgbe_uninit_synchronized(vi);
 	callout_drain(&vi->tick);
 	vi_full_uninit(vi);
 
 	ifmedia_removeall(&vi->media);
 	if_free(vi->ifp);
 	vi->ifp = NULL;
 }
 
 static int
 cxgbe_detach(device_t dev)
 {
 	struct port_info *pi = device_get_softc(dev);
 	struct adapter *sc = pi->adapter;
 	int rc;
 
 	/* Detach the extra VIs first. */
 	rc = bus_generic_detach(dev);
 	if (rc)
 		return (rc);
 	device_delete_children(dev);
 
 	doom_vi(sc, &pi->vi[0]);
 
 	if (pi->flags & HAS_TRACEQ) {
 		sc->traceq = -1;	/* cloner should not create ifnet */
 		t4_tracer_port_detach(sc);
 	}
 
 	cxgbe_vi_detach(&pi->vi[0]);
 	callout_drain(&pi->tick);
 
 	end_synchronized_op(sc, 0);
 
 	return (0);
 }
 
 static void
 cxgbe_init(void *arg)
 {
 	struct vi_info *vi = arg;
 	struct adapter *sc = vi->pi->adapter;
 
 	if (begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4init") != 0)
 		return;
 	cxgbe_init_synchronized(vi);
 	end_synchronized_op(sc, 0);
 }
 
 static int
 cxgbe_ioctl(struct ifnet *ifp, unsigned long cmd, caddr_t data)
 {
 	int rc = 0, mtu, flags, can_sleep;
 	struct vi_info *vi = ifp->if_softc;
 	struct adapter *sc = vi->pi->adapter;
 	struct ifreq *ifr = (struct ifreq *)data;
 	uint32_t mask;
 
 	switch (cmd) {
 	case SIOCSIFMTU:
 		mtu = ifr->ifr_mtu;
 		if ((mtu < ETHERMIN) || (mtu > ETHERMTU_JUMBO))
 			return (EINVAL);
 
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4mtu");
 		if (rc)
 			return (rc);
 		ifp->if_mtu = mtu;
 		if (vi->flags & VI_INIT_DONE) {
 			t4_update_fl_bufsize(ifp);
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				rc = update_mac_settings(ifp, XGMAC_MTU);
 		}
 		end_synchronized_op(sc, 0);
 		break;
 
 	case SIOCSIFFLAGS:
 		can_sleep = 0;
 redo_sifflags:
 		rc = begin_synchronized_op(sc, vi,
 		    can_sleep ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4flg");
 		if (rc)
 			return (rc);
 
 		if (ifp->if_flags & IFF_UP) {
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 				flags = vi->if_flags;
 				if ((ifp->if_flags ^ flags) &
 				    (IFF_PROMISC | IFF_ALLMULTI)) {
 					if (can_sleep == 1) {
 						end_synchronized_op(sc, 0);
 						can_sleep = 0;
 						goto redo_sifflags;
 					}
 					rc = update_mac_settings(ifp,
 					    XGMAC_PROMISC | XGMAC_ALLMULTI);
 				}
 			} else {
 				if (can_sleep == 0) {
 					end_synchronized_op(sc, LOCK_HELD);
 					can_sleep = 1;
 					goto redo_sifflags;
 				}
 				rc = cxgbe_init_synchronized(vi);
 			}
 			vi->if_flags = ifp->if_flags;
 		} else if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 			if (can_sleep == 0) {
 				end_synchronized_op(sc, LOCK_HELD);
 				can_sleep = 1;
 				goto redo_sifflags;
 			}
 			rc = cxgbe_uninit_synchronized(vi);
 		}
 		end_synchronized_op(sc, can_sleep ? 0 : LOCK_HELD);
 		break;
 
 	case SIOCADDMULTI:
 	case SIOCDELMULTI: /* these two are called with a mutex held :-( */
 		rc = begin_synchronized_op(sc, vi, HOLD_LOCK, "t4multi");
 		if (rc)
 			return (rc);
 		if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 			rc = update_mac_settings(ifp, XGMAC_MCADDRS);
 		end_synchronized_op(sc, LOCK_HELD);
 		break;
 
 	case SIOCSIFCAP:
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4cap");
 		if (rc)
 			return (rc);
 
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
 		if (mask & IFCAP_TXCSUM) {
 			ifp->if_capenable ^= IFCAP_TXCSUM;
 			ifp->if_hwassist ^= (CSUM_TCP | CSUM_UDP | CSUM_IP);
 
 			if (IFCAP_TSO4 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO4;
 				if_printf(ifp,
 				    "tso4 disabled due to -txcsum.\n");
 			}
 		}
 		if (mask & IFCAP_TXCSUM_IPV6) {
 			ifp->if_capenable ^= IFCAP_TXCSUM_IPV6;
 			ifp->if_hwassist ^= (CSUM_UDP_IPV6 | CSUM_TCP_IPV6);
 
 			if (IFCAP_TSO6 & ifp->if_capenable &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				ifp->if_capenable &= ~IFCAP_TSO6;
 				if_printf(ifp,
 				    "tso6 disabled due to -txcsum6.\n");
 			}
 		}
 		if (mask & IFCAP_RXCSUM)
 			ifp->if_capenable ^= IFCAP_RXCSUM;
 		if (mask & IFCAP_RXCSUM_IPV6)
 			ifp->if_capenable ^= IFCAP_RXCSUM_IPV6;
 
 		/*
 		 * Note that we leave CSUM_TSO alone (it is always set).  The
 		 * kernel takes both IFCAP_TSOx and CSUM_TSO into account before
 		 * sending a TSO request our way, so it's sufficient to toggle
 		 * IFCAP_TSOx only.
 		 */
 		if (mask & IFCAP_TSO4) {
 			if (!(IFCAP_TSO4 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum first.\n");
 				rc = EAGAIN;
 				goto fail;
 			}
 			ifp->if_capenable ^= IFCAP_TSO4;
 		}
 		if (mask & IFCAP_TSO6) {
 			if (!(IFCAP_TSO6 & ifp->if_capenable) &&
 			    !(IFCAP_TXCSUM_IPV6 & ifp->if_capenable)) {
 				if_printf(ifp, "enable txcsum6 first.\n");
 				rc = EAGAIN;
 				goto fail;
 			}
 			ifp->if_capenable ^= IFCAP_TSO6;
 		}
 		if (mask & IFCAP_LRO) {
 #if defined(INET) || defined(INET6)
 			int i;
 			struct sge_rxq *rxq;
 
 			ifp->if_capenable ^= IFCAP_LRO;
 			for_each_rxq(vi, i, rxq) {
 				if (ifp->if_capenable & IFCAP_LRO)
 					rxq->iq.flags |= IQ_LRO_ENABLED;
 				else
 					rxq->iq.flags &= ~IQ_LRO_ENABLED;
 			}
 #endif
 		}
 #ifdef TCP_OFFLOAD
 		if (mask & IFCAP_TOE) {
 			int enable = (ifp->if_capenable ^ mask) & IFCAP_TOE;
 
 			rc = toe_capability(vi, enable);
 			if (rc != 0)
 				goto fail;
 
 			ifp->if_capenable ^= mask;
 		}
 #endif
 		if (mask & IFCAP_VLAN_HWTAGGING) {
 			ifp->if_capenable ^= IFCAP_VLAN_HWTAGGING;
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 				rc = update_mac_settings(ifp, XGMAC_VLANEX);
 		}
 		if (mask & IFCAP_VLAN_MTU) {
 			ifp->if_capenable ^= IFCAP_VLAN_MTU;
 
 			/* Need to find out how to disable auto-mtu-inflation */
 		}
 		if (mask & IFCAP_VLAN_HWTSO)
 			ifp->if_capenable ^= IFCAP_VLAN_HWTSO;
 		if (mask & IFCAP_VLAN_HWCSUM)
 			ifp->if_capenable ^= IFCAP_VLAN_HWCSUM;
 
 #ifdef VLAN_CAPABILITIES
 		VLAN_CAPABILITIES(ifp);
 #endif
 fail:
 		end_synchronized_op(sc, 0);
 		break;
 
 	case SIOCSIFMEDIA:
 	case SIOCGIFMEDIA:
 		ifmedia_ioctl(ifp, ifr, &vi->media, cmd);
 		break;
 
 	case SIOCGI2C: {
 		struct ifi2creq i2c;
 
 		rc = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
 		if (rc != 0)
 			break;
 		if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) {
 			rc = EPERM;
 			break;
 		}
 		if (i2c.len > sizeof(i2c.data)) {
 			rc = EINVAL;
 			break;
 		}
 		rc = begin_synchronized_op(sc, vi, SLEEP_OK | INTR_OK, "t4i2c");
 		if (rc)
 			return (rc);
 		rc = -t4_i2c_rd(sc, sc->mbox, vi->pi->port_id, i2c.dev_addr,
 		    i2c.offset, i2c.len, &i2c.data[0]);
 		end_synchronized_op(sc, 0);
 		if (rc == 0)
 			rc = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
 		break;
 	}
 
 	default:
 		rc = ether_ioctl(ifp, cmd, data);
 	}
 
 	return (rc);
 }
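
/*
 * Illustrative sketch, not compiled into the driver: a standalone model of
 * how the SIOCSIFCAP case above applies a request.  mask = reqcap ^
 * capenable marks every capability the caller wants flipped, turning
 * txcsum off also forces tso4 off, and turning tso4 on while txcsum is off
 * is refused with EAGAIN.  CAP_TXCSUM, CAP_TSO4 and apply_cap_request()
 * are hypothetical names for this sketch; the driver uses the real
 * IFCAP_* bits and also handles the IPv6, LRO, TOE and VLAN capabilities.
 *
```c
#include <errno.h>

// Hypothetical capability bits for this sketch only; the driver uses
// IFCAP_TXCSUM and IFCAP_TSO4 instead.
#define CAP_TXCSUM	0x1u
#define CAP_TSO4	0x2u

// Apply a SIOCSIFCAP-style request (reqcap) to the enabled set.  Returns 0
// on success, or EAGAIN if TSO4 would be enabled while TXCSUM is off.
static int
apply_cap_request(unsigned int *capenable, unsigned int reqcap)
{
	// Each bit set in mask is a capability the caller wants flipped.
	unsigned int mask = reqcap ^ *capenable;

	if (mask & CAP_TXCSUM) {
		*capenable ^= CAP_TXCSUM;
		// TSO needs tx checksum offload, so it goes off with txcsum.
		if ((*capenable & CAP_TSO4) && !(*capenable & CAP_TXCSUM))
			*capenable &= ~CAP_TSO4;
	}
	if (mask & CAP_TSO4) {
		// Refuse to turn TSO4 on while TXCSUM is off.
		if (!(*capenable & CAP_TSO4) && !(*capenable & CAP_TXCSUM))
			return (EAGAIN);
		*capenable ^= CAP_TSO4;
	}
	return (0);
}
```
 *
 * For example, starting from an empty set, a request for tso4 alone
 * fails with EAGAIN, while a request for txcsum and tso4 together succeeds
 * because txcsum is toggled on before tso4 is considered.
 */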
 
 static int
 cxgbe_transmit(struct ifnet *ifp, struct mbuf *m)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct sge_txq *txq;
 	void *items[1];
 	int rc;
 
 	M_ASSERTPKTHDR(m);
 	MPASS(m->m_nextpkt == NULL);	/* not quite ready for this yet */
 
 	if (__predict_false(pi->link_cfg.link_ok == 0)) {
 		m_freem(m);
 		return (ENETDOWN);
 	}
 
 	rc = parse_pkt(&m);
 	if (__predict_false(rc != 0)) {
 		MPASS(m == NULL);			/* was freed already */
 		atomic_add_int(&pi->tx_parse_error, 1);	/* rare, atomic is ok */
 		return (rc);
 	}
 
 	/* Select a txq. */
 	txq = &sc->sge.txq[vi->first_txq];
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
 		txq += ((m->m_pkthdr.flowid % (vi->ntxq - vi->rsrv_noflowq)) +
 		    vi->rsrv_noflowq);
 
 	items[0] = m;
 	rc = mp_ring_enqueue(txq->r, items, 1, 4096);
 	if (__predict_false(rc != 0))
 		m_freem(m);
 
 	return (rc);
 }
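
/*
 * Illustrative sketch, not compiled into the driver: the txq selection in
 * cxgbe_transmit() restated as plain arithmetic.  Packets with a flow hash
 * are spread over the queues that follow the rsrv_noflowq reserved ones;
 * packets without a hash always use the VI's first txq.  select_txq() is a
 * hypothetical helper; has_hash stands in for
 * M_HASHTYPE_GET(m) != M_HASHTYPE_NONE.
 *
```c
// first_txq, ntxq and rsrv_noflowq mirror the vi_info fields of the same
// names.  Assumes ntxq > rsrv_noflowq, otherwise the modulus would divide
// by zero.
static int
select_txq(int first_txq, int ntxq, int rsrv_noflowq, int has_hash,
    unsigned int flowid)
{
	int idx = first_txq;

	// Hashed traffic skips the reserved queues; unhashed traffic stays
	// on the first queue.
	if (has_hash)
		idx += (int)(flowid % (unsigned int)(ntxq - rsrv_noflowq)) +
		    rsrv_noflowq;
	return (idx);
}
```
 *
 * With first_txq = 8, ntxq = 4 and rsrv_noflowq = 1, hashed flows map onto
 * txqs 9..11 and queue 8 is left for unhashed traffic.
 */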
 
 static void
 cxgbe_qflush(struct ifnet *ifp)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct sge_txq *txq;
 	int i;
 
 	/* queues do not exist if !VI_INIT_DONE. */
 	if (vi->flags & VI_INIT_DONE) {
 		for_each_txq(vi, i, txq) {
 			TXQ_LOCK(txq);
 			txq->eq.flags &= ~EQ_ENABLED;
 			TXQ_UNLOCK(txq);
 			while (!mp_ring_is_idle(txq->r)) {
 				mp_ring_check_drainage(txq->r, 0);
 				pause("qflush", 1);
 			}
 		}
 	}
 	if_qflush(ifp);
 }
 
 static uint64_t
 vi_get_counter(struct ifnet *ifp, ift_counter c)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct fw_vi_stats_vf *s = &vi->stats;
 
 	vi_refresh_stats(vi->pi->adapter, vi);
 
 	switch (c) {
 	case IFCOUNTER_IPACKETS:
 		return (s->rx_bcast_frames + s->rx_mcast_frames +
 		    s->rx_ucast_frames);
 	case IFCOUNTER_IERRORS:
 		return (s->rx_err_frames);
 	case IFCOUNTER_OPACKETS:
 		return (s->tx_bcast_frames + s->tx_mcast_frames +
 		    s->tx_ucast_frames + s->tx_offload_frames);
 	case IFCOUNTER_OERRORS:
 		return (s->tx_drop_frames);
 	case IFCOUNTER_IBYTES:
 		return (s->rx_bcast_bytes + s->rx_mcast_bytes +
 		    s->rx_ucast_bytes);
 	case IFCOUNTER_OBYTES:
 		return (s->tx_bcast_bytes + s->tx_mcast_bytes +
 		    s->tx_ucast_bytes + s->tx_offload_bytes);
 	case IFCOUNTER_IMCASTS:
 		return (s->rx_mcast_frames);
 	case IFCOUNTER_OMCASTS:
 		return (s->tx_mcast_frames);
 	case IFCOUNTER_OQDROPS: {
 		uint64_t drops;
 
 		drops = 0;
 		if (vi->flags & VI_INIT_DONE) {
 			int i;
 			struct sge_txq *txq;
 
 			for_each_txq(vi, i, txq)
 				drops += counter_u64_fetch(txq->r->drops);
 		}
 
 		return (drops);
 
 	}
 
 	default:
 		return (if_get_counter_default(ifp, c));
 	}
 }
 
 uint64_t
 cxgbe_get_counter(struct ifnet *ifp, ift_counter c)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct port_stats *s = &pi->stats;
 
 	if (pi->nvi > 1)
 		return (vi_get_counter(ifp, c));
 
 	cxgbe_refresh_stats(sc, pi);
 
 	switch (c) {
 	case IFCOUNTER_IPACKETS:
 		return (s->rx_frames);
 
 	case IFCOUNTER_IERRORS:
 		return (s->rx_jabber + s->rx_runt + s->rx_too_long +
 		    s->rx_fcs_err + s->rx_len_err);
 
 	case IFCOUNTER_OPACKETS:
 		return (s->tx_frames);
 
 	case IFCOUNTER_OERRORS:
 		return (s->tx_error_frames);
 
 	case IFCOUNTER_IBYTES:
 		return (s->rx_octets);
 
 	case IFCOUNTER_OBYTES:
 		return (s->tx_octets);
 
 	case IFCOUNTER_IMCASTS:
 		return (s->rx_mcast_frames);
 
 	case IFCOUNTER_OMCASTS:
 		return (s->tx_mcast_frames);
 
 	case IFCOUNTER_IQDROPS:
 		return (s->rx_ovflow0 + s->rx_ovflow1 + s->rx_ovflow2 +
 		    s->rx_ovflow3 + s->rx_trunc0 + s->rx_trunc1 + s->rx_trunc2 +
 		    s->rx_trunc3 + pi->tnl_cong_drops);
 
 	case IFCOUNTER_OQDROPS: {
 		uint64_t drops;
 
 		drops = s->tx_drop;
 		if (vi->flags & VI_INIT_DONE) {
 			int i;
 			struct sge_txq *txq;
 
 			for_each_txq(vi, i, txq)
 				drops += counter_u64_fetch(txq->r->drops);
 		}
 
 		return (drops);
 
 	}
 
 	default:
 		return (if_get_counter_default(ifp, c));
 	}
 }
 
 static int
 cxgbe_media_change(struct ifnet *ifp)
 {
 	struct vi_info *vi = ifp->if_softc;
 
 	device_printf(vi->dev, "%s unimplemented.\n", __func__);
 
 	return (EOPNOTSUPP);
 }
 
 static void
 cxgbe_media_status(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct ifmedia_entry *cur;
 	int speed = pi->link_cfg.speed;
 
 	cur = vi->media.ifm_cur;
 
 	ifmr->ifm_status = IFM_AVALID;
 	if (!pi->link_cfg.link_ok)
 		return;
 
 	ifmr->ifm_status |= IFM_ACTIVE;
 
 	/* active and current will differ iff current media is autoselect. */
 	if (IFM_SUBTYPE(cur->ifm_media) != IFM_AUTO)
 		return;
 
 	ifmr->ifm_active = IFM_ETHER | IFM_FDX;
 	if (speed == 10000)
 		ifmr->ifm_active |= IFM_10G_T;
 	else if (speed == 1000)
 		ifmr->ifm_active |= IFM_1000_T;
 	else if (speed == 100)
 		ifmr->ifm_active |= IFM_100_TX;
 	else if (speed == 10)
 		ifmr->ifm_active |= IFM_10_T;
 	else
		KASSERT(0, ("%s: link up but speed unknown (%d)", __func__,
		    speed));
 }
 
 static int
 vcxgbe_probe(device_t dev)
 {
 	char buf[128];
 	struct vi_info *vi = device_get_softc(dev);
 
 	snprintf(buf, sizeof(buf), "port %d vi %td", vi->pi->port_id,
 	    vi - vi->pi->vi);
 	device_set_desc_copy(dev, buf);
 
 	return (BUS_PROBE_DEFAULT);
 }
 
 static int
 vcxgbe_attach(device_t dev)
 {
 	struct vi_info *vi;
 	struct port_info *pi;
 	struct adapter *sc;
 	int func, index, rc;
 	u32 param, val;
 
 	vi = device_get_softc(dev);
 	pi = vi->pi;
 	sc = pi->adapter;
 
 	index = vi - pi->vi;
 	KASSERT(index < nitems(vi_mac_funcs),
 	    ("%s: VI %s doesn't have a MAC func", __func__,
 	    device_get_nameunit(dev)));
 	func = vi_mac_funcs[index];
 	rc = t4_alloc_vi_func(sc, sc->mbox, pi->tx_chan, sc->pf, 0, 1,
 	    vi->hw_addr, &vi->rss_size, func, 0);
 	if (rc < 0) {
 		device_printf(dev, "Failed to allocate virtual interface "
 		    "for port %d: %d\n", pi->port_id, -rc);
 		return (-rc);
 	}
 	vi->viid = rc;
 
 	param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_RSSINFO) |
 	    V_FW_PARAMS_PARAM_YZ(vi->viid);
 	rc = t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 	if (rc)
 		vi->rss_base = 0xffff;
 	else {
 		/* MPASS((val >> 16) == rss_size); */
 		vi->rss_base = val & 0xffff;
 	}
 
 	rc = cxgbe_vi_attach(dev, vi);
 	if (rc) {
 		t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
 		return (rc);
 	}
 	return (0);
 }
 
 static int
 vcxgbe_detach(device_t dev)
 {
 	struct vi_info *vi;
 	struct adapter *sc;
 
 	vi = device_get_softc(dev);
 	sc = vi->pi->adapter;
 
 	doom_vi(sc, vi);
 
 	cxgbe_vi_detach(vi);
 	t4_free_vi(sc, sc->mbox, sc->pf, 0, vi->viid);
 
 	end_synchronized_op(sc, 0);
 
 	return (0);
 }
 
 void
 t4_fatal_err(struct adapter *sc)
 {
 	t4_set_reg_field(sc, A_SGE_CONTROL, F_GLOBALENABLE, 0);
 	t4_intr_disable(sc);
 	log(LOG_EMERG, "%s: encountered fatal error, adapter stopped.\n",
 	    device_get_nameunit(sc->dev));
 }
 
 static int
 map_bars_0_and_4(struct adapter *sc)
 {
 	sc->regs_rid = PCIR_BAR(0);
 	sc->regs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->regs_rid, RF_ACTIVE);
 	if (sc->regs_res == NULL) {
 		device_printf(sc->dev, "cannot map registers.\n");
 		return (ENXIO);
 	}
 	sc->bt = rman_get_bustag(sc->regs_res);
 	sc->bh = rman_get_bushandle(sc->regs_res);
 	sc->mmio_len = rman_get_size(sc->regs_res);
 	setbit(&sc->doorbells, DOORBELL_KDB);
 
 	sc->msix_rid = PCIR_BAR(4);
 	sc->msix_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->msix_rid, RF_ACTIVE);
 	if (sc->msix_res == NULL) {
 		device_printf(sc->dev, "cannot map MSI-X BAR.\n");
 		return (ENXIO);
 	}
 
 	return (0);
 }
 
 static int
 map_bar_2(struct adapter *sc)
 {
 
 	/*
	 * T4: only the iWARP driver uses the userspace doorbells.  There is no
	 * need to map BAR2 if RDMA is disabled.
 	 */
 	if (is_t4(sc) && sc->rdmacaps == 0)
 		return (0);
 
 	sc->udbs_rid = PCIR_BAR(2);
 	sc->udbs_res = bus_alloc_resource_any(sc->dev, SYS_RES_MEMORY,
 	    &sc->udbs_rid, RF_ACTIVE);
 	if (sc->udbs_res == NULL) {
 		device_printf(sc->dev, "cannot map doorbell BAR.\n");
 		return (ENXIO);
 	}
 	sc->udbs_base = rman_get_virtual(sc->udbs_res);
 
 	if (is_t5(sc)) {
 		setbit(&sc->doorbells, DOORBELL_UDB);
 #if defined(__i386__) || defined(__amd64__)
 		if (t5_write_combine) {
 			int rc;
 
 			/*
 			 * Enable write combining on BAR2.  This is the
 			 * userspace doorbell BAR and is split into 128B
 			 * (UDBS_SEG_SIZE) doorbell regions, each associated
 			 * with an egress queue.  The first 64B has the doorbell
 			 * and the second 64B can be used to submit a tx work
 			 * request with an implicit doorbell.
 			 */
 
 			rc = pmap_change_attr((vm_offset_t)sc->udbs_base,
 			    rman_get_size(sc->udbs_res), PAT_WRITE_COMBINING);
 			if (rc == 0) {
 				clrbit(&sc->doorbells, DOORBELL_UDB);
 				setbit(&sc->doorbells, DOORBELL_WCWR);
 				setbit(&sc->doorbells, DOORBELL_UDBWC);
 			} else {
 				device_printf(sc->dev,
 				    "couldn't enable write combining: %d\n",
 				    rc);
 			}
 
 			t4_write_reg(sc, A_SGE_STAT_CFG,
 			    V_STATSOURCE_T5(7) | V_STATMODE(0));
 		}
 #endif
 	}
 
 	return (0);
 }
 
 struct memwin_init {
 	uint32_t base;
 	uint32_t aperture;
 };
 
 static const struct memwin_init t4_memwin[NUM_MEMWIN] = {
 	{ MEMWIN0_BASE, MEMWIN0_APERTURE },
 	{ MEMWIN1_BASE, MEMWIN1_APERTURE },
 	{ MEMWIN2_BASE_T4, MEMWIN2_APERTURE_T4 }
 };
 
 static const struct memwin_init t5_memwin[NUM_MEMWIN] = {
 	{ MEMWIN0_BASE, MEMWIN0_APERTURE },
 	{ MEMWIN1_BASE, MEMWIN1_APERTURE },
 	{ MEMWIN2_BASE_T5, MEMWIN2_APERTURE_T5 },
 };
 
 static void
 setup_memwin(struct adapter *sc)
 {
 	const struct memwin_init *mw_init;
 	struct memwin *mw;
 	int i;
 	uint32_t bar0;
 
 	if (is_t4(sc)) {
 		/*
 		 * Read low 32b of bar0 indirectly via the hardware backdoor
 		 * mechanism.  Works from within PCI passthrough environments
 		 * too, where rman_get_start() can return a different value.  We
 		 * need to program the T4 memory window decoders with the actual
 		 * addresses that will be coming across the PCIe link.
 		 */
 		bar0 = t4_hw_pci_read_cfg4(sc, PCIR_BAR(0));
 		bar0 &= (uint32_t) PCIM_BAR_MEM_BASE;
 
 		mw_init = &t4_memwin[0];
 	} else {
 		/* T5+ use the relative offset inside the PCIe BAR */
 		bar0 = 0;
 
 		mw_init = &t5_memwin[0];
 	}
 
 	for (i = 0, mw = &sc->memwin[0]; i < NUM_MEMWIN; i++, mw_init++, mw++) {
 		rw_init(&mw->mw_lock, "memory window access");
 		mw->mw_base = mw_init->base;
 		mw->mw_aperture = mw_init->aperture;
 		mw->mw_curpos = 0;
 		t4_write_reg(sc,
 		    PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, i),
 		    (mw->mw_base + bar0) | V_BIR(0) |
 		    V_WINDOW(ilog2(mw->mw_aperture) - 10));
 		rw_wlock(&mw->mw_lock);
 		position_memwin(sc, i, 0);
 		rw_wunlock(&mw->mw_lock);
 	}
 
 	/* flush */
 	t4_read_reg(sc, PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_BASE_WIN, 2));
 }
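
/*
 * Illustrative sketch, not compiled into the driver: the WINDOW field that
 * setup_memwin() passes to V_WINDOW() encodes the aperture as
 * log2(aperture) - 10, so only power-of-two apertures of at least 1KB can
 * be represented.  ilog2_u32() and memwin_window_code() are hypothetical
 * helpers; the field's bit position inside the register is left to
 * V_WINDOW() and is not modelled here.
 *
```c
#include <stdint.h>

// Integer log2 of a nonzero value (index of the highest set bit).
static int
ilog2_u32(uint32_t v)
{
	int r = 0;

	while (v >>= 1)
		r++;
	return (r);
}

// WINDOW field value for a power-of-two aperture of at least 1KB.
static int
memwin_window_code(uint32_t aperture)
{
	return (ilog2_u32(aperture) - 10);
}
```
 *
 * A 1KB aperture encodes as 0 and a 64KB aperture as 6.
 */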
 
 /*
  * Positions the memory window at the given address in the card's address space.
  * There are some alignment requirements and the actual position may be at an
  * address prior to the requested address.  mw->mw_curpos always has the actual
  * position of the window.
  */
 static void
 position_memwin(struct adapter *sc, int idx, uint32_t addr)
 {
 	struct memwin *mw;
 	uint32_t pf;
 	uint32_t reg;
 
 	MPASS(idx >= 0 && idx < NUM_MEMWIN);
 	mw = &sc->memwin[idx];
 	rw_assert(&mw->mw_lock, RA_WLOCKED);
 
 	if (is_t4(sc)) {
 		pf = 0;
 		mw->mw_curpos = addr & ~0xf;	/* start must be 16B aligned */
 	} else {
 		pf = V_PFNUM(sc->pf);
 		mw->mw_curpos = addr & ~0x7f;	/* start must be 128B aligned */
 	}
 	reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, idx);
 	t4_write_reg(sc, reg, mw->mw_curpos | pf);
 	t4_read_reg(sc, reg);	/* flush */
 }
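
/*
 * Illustrative sketch, not compiled into the driver: the alignment applied
 * by position_memwin() and the register offset that rw_via_memwin() then
 * uses to reach an address through the window.  memwin_start() and
 * memwin_reg_offset() are hypothetical helpers; is_t4 stands in for the
 * driver's is_t4(sc).
 *
```c
#include <stdint.h>

// Window start chosen for a requested card address: T4 needs 16B
// alignment, T5 and later 128B.  The start may precede addr.
static uint32_t
memwin_start(uint32_t addr, int is_t4)
{
	return (is_t4 ? (addr & ~0xfu) : (addr & ~0x7fu));
}

// Register offset that reaches addr once the window at mw_base has been
// positioned at curpos.
static uint32_t
memwin_reg_offset(uint32_t mw_base, uint32_t curpos, uint32_t addr)
{
	return (mw_base + addr - curpos);
}
```
 *
 * A request for 0x1234 positions a T4 window at 0x1230 and a T5 window at
 * 0x1200; with the latter and mw_base = 0x1000, the word at 0x1234 is then
 * read at register offset 0x1034.
 */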
 
 static int
 rw_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
     int len, int rw)
 {
 	struct memwin *mw;
 	uint32_t mw_end, v;
 
 	MPASS(idx >= 0 && idx < NUM_MEMWIN);
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
 	if (addr & 3 || len & 3 || len <= 0)
 		return (EINVAL);
 
 	mw = &sc->memwin[idx];
 	while (len > 0) {
 		rw_rlock(&mw->mw_lock);
 		mw_end = mw->mw_curpos + mw->mw_aperture;
 		if (addr >= mw_end || addr < mw->mw_curpos) {
 			/* Will need to reposition the window */
 			if (!rw_try_upgrade(&mw->mw_lock)) {
 				rw_runlock(&mw->mw_lock);
 				rw_wlock(&mw->mw_lock);
 			}
 			rw_assert(&mw->mw_lock, RA_WLOCKED);
 			position_memwin(sc, idx, addr);
 			rw_downgrade(&mw->mw_lock);
 			mw_end = mw->mw_curpos + mw->mw_aperture;
 		}
 		rw_assert(&mw->mw_lock, RA_RLOCKED);
 		while (addr < mw_end && len > 0) {
 			if (rw == 0) {
 				v = t4_read_reg(sc, mw->mw_base + addr -
 				    mw->mw_curpos);
 				*val++ = le32toh(v);
 			} else {
 				v = *val++;
 				t4_write_reg(sc, mw->mw_base + addr -
 				    mw->mw_curpos, htole32(v));
 			}
 			addr += 4;
 			len -= 4;
 		}
 		rw_runlock(&mw->mw_lock);
 	}
 
 	return (0);
 }
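
/*
 * Illustrative sketch, not compiled into the driver: a userspace model of
 * the read path of rw_via_memwin().  Card memory is a flat array of 32-bit
 * words and only aperture bytes starting at curpos are visible; the window
 * is moved whenever the next word falls outside it.  struct model_win and
 * model_read() are hypothetical; locking, byte swapping and the register
 * indirection are deliberately left out, and bounds are assumed to have
 * been checked already (the driver uses validate_mem_range() for that).
 *
```c
#include <errno.h>
#include <stdint.h>

struct model_win {
	const uint32_t	*card;		// backing "card memory"
	uint32_t	 aperture;	// window size in bytes, >= 128
	uint32_t	 curpos;	// card address of the window start
	int		 repositions;	// number of times the window moved
};

static int
model_read(struct model_win *w, uint32_t addr, uint32_t *val, int len)
{
	uint32_t end;

	// Same check as the driver: 4-byte aligned address and length.
	if (addr & 3 || len & 3 || len <= 0)
		return (EINVAL);

	while (len > 0) {
		end = w->curpos + w->aperture;
		if (addr >= end || addr < w->curpos) {
			// Move the window, 128B aligned as on T5.
			w->curpos = addr & ~0x7fu;
			w->repositions++;
			end = w->curpos + w->aperture;
		}
		// Copy whole words until the window or the request ends.
		while (addr < end && len > 0) {
			*val++ = w->card[addr / 4];
			addr += 4;
			len -= 4;
		}
	}
	return (0);
}
```
 *
 * With a 128B window at 0, reading 16 bytes from 120 returns two words
 * from the current window, moves the window once to 128, and returns the
 * remaining two words from there.
 */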
 
 static inline int
 read_via_memwin(struct adapter *sc, int idx, uint32_t addr, uint32_t *val,
     int len)
 {
 
 	return (rw_via_memwin(sc, idx, addr, val, len, 0));
 }
 
 static inline int
 write_via_memwin(struct adapter *sc, int idx, uint32_t addr,
     const uint32_t *val, int len)
 {
 
 	return (rw_via_memwin(sc, idx, addr, (void *)(uintptr_t)val, len, 1));
 }
 
 static int
 t4_range_cmp(const void *a, const void *b)
 {
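	const struct t4_range *ra = a, *rb = b;

	/* Compare rather than subtract: the unsigned difference may not fit in an int. */
	if (ra->start < rb->start)
		return (-1);
	return (ra->start > rb->start);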
 }
 
 /*
  * Verify that the memory range specified by the addr/len pair is valid within
  * the card's address space.
  */
 static int
 validate_mem_range(struct adapter *sc, uint32_t addr, int len)
 {
 	struct t4_range mem_ranges[4], *r, *next;
 	uint32_t em, addr_len;
 	int i, n, remaining;
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
 	if (addr & 3 || len & 3 || len <= 0)
 		return (EINVAL);
 
 	/* Enabled memories */
 	em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 
 	r = &mem_ranges[0];
 	n = 0;
 	bzero(r, sizeof(mem_ranges));
 	if (em & F_EDRAM0_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		r->size = G_EDRAM0_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EDRAM0_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (em & F_EDRAM1_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		r->size = G_EDRAM1_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EDRAM1_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (em & F_EXT_MEM_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		r->size = G_EXT_MEM_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EXT_MEM_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	if (is_t5(sc) && em & F_EXT_MEM1_ENABLE) {
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		r->size = G_EXT_MEM1_SIZE(addr_len) << 20;
 		if (r->size > 0) {
 			r->start = G_EXT_MEM1_BASE(addr_len) << 20;
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 			r++;
 			n++;
 		}
 	}
 	MPASS(n <= nitems(mem_ranges));
 
 	if (n > 1) {
 		/* Sort and merge the ranges. */
 		qsort(mem_ranges, n, sizeof(struct t4_range), t4_range_cmp);
 
 		/* Start from index 0 and examine the next n - 1 entries. */
 		r = &mem_ranges[0];
 		for (remaining = n - 1; remaining > 0; remaining--, r++) {
 
 			MPASS(r->size > 0);	/* r is a valid entry. */
 			next = r + 1;
 			MPASS(next->size > 0);	/* and so is the next one. */
 
 			while (r->start + r->size >= next->start) {
 				/* Merge the next one into the current entry. */
 				r->size = max(r->start + r->size,
 				    next->start + next->size) - r->start;
 				n--;	/* One fewer entry in total. */
 				if (--remaining == 0)
 					goto done;	/* short circuit */
 				next++;
 			}
 			if (next != r + 1) {
 				/*
 				 * Some entries were merged into r and next
 				 * points to the first valid entry that couldn't
 				 * be merged.
 				 */
 				MPASS(next->size > 0);	/* must be valid */
 				memcpy(r + 1, next, remaining * sizeof(*r));
 #ifdef INVARIANTS
 				/*
				 * This is so that the size assertions in the
				 * next iteration of the loop do the right
				 * thing for entries that were pulled up and are
				 * no longer valid.
 				 */
 				MPASS(n < nitems(mem_ranges));
 				bzero(&mem_ranges[n], (nitems(mem_ranges) - n) *
 				    sizeof(struct t4_range));
 #endif
 			}
 		}
 done:
 		/* Done merging the ranges. */
 		MPASS(n > 0);
 		r = &mem_ranges[0];
 		for (i = 0; i < n; i++, r++) {
 			if (addr >= r->start &&
 			    addr + len <= r->start + r->size)
 				return (0);
 		}
 	}
 
 	return (EFAULT);
 }
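
/*
 * Illustrative sketch, not compiled into the driver: the sort-and-merge
 * step of validate_mem_range() written as a conventional interval merge.
 * Unlike the driver, which merges in place with memcpy(), this version
 * compacts with a write index.  Merging matters because an access that
 * spans two adjacent memories lies in no single range but does lie in
 * their union.  struct range, merge_ranges() and range_contains() are
 * hypothetical names for this sketch.
 *
```c
#include <stdint.h>
#include <stdlib.h>

// Stand-in for struct t4_range.
struct range {
	uint32_t start;
	uint32_t size;
};

// Comparator that avoids overflowing an int by subtracting unsigned starts.
static int
range_cmp(const void *a, const void *b)
{
	const struct range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return (-1);
	return (ra->start > rb->start);
}

// Sort r[0..n-1], merge overlapping or touching ranges in place and
// return the number of ranges left.
static int
merge_ranges(struct range *r, int n)
{
	int i, out;

	if (n <= 1)
		return (n);
	qsort(r, n, sizeof(r[0]), range_cmp);
	out = 0;
	for (i = 1; i < n; i++) {
		uint64_t end = (uint64_t)r[out].start + r[out].size;

		if (r[i].start <= end) {
			uint64_t iend = (uint64_t)r[i].start + r[i].size;

			if (iend > end)
				r[out].size = (uint32_t)(iend - r[out].start);
		} else
			r[++out] = r[i];
	}
	return (out + 1);
}

// Return 1 if [addr, addr + len) lies entirely inside one of the ranges.
static int
range_contains(const struct range *r, int n, uint32_t addr, uint32_t len)
{
	int i;

	for (i = 0; i < n; i++) {
		if (addr >= r[i].start &&
		    (uint64_t)addr + len <= (uint64_t)r[i].start + r[i].size)
			return (1);
	}
	return (0);
}
```
 *
 * Ranges [0x2000, 0x3000), [0, 0x1000) and [0x1000, 0x1800) merge into
 * [0, 0x1800) and [0x2000, 0x3000), so an 8-byte access at 0x0ffc is
 * accepted even though it straddles two of the original ranges.
 */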
 
 static int
 fwmtype_to_hwmtype(int mtype)
 {
 
 	switch (mtype) {
 	case FW_MEMTYPE_EDC0:
 		return (MEM_EDC0);
 	case FW_MEMTYPE_EDC1:
 		return (MEM_EDC1);
 	case FW_MEMTYPE_EXTMEM:
 		return (MEM_MC0);
 	case FW_MEMTYPE_EXTMEM1:
 		return (MEM_MC1);
 	default:
 		panic("%s: cannot translate fw mtype %d.", __func__, mtype);
 	}
 }
 
 /*
 * Verify that the memory range specified by the memtype/offset/len pair is
 * valid: the memtype must be enabled and the range must lie within the card's
 * enabled memory, as checked by validate_mem_range().  The global address of
 * the start of the range is returned in addr.
  */
 static int
 validate_mt_off_len(struct adapter *sc, int mtype, uint32_t off, int len,
     uint32_t *addr)
 {
 	uint32_t em, addr_len, maddr;
 
 	/* Memory can only be accessed in naturally aligned 4 byte units */
	if (off & 3 || len & 3 || len <= 0)
 		return (EINVAL);
 
 	em = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 	switch (fwmtype_to_hwmtype(mtype)) {
 	case MEM_EDC0:
 		if (!(em & F_EDRAM0_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		maddr = G_EDRAM0_BASE(addr_len) << 20;
 		break;
 	case MEM_EDC1:
 		if (!(em & F_EDRAM1_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		maddr = G_EDRAM1_BASE(addr_len) << 20;
 		break;
 	case MEM_MC:
 		if (!(em & F_EXT_MEM_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		maddr = G_EXT_MEM_BASE(addr_len) << 20;
 		break;
 	case MEM_MC1:
 		if (!is_t5(sc) || !(em & F_EXT_MEM1_ENABLE))
 			return (EINVAL);
 		addr_len = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		maddr = G_EXT_MEM1_BASE(addr_len) << 20;
 		break;
 	default:
 		return (EINVAL);
 	}
 
 	*addr = maddr + off;	/* global address */
 	return (validate_mem_range(sc, *addr, len));
 }
 
 static int
 fixup_devlog_params(struct adapter *sc)
 {
 	struct devlog_params *dparams = &sc->params.devlog;
 	int rc;
 
 	rc = validate_mt_off_len(sc, dparams->memtype, dparams->start,
 	    dparams->size, &dparams->addr);
 
 	return (rc);
 }
 
 static int
 cfg_itype_and_nqueues(struct adapter *sc, int n10g, int n1g, int num_vis,
     struct intrs_and_queues *iaq)
 {
 	int rc, itype, navail, nrxq10g, nrxq1g, n;
 	int nofldrxq10g = 0, nofldrxq1g = 0;
 
 	bzero(iaq, sizeof(*iaq));
 
 	iaq->ntxq10g = t4_ntxq10g;
 	iaq->ntxq1g = t4_ntxq1g;
 	iaq->ntxq_vi = t4_ntxq_vi;
 	iaq->nrxq10g = nrxq10g = t4_nrxq10g;
 	iaq->nrxq1g = nrxq1g = t4_nrxq1g;
 	iaq->nrxq_vi = t4_nrxq_vi;
 	iaq->rsrv_noflowq = t4_rsrv_noflowq;
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		iaq->nofldtxq10g = t4_nofldtxq10g;
 		iaq->nofldtxq1g = t4_nofldtxq1g;
 		iaq->nofldtxq_vi = t4_nofldtxq_vi;
 		iaq->nofldrxq10g = nofldrxq10g = t4_nofldrxq10g;
 		iaq->nofldrxq1g = nofldrxq1g = t4_nofldrxq1g;
 		iaq->nofldrxq_vi = t4_nofldrxq_vi;
 	}
 #endif
 #ifdef DEV_NETMAP
 	iaq->nnmtxq_vi = t4_nnmtxq_vi;
 	iaq->nnmrxq_vi = t4_nnmrxq_vi;
 #endif
 
 	for (itype = INTR_MSIX; itype; itype >>= 1) {
 
 		if ((itype & t4_intr_types) == 0)
 			continue;	/* not allowed */
 
 		if (itype == INTR_MSIX)
 			navail = pci_msix_count(sc->dev);
 		else if (itype == INTR_MSI)
 			navail = pci_msi_count(sc->dev);
 		else
 			navail = 1;
 restart:
 		if (navail == 0)
 			continue;
 
 		iaq->intr_type = itype;
 		iaq->intr_flags_10g = 0;
 		iaq->intr_flags_1g = 0;
 
 		/*
 		 * Best option: an interrupt vector for errors, one for the
 		 * firmware event queue, and one for every rxq (NIC and TOE) of
 		 * every VI.  The VIs that support netmap use the same
 		 * interrupts for the NIC rx queues and the netmap rx queues
 		 * because only one set of queues is active at a time.
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		iaq->nirq += n10g * (nrxq10g + nofldrxq10g);
 		iaq->nirq += n1g * (nrxq1g + nofldrxq1g);
 		iaq->nirq += (n10g + n1g) * (num_vis - 1) *
 		    max(iaq->nrxq_vi, iaq->nnmrxq_vi);	/* See comment above. */
 		iaq->nirq += (n10g + n1g) * (num_vis - 1) * iaq->nofldrxq_vi;
 		if (iaq->nirq <= navail &&
 		    (itype != INTR_MSI || powerof2(iaq->nirq))) {
 			iaq->intr_flags_10g = INTR_ALL;
 			iaq->intr_flags_1g = INTR_ALL;
 			goto allocate;
 		}
 
 		/* Disable the VIs (and netmap) if there aren't enough intrs */
 		if (num_vis > 1) {
 			device_printf(sc->dev, "virtual interfaces disabled "
 			    "because num_vis=%u with current settings "
 			    "(nrxq10g=%u, nrxq1g=%u, nofldrxq10g=%u, "
 			    "nofldrxq1g=%u, nrxq_vi=%u, nofldrxq_vi=%u, "
 			    "nnmrxq_vi=%u) would need %u interrupts but "
 			    "only %u are available.\n", num_vis, nrxq10g,
 			    nrxq1g, nofldrxq10g, nofldrxq1g, iaq->nrxq_vi,
 			    iaq->nofldrxq_vi, iaq->nnmrxq_vi, iaq->nirq,
 			    navail);
 			num_vis = 1;
 			iaq->ntxq_vi = iaq->nrxq_vi = 0;
 			iaq->nofldtxq_vi = iaq->nofldrxq_vi = 0;
 			iaq->nnmtxq_vi = iaq->nnmrxq_vi = 0;
 			goto restart;
 		}
 
 		/*
 		 * Second best option: a vector for errors, one for the firmware
 		 * event queue, and vectors for either all the NIC rx queues or
 		 * all the TOE rx queues.  The queues that don't get vectors
 		 * will forward their interrupts to those that do.
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		if (nrxq10g >= nofldrxq10g) {
 			iaq->intr_flags_10g = INTR_RXQ;
 			iaq->nirq += n10g * nrxq10g;
 		} else {
 			iaq->intr_flags_10g = INTR_OFLD_RXQ;
 			iaq->nirq += n10g * nofldrxq10g;
 		}
 		if (nrxq1g >= nofldrxq1g) {
 			iaq->intr_flags_1g = INTR_RXQ;
 			iaq->nirq += n1g * nrxq1g;
 		} else {
 			iaq->intr_flags_1g = INTR_OFLD_RXQ;
 			iaq->nirq += n1g * nofldrxq1g;
 		}
 		if (iaq->nirq <= navail &&
 		    (itype != INTR_MSI || powerof2(iaq->nirq)))
 			goto allocate;
 
 		/*
 		 * Next best option: an interrupt vector for errors, one for the
 		 * firmware event queue, and at least one per main-VI.  At this
 		 * point we know we'll have to downsize nrxq and/or nofldrxq to
 		 * fit what's available to us.
 		 */
 		iaq->nirq = T4_EXTRA_INTR;
 		iaq->nirq += n10g + n1g;
 		if (iaq->nirq <= navail) {
 			int leftover = navail - iaq->nirq;
 
 			if (n10g > 0) {
 				int target = max(nrxq10g, nofldrxq10g);
 
 				iaq->intr_flags_10g = nrxq10g >= nofldrxq10g ?
 				    INTR_RXQ : INTR_OFLD_RXQ;
 
 				n = 1;
 				while (n < target && leftover >= n10g) {
 					leftover -= n10g;
 					iaq->nirq += n10g;
 					n++;
 				}
 				iaq->nrxq10g = min(n, nrxq10g);
 #ifdef TCP_OFFLOAD
 				iaq->nofldrxq10g = min(n, nofldrxq10g);
 #endif
 			}
 
 			if (n1g > 0) {
 				int target = max(nrxq1g, nofldrxq1g);
 
 				iaq->intr_flags_1g = nrxq1g >= nofldrxq1g ?
 				    INTR_RXQ : INTR_OFLD_RXQ;
 
 				n = 1;
 				while (n < target && leftover >= n1g) {
 					leftover -= n1g;
 					iaq->nirq += n1g;
 					n++;
 				}
 				iaq->nrxq1g = min(n, nrxq1g);
 #ifdef TCP_OFFLOAD
 				iaq->nofldrxq1g = min(n, nofldrxq1g);
 #endif
 			}
 
 			if (itype != INTR_MSI || powerof2(iaq->nirq))
 				goto allocate;
 		}
 
 		/*
 		 * Least desirable option: one interrupt vector for everything.
 		 */
 		iaq->nirq = iaq->nrxq10g = iaq->nrxq1g = 1;
 		iaq->intr_flags_10g = iaq->intr_flags_1g = 0;
 #ifdef TCP_OFFLOAD
 		if (is_offload(sc))
 			iaq->nofldrxq10g = iaq->nofldrxq1g = 1;
 #endif
 allocate:
 		navail = iaq->nirq;
 		rc = 0;
 		if (itype == INTR_MSIX)
 			rc = pci_alloc_msix(sc->dev, &navail);
 		else if (itype == INTR_MSI)
 			rc = pci_alloc_msi(sc->dev, &navail);
 
 		if (rc == 0) {
 			if (navail == iaq->nirq)
 				return (0);
 
 			/*
 			 * Didn't get the number requested.  Use whatever number
 			 * the kernel is willing to allocate (it's in navail).
 			 */
 			device_printf(sc->dev, "fewer vectors than requested, "
 			    "type=%d, req=%d, rcvd=%d; will downshift req.\n",
 			    itype, iaq->nirq, navail);
 			pci_release_msi(sc->dev);
 			goto restart;
 		}
 
 		device_printf(sc->dev,
 		    "failed to allocate vectors:%d, type=%d, req=%d, rcvd=%d\n",
 		    rc, itype, iaq->nirq, navail);
 	}
 
 	device_printf(sc->dev,
 	    "failed to find a usable interrupt type.  "
 	    "allowed=%d, msi-x=%d, msi=%d, intx=1\n", t4_intr_types,
 	    pci_msix_count(sc->dev), pci_msi_count(sc->dev));
 
 	return (ENXIO);
 }
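`cfg_itype_and_nqueues()` walks MSI-X, MSI, and INTx in that order (the loop shifts `itype` right from `INTR_MSIX`) and, for each type, tries progressively cheaper vector layouts. The arithmetic behind the first and third tiers can be sketched on its own. This is a simplified model, not the driver's code: it assumes a single class of ports (the driver tracks 10G and 1G ports separately), ignores extra VIs, and uses `EXTRA_INTR` as a stand-in for `T4_EXTRA_INTR`, with the value 2 assumed from the "one for errors, one for the firmware event queue" comment. All names are hypothetical.

```c
#include <stdbool.h>

#define EXTRA_INTR	2	/* assumed: errors + firmware event queue */

enum itype { ITYPE_INTX = 1, ITYPE_MSI = 2, ITYPE_MSIX = 4 };

static bool
is_pow2(int n)
{
	return (n > 0 && (n & (n - 1)) == 0);
}

/* A layout is usable if it fits and, for MSI, asks for a power of 2. */
static bool
nirq_fits(int nirq, int navail, enum itype t)
{
	return (nirq <= navail && (t != ITYPE_MSI || is_pow2(nirq)));
}

/* Tier 1: one vector per NIC rx queue and per TOE rx queue on every port. */
static int
nirq_best(int nports, int nrxq, int nofldrxq)
{
	return (EXTRA_INTR + nports * (nrxq + nofldrxq));
}

/*
 * Tier 3: one vector per port, then hand out the leftovers one round (one
 * vector per port) at a time until each port has target vectors or there is
 * not enough left for a full round.  Returns the per-port queue count.
 */
static int
spread_leftover(int nports, int target, int navail, int *nirq)
{
	int leftover, n = 1;

	*nirq = EXTRA_INTR + nports;
	leftover = navail - *nirq;
	while (n < target && leftover >= nports) {
		leftover -= nports;
		*nirq += nports;
		n++;
	}
	return (n);
}
```

MSI vectors are granted in power-of-2 counts, which is why the MSI case rejects a layout of, for example, 18 vectors even when 32 are available, while MSI-X accepts it.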
 
 #define FW_VERSION(chip) ( \
     V_FW_HDR_FW_VER_MAJOR(chip##FW_VERSION_MAJOR) | \
     V_FW_HDR_FW_VER_MINOR(chip##FW_VERSION_MINOR) | \
     V_FW_HDR_FW_VER_MICRO(chip##FW_VERSION_MICRO) | \
     V_FW_HDR_FW_VER_BUILD(chip##FW_VERSION_BUILD))
 #define FW_INTFVER(chip, intf) (chip##FW_HDR_INTFVER_##intf)
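`FW_VERSION(chip)` uses token pasting so that `FW_VERSION(T4)` expands to the `T4FW_VERSION_*` constants and packs them into one 32-bit word. The sketch below reproduces the idea with hypothetical macros and made-up version numbers; the one-byte-per-field layout (major in the top byte) is an assumption, so check the firmware header before relying on the exact shifts.

```c
#include <stdint.h>

/* Assumed layout: major in bits 31-24, minor 23-16, micro 15-8, build 7-0. */
#define VER_MAJOR(x)		(((uint32_t)(x) & 0xff) << 24)
#define VER_MINOR(x)		(((uint32_t)(x) & 0xff) << 16)
#define VER_MICRO(x)		(((uint32_t)(x) & 0xff) << 8)
#define VER_BUILD(x)		((uint32_t)(x) & 0xff)
#define GET_VER_MAJOR(v)	(((v) >> 24) & 0xff)
#define GET_VER_MICRO(v)	(((v) >> 8) & 0xff)

/* Made-up per-chip constants, named so that token pasting can find them. */
#define T4FW_VERSION_MAJOR	1
#define T4FW_VERSION_MINOR	16
#define T4FW_VERSION_MICRO	26
#define T4FW_VERSION_BUILD	0

/* Same trick as FW_VERSION(chip): the argument T4 becomes T4FW_VERSION_*. */
#define PACKED_FW_VERSION(chip) (					\
    VER_MAJOR(chip##FW_VERSION_MAJOR) | VER_MINOR(chip##FW_VERSION_MINOR) | \
    VER_MICRO(chip##FW_VERSION_MICRO) | VER_BUILD(chip##FW_VERSION_BUILD))

/* With major in the top byte, plain integer comparison orders versions. */
static uint32_t
pack_ver(unsigned int maj, unsigned int min, unsigned int mic,
    unsigned int bld)
{
	return (VER_MAJOR(maj) | VER_MINOR(min) | VER_MICRO(mic) |
	    VER_BUILD(bld));
}
```

That ordering property is what lets `should_install_kld_fw()` compare `k > c` directly, once `prep_firmware()` has converted the big-endian header fields with `be32toh()`.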
 
 struct fw_info {
 	uint8_t chip;
 	char *kld_name;
 	char *fw_mod_name;
 	struct fw_hdr fw_hdr;	/* XXX: waste of space, need a sparse struct */
 } fw_info[] = {
 	{
 		.chip = CHELSIO_T4,
 		.kld_name = "t4fw_cfg",
 		.fw_mod_name = "t4fw",
 		.fw_hdr = {
 			.chip = FW_HDR_CHIP_T4,
 			.fw_ver = htobe32_const(FW_VERSION(T4)),
 			.intfver_nic = FW_INTFVER(T4, NIC),
 			.intfver_vnic = FW_INTFVER(T4, VNIC),
 			.intfver_ofld = FW_INTFVER(T4, OFLD),
 			.intfver_ri = FW_INTFVER(T4, RI),
 			.intfver_iscsipdu = FW_INTFVER(T4, ISCSIPDU),
 			.intfver_iscsi = FW_INTFVER(T4, ISCSI),
 			.intfver_fcoepdu = FW_INTFVER(T4, FCOEPDU),
 			.intfver_fcoe = FW_INTFVER(T4, FCOE),
 		},
 	}, {
 		.chip = CHELSIO_T5,
 		.kld_name = "t5fw_cfg",
 		.fw_mod_name = "t5fw",
 		.fw_hdr = {
 			.chip = FW_HDR_CHIP_T5,
 			.fw_ver = htobe32_const(FW_VERSION(T5)),
 			.intfver_nic = FW_INTFVER(T5, NIC),
 			.intfver_vnic = FW_INTFVER(T5, VNIC),
 			.intfver_ofld = FW_INTFVER(T5, OFLD),
 			.intfver_ri = FW_INTFVER(T5, RI),
 			.intfver_iscsipdu = FW_INTFVER(T5, ISCSIPDU),
 			.intfver_iscsi = FW_INTFVER(T5, ISCSI),
 			.intfver_fcoepdu = FW_INTFVER(T5, FCOEPDU),
 			.intfver_fcoe = FW_INTFVER(T5, FCOE),
 		},
 	}
 };
 
 static struct fw_info *
 find_fw_info(int chip)
 {
 	int i;
 
 	for (i = 0; i < nitems(fw_info); i++) {
 		if (fw_info[i].chip == chip)
 			return (&fw_info[i]);
 	}
 	return (NULL);
 }
 
 /*
  * Is the given firmware API compatible with the one the driver was compiled
  * with?
  */
 static int
 fw_compatible(const struct fw_hdr *hdr1, const struct fw_hdr *hdr2)
 {
 
 	/* short circuit if it's the exact same firmware version */
 	if (hdr1->chip == hdr2->chip && hdr1->fw_ver == hdr2->fw_ver)
 		return (1);
 
 	/*
 	 * XXX: Is this too conservative?  Perhaps I should limit this to the
 	 * features that are supported in the driver.
 	 */
 #define SAME_INTF(x) (hdr1->intfver_##x == hdr2->intfver_##x)
 	if (hdr1->chip == hdr2->chip && SAME_INTF(nic) && SAME_INTF(vnic) &&
 	    SAME_INTF(ofld) && SAME_INTF(ri) && SAME_INTF(iscsipdu) &&
 	    SAME_INTF(iscsi) && SAME_INTF(fcoepdu) && SAME_INTF(fcoe))
 		return (1);
 #undef SAME_INTF
 
 	return (0);
 }
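The rule in `fw_compatible()` is: firmware built for a different chip is never compatible, the identical version on the same chip always is, and otherwise every interface version must match. A compact restatement over a cut-down header follows; `struct fw_hdr_lite` and `fw_hdr_compatible()` are hypothetical, and the eight `intfver_*` bytes are stored as an array for brevity.

```c
#include <stdint.h>

/* Cut-down header: only what the compatibility check looks at. */
struct fw_hdr_lite {
	uint8_t chip;
	uint32_t fw_ver;
	/* nic, vnic, ofld, ri, iscsipdu, iscsi, fcoepdu, fcoe */
	uint8_t intfver[8];
};

static int
fw_hdr_compatible(const struct fw_hdr_lite *a, const struct fw_hdr_lite *b)
{
	int i;

	if (a->chip != b->chip)
		return (0);	/* never mix firmware across chip generations */
	if (a->fw_ver == b->fw_ver)
		return (1);	/* exact same firmware */
	for (i = 0; i < 8; i++) {
		if (a->intfver[i] != b->intfver[i])
			return (0);
	}
	return (1);
}
```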
 
 /*
  * The firmware in the KLD is usable, but should it be installed?  This routine
  * explains itself in detail if it indicates the KLD firmware should be
  * installed.
  */
 static int
 should_install_kld_fw(struct adapter *sc, int card_fw_usable, int k, int c)
 {
 	const char *reason;
 
 	if (!card_fw_usable) {
 		reason = "incompatible or unusable";
 		goto install;
 	}
 
 	if (k > c) {
 		reason = "older than the version bundled with this driver";
 		goto install;
 	}
 
 	if (t4_fw_install == 2 && k != c) {
 		reason = "different from the version bundled with this driver";
 		goto install;
 	}
 
 	return (0);
 
 install:
 	if (t4_fw_install == 0) {
 		device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
 		    "but the driver is prohibited from installing a different "
 		    "firmware on the card.\n",
 		    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 		    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason);
 
 		return (0);
 	}
 
 	device_printf(sc->dev, "firmware on card (%u.%u.%u.%u) is %s, "
 	    "installing firmware %u.%u.%u.%u on card.\n",
 	    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 	    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c), reason,
 	    G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
 	    G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
 
 	return (1);
 }
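`should_install_kld_fw()` first looks for a reason to install (card firmware unusable, older than the KLD, or, with `t4_fw_install == 2`, merely different) and only then checks whether installing is allowed at all (`t4_fw_install == 0` forbids it). Without the logging, the decision reduces to a predicate. `want_kld_fw()` is a hypothetical name; `k` and `c` are host-order packed versions, as in the original.

```c
#include <stdint.h>

/*
 * fw_install follows t4_fw_install: 0 never installs, 2 also installs when
 * the versions merely differ.  k and c are host-order packed versions of the
 * KLD firmware and the card firmware.
 */
static int
want_kld_fw(int fw_install, int card_fw_usable, uint32_t k, uint32_t c)
{
	int have_reason;

	have_reason = !card_fw_usable || k > c || (fw_install == 2 && k != c);
	return (have_reason && fw_install != 0);
}
```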
 /*
  * Establish contact with the firmware and determine if we are the master driver
  * or not, and whether we are responsible for chip initialization.
  */
 static int
 prep_firmware(struct adapter *sc)
 {
 	const struct firmware *fw = NULL, *default_cfg;
 	int rc, pf, card_fw_usable, kld_fw_usable, need_fw_reset = 1;
 	enum dev_state state;
 	struct fw_info *fw_info;
 	struct fw_hdr *card_fw;		/* fw on the card */
 	const struct fw_hdr *kld_fw;	/* fw in the KLD */
 	const struct fw_hdr *drv_fw;	/* fw header the driver was compiled
 					   against */
 
 	/* Contact firmware. */
 	rc = t4_fw_hello(sc, sc->mbox, sc->mbox, MASTER_MAY, &state);
 	if (rc < 0 || state == DEV_STATE_ERR) {
 		rc = -rc;
 		device_printf(sc->dev,
 		    "failed to connect to the firmware: %d, %d.\n", rc, state);
 		return (rc);
 	}
 	pf = rc;
 	if (pf == sc->mbox)
 		sc->flags |= MASTER_PF;
 	else if (state == DEV_STATE_UNINIT) {
 		/*
 		 * We didn't get to be the master so we definitely won't be
 		 * configuring the chip.  It's a bug if someone else hasn't
 		 * configured it already.
 		 */
 		device_printf(sc->dev, "couldn't be master(%d), "
 		    "device not already initialized either(%d).\n", rc, state);
 		return (EDOOFUS);
 	}
 
 	/* This is the firmware whose headers the driver was compiled against */
 	fw_info = find_fw_info(chip_id(sc));
 	if (fw_info == NULL) {
 		device_printf(sc->dev,
 		    "unable to look up firmware information for chip %d.\n",
 		    chip_id(sc));
 		return (EINVAL);
 	}
 	drv_fw = &fw_info->fw_hdr;
 
 	/*
 	 * The firmware KLD contains many modules.  The KLD name is also the
 	 * name of the module that contains the default config file.
 	 */
 	default_cfg = firmware_get(fw_info->kld_name);
 
 	/* Read the header of the firmware on the card */
 	card_fw = malloc(sizeof(*card_fw), M_CXGBE, M_ZERO | M_WAITOK);
 	rc = -t4_read_flash(sc, FLASH_FW_START,
 	    sizeof (*card_fw) / sizeof (uint32_t), (uint32_t *)card_fw, 1);
 	if (rc == 0)
 		card_fw_usable = fw_compatible(drv_fw, (const void*)card_fw);
 	else {
 		device_printf(sc->dev,
 		    "Unable to read card's firmware header: %d\n", rc);
 		card_fw_usable = 0;
 	}
 
 	/* This is the firmware in the KLD */
 	fw = firmware_get(fw_info->fw_mod_name);
 	if (fw != NULL) {
 		kld_fw = (const void *)fw->data;
 		kld_fw_usable = fw_compatible(drv_fw, kld_fw);
 	} else {
 		kld_fw = NULL;
 		kld_fw_usable = 0;
 	}
 
 	if (card_fw_usable && card_fw->fw_ver == drv_fw->fw_ver &&
 	    (!kld_fw_usable || kld_fw->fw_ver == drv_fw->fw_ver)) {
 		/*
 		 * Common case: the firmware on the card is an exact match and
 		 * the KLD is an exact match too, or the KLD is
 		 * absent/incompatible.  Note that t4_fw_install = 2 is ignored
 		 * here -- use cxgbetool loadfw if you want to reinstall the
 		 * same firmware as the one on the card.
 		 */
 	} else if (kld_fw_usable && state == DEV_STATE_UNINIT &&
 	    should_install_kld_fw(sc, card_fw_usable, be32toh(kld_fw->fw_ver),
 	    be32toh(card_fw->fw_ver))) {
 
 		rc = -t4_fw_upgrade(sc, sc->mbox, fw->data, fw->datasize, 0);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to install firmware: %d\n", rc);
 			goto done;
 		}
 
 		/* Installed successfully, update the cached header too. */
 		memcpy(card_fw, kld_fw, sizeof(*card_fw));
 		card_fw_usable = 1;
 		need_fw_reset = 0;	/* already reset as part of load_fw */
 	}
 
 	if (!card_fw_usable) {
 		uint32_t d, c, k;
 
 		d = ntohl(drv_fw->fw_ver);
 		c = ntohl(card_fw->fw_ver);
 		k = kld_fw ? ntohl(kld_fw->fw_ver) : 0;
 
 		device_printf(sc->dev, "Cannot find a usable firmware: "
 		    "fw_install %d, chip state %d, "
 		    "driver compiled with %d.%d.%d.%d, "
 		    "card has %d.%d.%d.%d, KLD has %d.%d.%d.%d\n",
 		    t4_fw_install, state,
 		    G_FW_HDR_FW_VER_MAJOR(d), G_FW_HDR_FW_VER_MINOR(d),
 		    G_FW_HDR_FW_VER_MICRO(d), G_FW_HDR_FW_VER_BUILD(d),
 		    G_FW_HDR_FW_VER_MAJOR(c), G_FW_HDR_FW_VER_MINOR(c),
 		    G_FW_HDR_FW_VER_MICRO(c), G_FW_HDR_FW_VER_BUILD(c),
 		    G_FW_HDR_FW_VER_MAJOR(k), G_FW_HDR_FW_VER_MINOR(k),
 		    G_FW_HDR_FW_VER_MICRO(k), G_FW_HDR_FW_VER_BUILD(k));
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* We're using whatever's on the card and it's known to be good. */
 	sc->params.fw_vers = ntohl(card_fw->fw_ver);
 	snprintf(sc->fw_version, sizeof(sc->fw_version), "%u.%u.%u.%u",
 	    G_FW_HDR_FW_VER_MAJOR(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_MINOR(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_MICRO(sc->params.fw_vers),
 	    G_FW_HDR_FW_VER_BUILD(sc->params.fw_vers));
 
 	t4_get_tp_version(sc, &sc->params.tp_vers);
 	snprintf(sc->tp_version, sizeof(sc->tp_version), "%u.%u.%u.%u",
 	    G_FW_HDR_FW_VER_MAJOR(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_MINOR(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_MICRO(sc->params.tp_vers),
 	    G_FW_HDR_FW_VER_BUILD(sc->params.tp_vers));
 
 	if (t4_get_exprom_version(sc, &sc->params.exprom_vers) != 0)
 		sc->params.exprom_vers = 0;
 	else {
 		snprintf(sc->exprom_version, sizeof(sc->exprom_version),
 		    "%u.%u.%u.%u",
 		    G_FW_HDR_FW_VER_MAJOR(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_MINOR(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_MICRO(sc->params.exprom_vers),
 		    G_FW_HDR_FW_VER_BUILD(sc->params.exprom_vers));
 	}
 
 	/* Reset device */
 	if (need_fw_reset &&
 	    (rc = -t4_fw_reset(sc, sc->mbox, F_PIORSTMODE | F_PIORST)) != 0) {
 		device_printf(sc->dev, "firmware reset failed: %d.\n", rc);
 		if (rc != ETIMEDOUT && rc != EIO)
 			t4_fw_bye(sc, sc->mbox);
 		goto done;
 	}
 	sc->flags |= FW_OK;
 
 	rc = get_params__pre_init(sc);
 	if (rc != 0)
 		goto done; /* error message displayed already */
 
 	/* Partition adapter resources as specified in the config file. */
 	if (state == DEV_STATE_UNINIT) {
 
 		KASSERT(sc->flags & MASTER_PF,
 		    ("%s: trying to change chip settings when not master.",
 		    __func__));
 
 		rc = partition_resources(sc, default_cfg, fw_info->kld_name);
 		if (rc != 0)
 			goto done;	/* error message displayed already */
 
 		t4_tweak_chip_settings(sc);
 
 		/* get basic stuff going */
 		rc = -t4_fw_initialize(sc, sc->mbox);
 		if (rc != 0) {
 			device_printf(sc->dev, "fw init failed: %d.\n", rc);
 			goto done;
 		}
 	} else {
 		snprintf(sc->cfg_file, sizeof(sc->cfg_file), "pf%d", pf);
 		sc->cfcsum = 0;
 	}
 
 done:
 	free(card_fw, M_CXGBE);
 	if (fw != NULL)
 		firmware_put(fw, FIRMWARE_UNLOAD);
 	if (default_cfg != NULL)
 		firmware_put(default_cfg, FIRMWARE_UNLOAD);
 
 	return (rc);
 }
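`prep_firmware()` ends up in one of three states: keep the card's firmware, flash the KLD firmware, or give up because nothing usable is available. The branch structure can be mirrored as a pure function. This is a sketch: the enum and `pick_fw_action()` are hypothetical, `install_ok` stands for the result of `should_install_kld_fw()`, and a failed `t4_fw_upgrade()` (which makes the driver bail out) is not modeled.

```c
#include <stdint.h>

enum fw_action { FW_USE_CARD, FW_INSTALL_KLD, FW_NONE_USABLE };

/*
 * Mirror of prep_firmware()'s choice.  Versions are host-order packed
 * values; install_ok is what should_install_kld_fw() would return.
 */
static enum fw_action
pick_fw_action(int card_usable, uint32_t card_ver, int kld_usable,
    uint32_t kld_ver, uint32_t drv_ver, int chip_uninit, int install_ok)
{
	/* Common case: the card matches the driver and the KLD adds nothing. */
	if (card_usable && card_ver == drv_ver &&
	    (!kld_usable || kld_ver == drv_ver))
		return (FW_USE_CARD);
	/* Only an uninitialized chip may be reflashed. */
	if (kld_usable && chip_uninit && install_ok)
		return (FW_INSTALL_KLD);
	return (card_usable ? FW_USE_CARD : FW_NONE_USABLE);
}
```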
 
 #define FW_PARAM_DEV(param) \
 	(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) | \
 	 V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_##param))
 #define FW_PARAM_PFVF(param) \
 	(V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_PFVF) | \
 	 V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_PFVF_##param))
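`FW_PARAM_DEV()` and `FW_PARAM_PFVF()` build a 32-bit parameter identifier from a mnemonic and an X index, and `partition_resources()` later pulls the Y and Z fields back out of the config-file location reply. The sketch below assumes one byte per field (mnemonic in bits 31-24, X in 23-16, Y in 15-8, Z in 7-0); that layout is an assumption to verify against the firmware interface header. All macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout of a firmware parameter word. */
#define PARAMS_MNEM(x)		(((uint32_t)(x) & 0xff) << 24)
#define PARAMS_PARAM_X(x)	(((uint32_t)(x) & 0xff) << 16)
#define PARAMS_PARAM_Y(x)	(((uint32_t)(x) & 0xff) << 8)
#define PARAMS_PARAM_Z(x)	((uint32_t)(x) & 0xff)
#define GET_PARAMS_PARAM_Y(v)	(((v) >> 8) & 0xff)
#define GET_PARAMS_PARAM_Z(v)	((v) & 0xff)

/* Build an identifier the way FW_PARAM_DEV()/FW_PARAM_PFVF() do. */
static uint32_t
make_param(unsigned int mnem, unsigned int x)
{
	return (PARAMS_MNEM(mnem) | PARAMS_PARAM_X(x));
}

/*
 * Decode the config-file location: Y is the memory type and Z is the
 * offset in 64KB units, hence the shift by 16.
 */
static void
decode_cf_location(uint32_t val, uint32_t *mtype, uint32_t *moff)
{
	*mtype = GET_PARAMS_PARAM_Y(val);
	*moff = GET_PARAMS_PARAM_Z(val) << 16;
}
```

The 64KB unit matches the way the driver hands the offset back to the firmware with `V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(moff >> 16)`.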
 
 /*
  * Partition chip resources for use between various PFs, VFs, etc.
  */
 static int
 partition_resources(struct adapter *sc, const struct firmware *default_cfg,
     const char *name_prefix)
 {
 	const struct firmware *cfg = NULL;
 	int rc = 0;
 	struct fw_caps_config_cmd caps;
 	uint32_t mtype, moff, finicsum, cfcsum;
 
 	/*
 	 * Figure out what configuration file to use.  Pick the default config
 	 * file for the card if the user hasn't specified one explicitly.
 	 */
 	snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", t4_cfg_file);
 	if (strncmp(t4_cfg_file, DEFAULT_CF, sizeof(t4_cfg_file)) == 0) {
 		/* Card specific overrides go here. */
 		if (pci_get_device(sc->dev) == 0x440a)
 			snprintf(sc->cfg_file, sizeof(sc->cfg_file), UWIRE_CF);
 		if (is_fpga(sc))
 			snprintf(sc->cfg_file, sizeof(sc->cfg_file), FPGA_CF);
 	}
 
 	/*
 	 * We need to load another module if the profile is anything except
 	 * "default" or "flash".
 	 */
 	if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) != 0 &&
 	    strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
 		char s[32];
 
 		snprintf(s, sizeof(s), "%s_%s", name_prefix, sc->cfg_file);
 		cfg = firmware_get(s);
 		if (cfg == NULL) {
 			if (default_cfg != NULL) {
 				device_printf(sc->dev,
 				    "unable to load module \"%s\" for "
 				    "configuration profile \"%s\", will use "
 				    "the default config file instead.\n",
 				    s, sc->cfg_file);
 				snprintf(sc->cfg_file, sizeof(sc->cfg_file),
 				    "%s", DEFAULT_CF);
 			} else {
 				device_printf(sc->dev,
 				    "unable to load module \"%s\" for "
 				    "configuration profile \"%s\", will use "
 				    "the config file on the card's flash "
 				    "instead.\n", s, sc->cfg_file);
 				snprintf(sc->cfg_file, sizeof(sc->cfg_file),
 				    "%s", FLASH_CF);
 			}
 		}
 	}
 
 	if (strncmp(sc->cfg_file, DEFAULT_CF, sizeof(sc->cfg_file)) == 0 &&
 	    default_cfg == NULL) {
 		device_printf(sc->dev,
 		    "default config file not available, will use the config "
 		    "file on the card's flash instead.\n");
 		snprintf(sc->cfg_file, sizeof(sc->cfg_file), "%s", FLASH_CF);
 	}
 
 	if (strncmp(sc->cfg_file, FLASH_CF, sizeof(sc->cfg_file)) != 0) {
 		u_int cflen;
 		const uint32_t *cfdata;
 		uint32_t param, val, addr;
 
 		KASSERT(cfg != NULL || default_cfg != NULL,
 		    ("%s: no config to upload", __func__));
 
 		/*
 		 * Ask the firmware where it wants us to upload the config file.
 		 */
 		param = FW_PARAM_DEV(CF);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 		if (rc != 0) {
 			/* No support for config file?  Shouldn't happen. */
 			device_printf(sc->dev,
 			    "failed to query config file location: %d.\n", rc);
 			goto done;
 		}
 		mtype = G_FW_PARAMS_PARAM_Y(val);
 		moff = G_FW_PARAMS_PARAM_Z(val) << 16;
 
 		/*
 		 * XXX: sheer laziness.  We deliberately added 4 bytes of
 		 * useless stuffing/comments at the end of the config file so
 		 * it's ok to simply throw away the last remaining bytes when
 		 * the config file is not an exact multiple of 4.  This also
 		 * helps with the validate_mt_off_len check.
 		 */
 		if (cfg != NULL) {
 			cflen = cfg->datasize & ~3;
 			cfdata = cfg->data;
 		} else {
 			cflen = default_cfg->datasize & ~3;
 			cfdata = default_cfg->data;
 		}
 
 		if (cflen > FLASH_CFG_MAX_SIZE) {
 			device_printf(sc->dev,
 			    "config file too long (%u, max allowed is %d).  "
 			    "Will try to use the config on the card, if any.\n",
 			    cflen, FLASH_CFG_MAX_SIZE);
 			goto use_config_on_flash;
 		}
 
 		rc = validate_mt_off_len(sc, mtype, moff, cflen, &addr);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "%s: addr (%u/0x%x) or len %u is not valid: %d.  "
 			    "Will try to use the config on the card, if any.\n",
 			    __func__, mtype, moff, cflen, rc);
 			goto use_config_on_flash;
 		}
 		write_via_memwin(sc, 2, addr, cfdata, cflen);
 	} else {
 use_config_on_flash:
 		mtype = FW_MEMTYPE_FLASH;
 		moff = t4_flash_cfg_addr(sc);
 	}
 
 	bzero(&caps, sizeof(caps));
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_READ);
 	caps.cfvalid_to_len16 = htobe32(F_FW_CAPS_CONFIG_CMD_CFVALID |
 	    V_FW_CAPS_CONFIG_CMD_MEMTYPE_CF(mtype) |
 	    V_FW_CAPS_CONFIG_CMD_MEMADDR64K_CF(moff >> 16) | FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to pre-process config file: %d "
 		    "(mtype %d, moff 0x%x).\n", rc, mtype, moff);
 		goto done;
 	}
 
 	finicsum = be32toh(caps.finicsum);
 	cfcsum = be32toh(caps.cfcsum);
 	if (finicsum != cfcsum) {
 		device_printf(sc->dev,
 		    "WARNING: config file checksum mismatch: %08x %08x\n",
 		    finicsum, cfcsum);
 	}
 	sc->cfcsum = cfcsum;
 
 #define LIMIT_CAPS(x) do { \
 	caps.x &= htobe16(t4_##x##_allowed); \
 } while (0)
 
 	/*
 	 * Let the firmware know what features will (not) be used so it can tune
 	 * things accordingly.
 	 */
 	LIMIT_CAPS(nbmcaps);
 	LIMIT_CAPS(linkcaps);
 	LIMIT_CAPS(switchcaps);
 	LIMIT_CAPS(niccaps);
 	LIMIT_CAPS(toecaps);
 	LIMIT_CAPS(rdmacaps);
 	LIMIT_CAPS(tlscaps);
 	LIMIT_CAPS(iscsicaps);
 	LIMIT_CAPS(fcoecaps);
 #undef LIMIT_CAPS
 
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_WRITE);
 	caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), NULL);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to process config file: %d.\n", rc);
 	}
 done:
 	if (cfg != NULL)
 		firmware_put(cfg, FIRMWARE_UNLOAD);
 	return (rc);
 }
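`partition_resources()` resolves the configuration profile in a few steps. Any profile other than "default" or "flash" needs its own module named `<kld_name>_<profile>`. If that module is missing, the driver falls back to the default config when one is bundled and to the card's flash otherwise. Only whole 32-bit words are uploaded. A sketch of those rules follows, assuming `DEFAULT_CF` and `FLASH_CF` are the strings "default" and "flash" (as the comment in the function suggests) and using "uwire" as an illustrative profile name. The card-specific overrides (device 0x440a, FPGA) are not modeled, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return 1 and build the module name if the profile needs its own module,
 * or 0 for the two built-in profiles.
 */
static int
cfg_module_name(const char *prefix, const char *profile, char *buf,
    size_t buflen)
{
	if (strcmp(profile, "default") == 0 || strcmp(profile, "flash") == 0)
		return (0);
	snprintf(buf, buflen, "%s_%s", prefix, profile);
	return (1);
}

/* Fallback when that module cannot be loaded. */
static const char *
cfg_fallback(int have_default_cfg)
{
	return (have_default_cfg ? "default" : "flash");
}

/* Only whole 32-bit words are uploaded; up to 3 trailing bytes are dropped. */
static unsigned int
cfg_upload_len(unsigned int datasize)
{
	return (datasize & ~3u);
}
```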
 
 /*
  * Retrieve parameters that are needed (or nice to have) very early.
  */
 static int
 get_params__pre_init(struct adapter *sc)
 {
 	int rc;
 	uint32_t param[2], val[2];
 
 	param[0] = FW_PARAM_DEV(PORTVEC);
 	param[1] = FW_PARAM_DEV(CCLK);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to query parameters (pre_init): %d.\n", rc);
 		return (rc);
 	}
 
 	sc->params.portvec = val[0];
 	sc->params.nports = bitcount32(val[0]);
 	sc->params.vpd.cclk = val[1];
 
 	/* Read device log parameters. */
 	rc = -t4_init_devlog_params(sc, 1);
 	if (rc == 0)
 		fixup_devlog_params(sc);
 	else {
 		device_printf(sc->dev,
 		    "failed to get devlog parameters: %d.\n", rc);
 		rc = 0;	/* devlog isn't critical for device operation */
 	}
 
 	return (rc);
 }
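`get_params__pre_init()` derives the port count from the `PORTVEC` bitmap with the kernel's `bitcount32()`, so each set bit counts as one port. A portable equivalent is sketched below; it uses Kernighan's bit-clearing loop, which is not necessarily how `bitcount32()` itself is implemented, and `count_ports()` is a hypothetical name.

```c
#include <stdint.h>

/* Count set bits: each iteration clears the lowest set bit. */
static int
count_ports(uint32_t portvec)
{
	int n = 0;

	while (portvec != 0) {
		portvec &= portvec - 1;
		n++;
	}
	return (n);
}
```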
 
 /*
  * Retrieve various parameters that are of interest to the driver.  The device
  * has been initialized by the firmware at this point.
  */
 static int
 get_params__post_init(struct adapter *sc)
 {
 	int rc;
 	uint32_t param[7], val[7];
 	struct fw_caps_config_cmd caps;
 
 	param[0] = FW_PARAM_PFVF(IQFLINT_START);
 	param[1] = FW_PARAM_PFVF(EQ_START);
 	param[2] = FW_PARAM_PFVF(FILTER_START);
 	param[3] = FW_PARAM_PFVF(FILTER_END);
 	param[4] = FW_PARAM_PFVF(L2T_START);
 	param[5] = FW_PARAM_PFVF(L2T_END);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to query parameters (post_init): %d.\n", rc);
 		return (rc);
 	}
 
 	sc->sge.iq_start = val[0];
 	sc->sge.eq_start = val[1];
 	sc->tids.ftid_base = val[2];
 	sc->tids.nftids = val[3] - val[2] + 1;
 	sc->params.ftid_min = val[2];
 	sc->params.ftid_max = val[3];
 	sc->vres.l2t.start = val[4];
 	sc->vres.l2t.size = val[5] - val[4] + 1;
 	KASSERT(sc->vres.l2t.size <= L2T_SIZE,
 	    ("%s: L2 table size (%u) larger than expected (%u)",
 	    __func__, sc->vres.l2t.size, L2T_SIZE));
 
 	/* get capabilities */
 	bzero(&caps, sizeof(caps));
 	caps.op_to_write = htobe32(V_FW_CMD_OP(FW_CAPS_CONFIG_CMD) |
 	    F_FW_CMD_REQUEST | F_FW_CMD_READ);
 	caps.cfvalid_to_len16 = htobe32(FW_LEN16(caps));
 	rc = -t4_wr_mbox(sc, sc->mbox, &caps, sizeof(caps), &caps);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to get card capabilities: %d.\n", rc);
 		return (rc);
 	}
 
 #define READ_CAPS(x) do { \
 	sc->x = be16toh(caps.x); \
 } while (0)
 	READ_CAPS(nbmcaps);
 	READ_CAPS(linkcaps);
 	READ_CAPS(switchcaps);
 	READ_CAPS(niccaps);
 	READ_CAPS(toecaps);
 	READ_CAPS(rdmacaps);
 	READ_CAPS(tlscaps);
 	READ_CAPS(iscsicaps);
 	READ_CAPS(fcoecaps);
 
 	if (sc->niccaps & FW_CAPS_CONFIG_NIC_ETHOFLD) {
 		param[0] = FW_PARAM_PFVF(ETHOFLD_START);
 		param[1] = FW_PARAM_PFVF(ETHOFLD_END);
 		param[2] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 3, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query NIC parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->tids.etid_base = val[0];
 		sc->params.etid_min = val[0];
 		sc->tids.netids = val[1] - val[0] + 1;
 		sc->params.netids = sc->tids.netids;
 		sc->params.eo_wr_cred = val[2];
 		sc->params.ethoffload = 1;
 	}
 
 	if (sc->toecaps) {
 		/* query offload-related parameters */
 		param[0] = FW_PARAM_DEV(NTID);
 		param[1] = FW_PARAM_PFVF(SERVER_START);
 		param[2] = FW_PARAM_PFVF(SERVER_END);
 		param[3] = FW_PARAM_PFVF(TDDP_START);
 		param[4] = FW_PARAM_PFVF(TDDP_END);
 		param[5] = FW_PARAM_DEV(FLOWC_BUFFIFO_SZ);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query TOE parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->tids.ntids = val[0];
 		sc->tids.natids = min(sc->tids.ntids / 2, MAX_ATIDS);
 		sc->tids.stid_base = val[1];
 		sc->tids.nstids = val[2] - val[1] + 1;
 		sc->vres.ddp.start = val[3];
 		sc->vres.ddp.size = val[4] - val[3] + 1;
 		sc->params.ofldq_wr_cred = val[5];
 		sc->params.offload = 1;
 	}
 	if (sc->rdmacaps) {
 		param[0] = FW_PARAM_PFVF(STAG_START);
 		param[1] = FW_PARAM_PFVF(STAG_END);
 		param[2] = FW_PARAM_PFVF(RQ_START);
 		param[3] = FW_PARAM_PFVF(RQ_END);
 		param[4] = FW_PARAM_PFVF(PBL_START);
 		param[5] = FW_PARAM_PFVF(PBL_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query RDMA parameters(1): %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.stag.start = val[0];
 		sc->vres.stag.size = val[1] - val[0] + 1;
 		sc->vres.rq.start = val[2];
 		sc->vres.rq.size = val[3] - val[2] + 1;
 		sc->vres.pbl.start = val[4];
 		sc->vres.pbl.size = val[5] - val[4] + 1;
 
 		param[0] = FW_PARAM_PFVF(SQRQ_START);
 		param[1] = FW_PARAM_PFVF(SQRQ_END);
 		param[2] = FW_PARAM_PFVF(CQ_START);
 		param[3] = FW_PARAM_PFVF(CQ_END);
 		param[4] = FW_PARAM_PFVF(OCQ_START);
 		param[5] = FW_PARAM_PFVF(OCQ_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 6, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query RDMA parameters(2): %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.qp.start = val[0];
 		sc->vres.qp.size = val[1] - val[0] + 1;
 		sc->vres.cq.start = val[2];
 		sc->vres.cq.size = val[3] - val[2] + 1;
 		sc->vres.ocq.start = val[4];
 		sc->vres.ocq.size = val[5] - val[4] + 1;
 	}
 	if (sc->iscsicaps) {
 		param[0] = FW_PARAM_PFVF(ISCSI_START);
 		param[1] = FW_PARAM_PFVF(ISCSI_END);
 		rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 2, param, val);
 		if (rc != 0) {
 			device_printf(sc->dev,
 			    "failed to query iSCSI parameters: %d.\n", rc);
 			return (rc);
 		}
 		sc->vres.iscsi.start = val[0];
 		sc->vres.iscsi.size = val[1] - val[0] + 1;
 	}
 
 	/*
 	 * We've got the params we wanted to query via the firmware.  Now grab
 	 * some others directly from the chip.
 	 */
 	rc = t4_read_chip_settings(sc);
 
 	return (rc);
 }
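Every `*_START`/`*_END` pair queried in `get_params__post_init()` is treated as an inclusive range, so each size is computed as `end - start + 1`. A small helper makes that convention explicit; `struct res_range` and `range_from_bounds()` are hypothetical. Like the driver, it assumes `end >= start`; an inverted pair would wrap around in unsigned arithmetic.

```c
#include <stdint.h>

struct res_range {
	uint32_t start;
	uint32_t size;
};

/* Firmware reports [start, end] with both ends inclusive. */
static struct res_range
range_from_bounds(uint32_t start, uint32_t end)
{
	struct res_range r = { start, end - start + 1 };

	return (r);
}
```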
 
 static int
 set_params__post_init(struct adapter *sc)
 {
 	uint32_t param, val;
 
 	/* ask for encapsulated CPLs */
 	param = FW_PARAM_PFVF(CPLFW4MSG_ENCAP);
 	val = 1;
 	(void)t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 
 	return (0);
 }
 
 #undef FW_PARAM_PFVF
 #undef FW_PARAM_DEV
 
 static void
 t4_set_desc(struct adapter *sc)
 {
 	char buf[128];
 	struct adapter_params *p = &sc->params;
 
 	snprintf(buf, sizeof(buf), "Chelsio %s %sNIC (rev %d), S/N:%s, "
 	    "P/N:%s, E/C:%s", p->vpd.id, is_offload(sc) ? "R" : "",
 	    chip_rev(sc), p->vpd.sn, p->vpd.pn, p->vpd.ec);
 
 	device_set_desc_copy(sc->dev, buf);
 }
 
 static void
 build_medialist(struct port_info *pi, struct ifmedia *media)
 {
 	int m;
 
 	PORT_LOCK(pi);
 
 	ifmedia_removeall(media);
 
 	m = IFM_ETHER | IFM_FDX;
 
 	switch (pi->port_type) {
 	case FW_PORT_TYPE_BT_XFI:
 	case FW_PORT_TYPE_BT_XAUI:
 		ifmedia_add(media, m | IFM_10G_T, 0, NULL);
 		/* fall through */
 
 	case FW_PORT_TYPE_BT_SGMII:
 		ifmedia_add(media, m | IFM_1000_T, 0, NULL);
 		ifmedia_add(media, m | IFM_100_TX, 0, NULL);
 		ifmedia_add(media, IFM_ETHER | IFM_AUTO, 0, NULL);
 		ifmedia_set(media, IFM_ETHER | IFM_AUTO);
 		break;
 
 	case FW_PORT_TYPE_CX4:
 		ifmedia_add(media, m | IFM_10G_CX4, 0, NULL);
 		ifmedia_set(media, m | IFM_10G_CX4);
 		break;
 
 	case FW_PORT_TYPE_QSFP_10G:
 	case FW_PORT_TYPE_SFP:
 	case FW_PORT_TYPE_FIBER_XFI:
 	case FW_PORT_TYPE_FIBER_XAUI:
 		switch (pi->mod_type) {
 
 		case FW_PORT_MOD_TYPE_LR:
 			ifmedia_add(media, m | IFM_10G_LR, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_LR);
 			break;
 
 		case FW_PORT_MOD_TYPE_SR:
 			ifmedia_add(media, m | IFM_10G_SR, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_SR);
 			break;
 
 		case FW_PORT_MOD_TYPE_LRM:
 			ifmedia_add(media, m | IFM_10G_LRM, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_LRM);
 			break;
 
 		case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
 		case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
 			ifmedia_add(media, m | IFM_10G_TWINAX, 0, NULL);
 			ifmedia_set(media, m | IFM_10G_TWINAX);
 			break;
 
 		case FW_PORT_MOD_TYPE_NONE:
 			m &= ~IFM_FDX;
 			ifmedia_add(media, m | IFM_NONE, 0, NULL);
 			ifmedia_set(media, m | IFM_NONE);
 			break;
 
 		case FW_PORT_MOD_TYPE_NA:
 		case FW_PORT_MOD_TYPE_ER:
 		default:
 			device_printf(pi->dev,
 			    "unknown port_type (%d), mod_type (%d)\n",
 			    pi->port_type, pi->mod_type);
 			ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 			ifmedia_set(media, m | IFM_UNKNOWN);
 			break;
 		}
 		break;
 
 	case FW_PORT_TYPE_QSFP:
 		switch (pi->mod_type) {
 
 		case FW_PORT_MOD_TYPE_LR:
 			ifmedia_add(media, m | IFM_40G_LR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_LR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_SR:
 			ifmedia_add(media, m | IFM_40G_SR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_SR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_TWINAX_PASSIVE:
 		case FW_PORT_MOD_TYPE_TWINAX_ACTIVE:
 			ifmedia_add(media, m | IFM_40G_CR4, 0, NULL);
 			ifmedia_set(media, m | IFM_40G_CR4);
 			break;
 
 		case FW_PORT_MOD_TYPE_NONE:
 			m &= ~IFM_FDX;
 			ifmedia_add(media, m | IFM_NONE, 0, NULL);
 			ifmedia_set(media, m | IFM_NONE);
 			break;
 
 		default:
 			device_printf(pi->dev,
 			    "unknown port_type (%d), mod_type (%d)\n",
 			    pi->port_type, pi->mod_type);
 			ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 			ifmedia_set(media, m | IFM_UNKNOWN);
 			break;
 		}
 		break;
 
 	default:
 		device_printf(pi->dev,
 		    "unknown port_type (%d), mod_type (%d)\n", pi->port_type,
 		    pi->mod_type);
 		ifmedia_add(media, m | IFM_UNKNOWN, 0, NULL);
 		ifmedia_set(media, m | IFM_UNKNOWN);
 		break;
 	}
 
 	PORT_UNLOCK(pi);
 }
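For pluggable cages, `build_medialist()` picks the media from the transceiver module type, and the 10G and 40G (QSFP) switches differ only in which speeds exist. A compact, table-like restatement follows. The enum, `cage_media()`, and the label strings are hypothetical illustrations, not the `IFM_*` constants the driver registers.

```c
/* Hypothetical transceiver module types; stand-ins for FW_PORT_MOD_TYPE_*. */
enum mod_type {
	MOD_NONE, MOD_LR, MOD_SR, MOD_LRM, MOD_TWINAX_PASSIVE,
	MOD_TWINAX_ACTIVE, MOD_OTHER
};

/* Pick a media label for a cage, following build_medialist()'s switches. */
static const char *
cage_media(enum mod_type m, int is_40g)
{
	switch (m) {
	case MOD_LR:
		return (is_40g ? "40GBase-LR4" : "10GBase-LR");
	case MOD_SR:
		return (is_40g ? "40GBase-SR4" : "10GBase-SR");
	case MOD_LRM:
		/* The QSFP switch has no LRM case, so it lands on "unknown". */
		return (is_40g ? "unknown" : "10GBase-LRM");
	case MOD_TWINAX_PASSIVE:
	case MOD_TWINAX_ACTIVE:
		return (is_40g ? "40GBase-CR4" : "10GBase-Twinax");
	case MOD_NONE:
		return ("none");	/* no module; the driver also drops FDX */
	default:
		return ("unknown");
	}
}
```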
 
 #define FW_MAC_EXACT_CHUNK	7
 
 /*
  * Program the port's XGMAC based on parameters in ifnet.  The caller also
  * indicates which parameters should be programmed (the rest are left alone).
  */
 int
 update_mac_settings(struct ifnet *ifp, int flags)
 {
 	int rc = 0;
 	struct vi_info *vi = ifp->if_softc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	int mtu = -1, promisc = -1, allmulti = -1, vlanex = -1;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	KASSERT(flags, ("%s: not told what to update.", __func__));
 
 	if (flags & XGMAC_MTU)
 		mtu = ifp->if_mtu;
 
 	if (flags & XGMAC_PROMISC)
 		promisc = ifp->if_flags & IFF_PROMISC ? 1 : 0;
 
 	if (flags & XGMAC_ALLMULTI)
 		allmulti = ifp->if_flags & IFF_ALLMULTI ? 1 : 0;
 
 	if (flags & XGMAC_VLANEX)
 		vlanex = ifp->if_capenable & IFCAP_VLAN_HWTAGGING ? 1 : 0;
 
 	if (flags & (XGMAC_MTU|XGMAC_PROMISC|XGMAC_ALLMULTI|XGMAC_VLANEX)) {
 		rc = -t4_set_rxmode(sc, sc->mbox, vi->viid, mtu, promisc,
 		    allmulti, 1, vlanex, false);
 		if (rc) {
 			if_printf(ifp, "set_rxmode (%x) failed: %d\n", flags,
 			    rc);
 			return (rc);
 		}
 	}
 
 	if (flags & XGMAC_UCADDR) {
 		uint8_t ucaddr[ETHER_ADDR_LEN];
 
 		bcopy(IF_LLADDR(ifp), ucaddr, sizeof(ucaddr));
 		rc = t4_change_mac(sc, sc->mbox, vi->viid, vi->xact_addr_filt,
 		    ucaddr, true, true);
 		if (rc < 0) {
 			rc = -rc;
 			if_printf(ifp, "change_mac failed: %d\n", rc);
 			return (rc);
 		} else {
 			vi->xact_addr_filt = rc;
 			rc = 0;
 		}
 	}
 
 	if (flags & XGMAC_MCADDRS) {
 		const uint8_t *mcaddr[FW_MAC_EXACT_CHUNK];
 		int del = 1;
 		uint64_t hash = 0;
 		struct ifmultiaddr *ifma;
 		int i = 0, j;
 
 		if_maddr_rlock(ifp);
 		TAILQ_FOREACH(ifma, &ifp->if_multiaddrs, ifma_link) {
 			if (ifma->ifma_addr->sa_family != AF_LINK)
 				continue;
 			mcaddr[i] =
 			    LLADDR((struct sockaddr_dl *)ifma->ifma_addr);
 			MPASS(ETHER_IS_MULTICAST(mcaddr[i]));
 			i++;
 
 			if (i == FW_MAC_EXACT_CHUNK) {
 				rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid,
 				    del, i, mcaddr, NULL, &hash, 0);
 				if (rc < 0) {
 					rc = -rc;
 					for (j = 0; j < i; j++) {
 						if_printf(ifp,
 						    "failed to add mc address"
 						    " %02x:%02x:%02x:"
 						    "%02x:%02x:%02x rc=%d\n",
 						    mcaddr[j][0], mcaddr[j][1],
 						    mcaddr[j][2], mcaddr[j][3],
 						    mcaddr[j][4], mcaddr[j][5],
 						    rc);
 					}
 					goto mcfail;
 				}
 				del = 0;
 				i = 0;
 			}
 		}
 		if (i > 0) {
 			rc = t4_alloc_mac_filt(sc, sc->mbox, vi->viid, del, i,
 			    mcaddr, NULL, &hash, 0);
 			if (rc < 0) {
 				rc = -rc;
 				for (j = 0; j < i; j++) {
 					if_printf(ifp,
 					    "failed to add mc address"
 					    " %02x:%02x:%02x:"
 					    "%02x:%02x:%02x rc=%d\n",
 					    mcaddr[j][0], mcaddr[j][1],
 					    mcaddr[j][2], mcaddr[j][3],
 					    mcaddr[j][4], mcaddr[j][5],
 					    rc);
 				}
 				goto mcfail;
 			}
 		}
 
 		rc = -t4_set_addr_hash(sc, sc->mbox, vi->viid, 0, hash, 0);
 		if (rc != 0)
 			if_printf(ifp, "failed to set mc address hash: %d\n", rc);
 mcfail:
 		if_maddr_runlock(ifp);
 	}
 
 	return (rc);
 }
 
 /*
  * {begin|end}_synchronized_op must be called from the same thread.
  */
 int
 begin_synchronized_op(struct adapter *sc, struct vi_info *vi, int flags,
     char *wmesg)
 {
 	int rc, pri;
 
 #ifdef WITNESS
 	/* the caller thinks it's ok to sleep, but is it really? */
 	if (flags & SLEEP_OK)
 		WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL,
 		    "begin_synchronized_op");
 #endif
 
 	if (flags & INTR_OK)
 		pri = PCATCH;
 	else
 		pri = 0;
 
 	ADAPTER_LOCK(sc);
 	for (;;) {
 
 		if (vi && IS_DOOMED(vi)) {
 			rc = ENXIO;
 			goto done;
 		}
 
 		if (!IS_BUSY(sc)) {
 			rc = 0;
 			break;
 		}
 
 		if (!(flags & SLEEP_OK)) {
 			rc = EBUSY;
 			goto done;
 		}
 
 		if (mtx_sleep(&sc->flags, &sc->sc_lock, pri, wmesg, 0)) {
 			rc = EINTR;
 			goto done;
 		}
 	}
 
 	KASSERT(!IS_BUSY(sc), ("%s: controller busy.", __func__));
 	SET_BUSY(sc);
 #ifdef INVARIANTS
 	sc->last_op = wmesg;
 	sc->last_op_thr = curthread;
 	sc->last_op_flags = flags;
 #endif
 
 done:
 	if (!(flags & HOLD_LOCK) || rc)
 		ADAPTER_UNLOCK(sc);
 
 	return (rc);
 }
 
 /*
  * Tell if_ioctl and if_init that the VI is going away.  This is a
  * special variant of begin_synchronized_op and must be paired with a
  * call to end_synchronized_op.
  */
 void
 doom_vi(struct adapter *sc, struct vi_info *vi)
 {
 
 	ADAPTER_LOCK(sc);
 	SET_DOOMED(vi);
 	wakeup(&sc->flags);
 	while (IS_BUSY(sc))
 		mtx_sleep(&sc->flags, &sc->sc_lock, 0, "t4detach", 0);
 	SET_BUSY(sc);
 #ifdef INVARIANTS
 	sc->last_op = "t4detach";
 	sc->last_op_thr = curthread;
 	sc->last_op_flags = 0;
 #endif
 	ADAPTER_UNLOCK(sc);
 }
 
 /*
  * {begin|end}_synchronized_op must be called from the same thread.
  */
 void
 end_synchronized_op(struct adapter *sc, int flags)
 {
 
 	if (flags & LOCK_HELD)
 		ADAPTER_LOCK_ASSERT_OWNED(sc);
 	else
 		ADAPTER_LOCK(sc);
 
 	KASSERT(IS_BUSY(sc), ("%s: controller not busy.", __func__));
 	CLR_BUSY(sc);
 	wakeup(&sc->flags);
 	ADAPTER_UNLOCK(sc);
 }
 
 static int
 cxgbe_init_synchronized(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	int rc = 0, i;
 	struct sge_txq *txq;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING)
 		return (0);	/* already running */
 
 	if (!(sc->flags & FULL_INIT_DONE) &&
 	    ((rc = adapter_full_init(sc)) != 0))
 		return (rc);	/* error message displayed already */
 
 	if (!(vi->flags & VI_INIT_DONE) &&
 	    ((rc = vi_full_init(vi)) != 0))
 		return (rc); /* error message displayed already */
 
 	rc = update_mac_settings(ifp, XGMAC_ALL);
 	if (rc)
 		goto done;	/* error message displayed already */
 
 	rc = -t4_enable_vi(sc, sc->mbox, vi->viid, true, true);
 	if (rc != 0) {
 		if_printf(ifp, "enable_vi failed: %d\n", rc);
 		goto done;
 	}
 
 	/*
 	 * Can't fail from this point onwards.  Review cxgbe_uninit_synchronized
 	 * if this changes.
 	 */
 
 	for_each_txq(vi, i, txq) {
 		TXQ_LOCK(txq);
 		txq->eq.flags |= EQ_ENABLED;
 		TXQ_UNLOCK(txq);
 	}
 
 	/*
 	 * The first iq of the first port to come up is used for tracing.
 	 */
 	if (sc->traceq < 0 && IS_MAIN_VI(vi)) {
 		sc->traceq = sc->sge.rxq[vi->first_rxq].iq.abs_id;
 		t4_write_reg(sc, is_t4(sc) ?  A_MPS_TRC_RSS_CONTROL :
 		    A_MPS_T5_TRC_RSS_CONTROL, V_RSSCONTROL(pi->tx_chan) |
 		    V_QUEUENUMBER(sc->traceq));
 		pi->flags |= HAS_TRACEQ;
 	}
 
 	/* all ok */
 	PORT_LOCK(pi);
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	pi->up_vis++;
 
 	if (pi->nvi > 1)
 		callout_reset(&vi->tick, hz, vi_tick, vi);
 	else
 		callout_reset(&pi->tick, hz, cxgbe_tick, pi);
 	PORT_UNLOCK(pi);
 done:
 	if (rc != 0)
 		cxgbe_uninit_synchronized(vi);
 
 	return (rc);
 }
 
 /*
  * Idempotent.
  */
 static int
 cxgbe_uninit_synchronized(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	int rc, i;
 	struct sge_txq *txq;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (!(vi->flags & VI_INIT_DONE)) {
 		KASSERT(!(ifp->if_drv_flags & IFF_DRV_RUNNING),
 		    ("uninited VI is running"));
 		return (0);
 	}
 
 	/*
 	 * Disable the VI so that all its data in either direction is discarded
 	 * by the MPS.  Leave everything else (the queues, interrupts, and 1Hz
 	 * tick) intact as the TP can deliver negative advice or data that it's
 	 * holding in its RAM (for an offloaded connection) even after the VI is
 	 * disabled.
 	 */
 	rc = -t4_enable_vi(sc, sc->mbox, vi->viid, false, false);
 	if (rc) {
 		if_printf(ifp, "disable_vi failed: %d\n", rc);
 		return (rc);
 	}
 
 	for_each_txq(vi, i, txq) {
 		TXQ_LOCK(txq);
 		txq->eq.flags &= ~EQ_ENABLED;
 		TXQ_UNLOCK(txq);
 	}
 
 	PORT_LOCK(pi);
 	if (pi->nvi == 1)
 		callout_stop(&pi->tick);
 	else
 		callout_stop(&vi->tick);
 	if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) {
 		PORT_UNLOCK(pi);
 		return (0);
 	}
 	ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 	pi->up_vis--;
 	if (pi->up_vis > 0) {
 		PORT_UNLOCK(pi);
 		return (0);
 	}
 	PORT_UNLOCK(pi);
 
 	pi->link_cfg.link_ok = 0;
 	pi->link_cfg.speed = 0;
 	pi->linkdnrc = -1;
 	t4_os_link_changed(sc, pi->port_id, 0, -1);
 
 	return (0);
 }
 
 /*
  * It is ok for this function to fail midway and return right away.  t4_detach
  * will walk the entire sc->irq list and clean up whatever is valid.
  */
 static int
 setup_intr_handlers(struct adapter *sc)
 {
 	int rc, rid, p, q, v;
 	char s[8];
 	struct irq *irq;
 	struct port_info *pi;
 	struct vi_info *vi;
 	struct sge *sge = &sc->sge;
 	struct sge_rxq *rxq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 #endif
 #ifdef DEV_NETMAP
 	struct sge_nm_rxq *nm_rxq;
 #endif
 #ifdef RSS
 	int nbuckets = rss_getnumbuckets();
 #endif
 
 	/*
 	 * Set up interrupts.
 	 */
 	irq = &sc->irq[0];
 	rid = sc->intr_type == INTR_INTX ? 0 : 1;
 	if (sc->intr_count == 1)
 		return (t4_alloc_irq(sc, irq, rid, t4_intr_all, sc, "all"));
 
 	/* Multiple interrupts. */
 	KASSERT(sc->intr_count >= T4_EXTRA_INTR + sc->params.nports,
 	    ("%s: too few intr.", __func__));
 
 	/* The first one is always error intr */
 	rc = t4_alloc_irq(sc, irq, rid, t4_intr_err, sc, "err");
 	if (rc != 0)
 		return (rc);
 	irq++;
 	rid++;
 
 	/* The second one is always the firmware event queue */
 	rc = t4_alloc_irq(sc, irq, rid, t4_intr_evt, &sge->fwq, "evt");
 	if (rc != 0)
 		return (rc);
 	irq++;
 	rid++;
 
 	for_each_port(sc, p) {
 		pi = sc->port[p];
 		for_each_vi(pi, v, vi) {
 			vi->first_intr = rid - 1;
 
 			if (vi->nnmrxq > 0) {
 				int n = max(vi->nrxq, vi->nnmrxq);
 
 				MPASS(vi->flags & INTR_RXQ);
 
 				rxq = &sge->rxq[vi->first_rxq];
 #ifdef DEV_NETMAP
 				nm_rxq = &sge->nm_rxq[vi->first_nm_rxq];
 #endif
 				for (q = 0; q < n; q++) {
 					snprintf(s, sizeof(s), "%x%c%x", p,
 					    'a' + v, q);
 					if (q < vi->nrxq)
 						irq->rxq = rxq++;
 #ifdef DEV_NETMAP
 					if (q < vi->nnmrxq)
 						irq->nm_rxq = nm_rxq++;
 #endif
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_vi_intr, irq, s);
 					if (rc != 0)
 						return (rc);
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 			} else if (vi->flags & INTR_RXQ) {
 				for_each_rxq(vi, q, rxq) {
 					snprintf(s, sizeof(s), "%x%c%x", p,
 					    'a' + v, q);
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_intr, rxq, s);
 					if (rc != 0)
 						return (rc);
 #ifdef RSS
 					bus_bind_intr(sc->dev, irq->res,
 					    rss_getcpu(q % nbuckets));
 #endif
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 			}
 #ifdef TCP_OFFLOAD
 			if (vi->flags & INTR_OFLD_RXQ) {
 				for_each_ofld_rxq(vi, q, ofld_rxq) {
 					snprintf(s, sizeof(s), "%x%c%x", p,
 					    'A' + v, q);
 					rc = t4_alloc_irq(sc, irq, rid,
 					    t4_intr, ofld_rxq, s);
 					if (rc != 0)
 						return (rc);
 					irq++;
 					rid++;
 					vi->nintr++;
 				}
 			}
 #endif
 		}
 	}
 	MPASS(irq == &sc->irq[sc->intr_count]);
 
 	return (0);
 }
 
 int
 adapter_full_init(struct adapter *sc)
 {
 	int rc, i;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
 	KASSERT((sc->flags & FULL_INIT_DONE) == 0,
 	    ("%s: FULL_INIT_DONE already", __func__));
 
 	/*
 	 * queues that belong to the adapter (not any particular port).
 	 */
 	rc = t4_setup_adapter_queues(sc);
 	if (rc != 0)
 		goto done;
 
 	for (i = 0; i < nitems(sc->tq); i++) {
 		sc->tq[i] = taskqueue_create("t4 taskq", M_NOWAIT,
 		    taskqueue_thread_enqueue, &sc->tq[i]);
 		if (sc->tq[i] == NULL) {
 			device_printf(sc->dev,
 			    "failed to allocate task queue %d\n", i);
 			rc = ENOMEM;
 			goto done;
 		}
 		taskqueue_start_threads(&sc->tq[i], 1, PI_NET, "%s tq%d",
 		    device_get_nameunit(sc->dev), i);
 	}
 
 	t4_intr_enable(sc);
 	sc->flags |= FULL_INIT_DONE;
 done:
 	if (rc != 0)
 		adapter_full_uninit(sc);
 
 	return (rc);
 }
 
 int
 adapter_full_uninit(struct adapter *sc)
 {
 	int i;
 
 	ADAPTER_LOCK_ASSERT_NOTOWNED(sc);
 
 	t4_teardown_adapter_queues(sc);
 
 	for (i = 0; i < nitems(sc->tq) && sc->tq[i]; i++) {
 		taskqueue_free(sc->tq[i]);
 		sc->tq[i] = NULL;
 	}
 
 	sc->flags &= ~FULL_INIT_DONE;
 
 	return (0);
 }
 
 #ifdef RSS
 #define SUPPORTED_RSS_HASHTYPES (RSS_HASHTYPE_RSS_IPV4 | \
     RSS_HASHTYPE_RSS_TCP_IPV4 | RSS_HASHTYPE_RSS_IPV6 | \
     RSS_HASHTYPE_RSS_TCP_IPV6 | RSS_HASHTYPE_RSS_UDP_IPV4 | \
     RSS_HASHTYPE_RSS_UDP_IPV6)
 
 /* Translates kernel hash types to hardware. */
 static int
 hashconfig_to_hashen(int hashconfig)
 {
 	int hashen = 0;
 
 	if (hashconfig & RSS_HASHTYPE_RSS_IPV4)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_IPV6)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV4) {
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
 	}
 	if (hashconfig & RSS_HASHTYPE_RSS_UDP_IPV6) {
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_UDPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
 	}
 	if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV4)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN;
 	if (hashconfig & RSS_HASHTYPE_RSS_TCP_IPV6)
 		hashen |= F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN;
 
 	return (hashen);
 }
 
 /* Translates hardware hash types to kernel. */
 static int
 hashen_to_hashconfig(int hashen)
 {
 	int hashconfig = 0;
 
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_UDPEN) {
 		/*
 		 * If UDP hashing was enabled it must have been enabled for
 		 * either IPv4 or IPv6 (inclusive or).  Enabling UDP without
 		 * enabling any 4-tuple hash is a nonsensical configuration.
 		 */
 		MPASS(hashen & (F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
 		    F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN));
 
 		if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
 			hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV4;
 		if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
 			hashconfig |= RSS_HASHTYPE_RSS_UDP_IPV6;
 	}
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV4;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_TCP_IPV6;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_IPV4;
 	if (hashen & F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN)
 		hashconfig |= RSS_HASHTYPE_RSS_IPV6;
 
 	return (hashconfig);
 }
 #endif
 
 int
 vi_full_init(struct vi_info *vi)
 {
 	struct adapter *sc = vi->pi->adapter;
 	struct ifnet *ifp = vi->ifp;
 	uint16_t *rss;
 	struct sge_rxq *rxq;
 	int rc, i, j, hashen;
 #ifdef RSS
 	int nbuckets = rss_getnumbuckets();
 	int hashconfig = rss_gethashconfig();
 	int extra;
 	uint32_t raw_rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
 	uint32_t rss_key[RSS_KEYSIZE / sizeof(uint32_t)];
 #endif
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 	KASSERT((vi->flags & VI_INIT_DONE) == 0,
 	    ("%s: VI_INIT_DONE already", __func__));
 
 	sysctl_ctx_init(&vi->ctx);
 	vi->flags |= VI_SYSCTL_CTX;
 
 	/*
 	 * Allocate tx/rx/fl queues for this VI.
 	 */
 	rc = t4_setup_vi_queues(vi);
 	if (rc != 0)
 		goto done;	/* error message displayed already */
 
 	/*
 	 * Set up RSS for this VI.  Save a copy of the RSS table for later use.
 	 */
 	if (vi->nrxq > vi->rss_size) {
 		if_printf(ifp, "nrxq (%d) > hw RSS table size (%d); "
 		    "some queues will never receive traffic.\n", vi->nrxq,
 		    vi->rss_size);
 	} else if (vi->rss_size % vi->nrxq) {
 		if_printf(ifp, "nrxq (%d), hw RSS table size (%d); "
 		    "expect uneven traffic distribution.\n", vi->nrxq,
 		    vi->rss_size);
 	}
 #ifdef RSS
 	MPASS(RSS_KEYSIZE == 40);
 	if (vi->nrxq != nbuckets) {
 		if_printf(ifp, "nrxq (%d) != kernel RSS buckets (%d); "
 		    "performance will be impacted.\n", vi->nrxq, nbuckets);
 	}
 
 	rss_getkey((void *)&raw_rss_key[0]);
 	for (i = 0; i < nitems(rss_key); i++) {
 		rss_key[i] = htobe32(raw_rss_key[nitems(rss_key) - 1 - i]);
 	}
 	t4_write_rss_key(sc, &rss_key[0], -1);
 #endif
 	rss = malloc(vi->rss_size * sizeof (*rss), M_CXGBE, M_ZERO | M_WAITOK);
 	for (i = 0; i < vi->rss_size;) {
 #ifdef RSS
 		j = rss_get_indirection_to_bucket(i);
 		j %= vi->nrxq;
 		rxq = &sc->sge.rxq[vi->first_rxq + j];
 		rss[i++] = rxq->iq.abs_id;
 #else
 		for_each_rxq(vi, j, rxq) {
 			rss[i++] = rxq->iq.abs_id;
 			if (i == vi->rss_size)
 				break;
 		}
 #endif
 	}
 
 	rc = -t4_config_rss_range(sc, sc->mbox, vi->viid, 0, vi->rss_size, rss,
 	    vi->rss_size);
 	if (rc != 0) {
 		if_printf(ifp, "rss_config failed: %d\n", rc);
 		goto done;
 	}
 
 #ifdef RSS
 	hashen = hashconfig_to_hashen(hashconfig);
 
 	/*
 	 * We may have had to enable some hashes even though the global config
 	 * wants them disabled.  This is a potential problem that must be
 	 * reported to the user.
 	 */
 	extra = hashen_to_hashconfig(hashen) ^ hashconfig;
 
 	/*
 	 * If we consider only the supported hash types, then the enabled hashes
 	 * are a superset of the requested hashes.  In other words, there cannot
 	 * be any supported hash that was requested but not enabled, but there
 	 * can be hashes that were not requested but had to be enabled.
 	 */
 	extra &= SUPPORTED_RSS_HASHTYPES;
 	MPASS((extra & hashconfig) == 0);
 
 	if (extra) {
 		if_printf(ifp,
 		    "global RSS config (0x%x) cannot be accommodated.\n",
 		    hashconfig);
 	}
 	if (extra & RSS_HASHTYPE_RSS_IPV4)
 		if_printf(ifp, "IPv4 2-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_TCP_IPV4)
 		if_printf(ifp, "TCP/IPv4 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_IPV6)
 		if_printf(ifp, "IPv6 2-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_TCP_IPV6)
 		if_printf(ifp, "TCP/IPv6 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_UDP_IPV4)
 		if_printf(ifp, "UDP/IPv4 4-tuple hashing forced on.\n");
 	if (extra & RSS_HASHTYPE_RSS_UDP_IPV6)
 		if_printf(ifp, "UDP/IPv6 4-tuple hashing forced on.\n");
 #else
 	hashen = F_FW_RSS_VI_CONFIG_CMD_IP6FOURTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP6TWOTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP4FOURTUPEN |
 	    F_FW_RSS_VI_CONFIG_CMD_IP4TWOTUPEN | F_FW_RSS_VI_CONFIG_CMD_UDPEN;
 #endif
 	rc = -t4_config_vi_rss(sc, sc->mbox, vi->viid, hashen, rss[0]);
 	if (rc != 0) {
 		if_printf(ifp, "rss hash/defaultq config failed: %d\n", rc);
 		goto done;
 	}
 
 	vi->rss = rss;
 	vi->flags |= VI_INIT_DONE;
 done:
 	if (rc != 0)
 		vi_full_uninit(vi);
 
 	return (rc);
 }
 
 /*
  * Idempotent.
  */
 int
 vi_full_uninit(struct vi_info *vi)
 {
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 	int i;
 	struct sge_rxq *rxq;
 	struct sge_txq *txq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 	struct sge_wrq *ofld_txq;
 #endif
 
 	if (vi->flags & VI_INIT_DONE) {
 
 		/* Need to quiesce queues.  */
 
 		/* XXX: Only for the first VI? */
 		if (IS_MAIN_VI(vi))
 			quiesce_wrq(sc, &sc->sge.ctrlq[pi->port_id]);
 
 		for_each_txq(vi, i, txq) {
 			quiesce_txq(sc, txq);
 		}
 
 #ifdef TCP_OFFLOAD
 		for_each_ofld_txq(vi, i, ofld_txq) {
 			quiesce_wrq(sc, ofld_txq);
 		}
 #endif
 
 		for_each_rxq(vi, i, rxq) {
 			quiesce_iq(sc, &rxq->iq);
 			quiesce_fl(sc, &rxq->fl);
 		}
 
 #ifdef TCP_OFFLOAD
 		for_each_ofld_rxq(vi, i, ofld_rxq) {
 			quiesce_iq(sc, &ofld_rxq->iq);
 			quiesce_fl(sc, &ofld_rxq->fl);
 		}
 #endif
 		free(vi->rss, M_CXGBE);
 		free(vi->nm_rss, M_CXGBE);
 	}
 
 	t4_teardown_vi_queues(vi);
 	vi->flags &= ~VI_INIT_DONE;
 
 	return (0);
 }
 
 static void
 quiesce_txq(struct adapter *sc, struct sge_txq *txq)
 {
 	struct sge_eq *eq = &txq->eq;
 	struct sge_qstat *spg = (void *)&eq->desc[eq->sidx];
 
 	(void) sc;	/* unused */
 
 #ifdef INVARIANTS
 	TXQ_LOCK(txq);
 	MPASS((eq->flags & EQ_ENABLED) == 0);
 	TXQ_UNLOCK(txq);
 #endif
 
 	/* Wait for the mp_ring to empty. */
 	while (!mp_ring_is_idle(txq->r)) {
 		mp_ring_check_drainage(txq->r, 0);
 		pause("rquiesce", 1);
 	}
 
 	/* Then wait for the hardware to finish. */
 	while (spg->cidx != htobe16(eq->pidx))
 		pause("equiesce", 1);
 
 	/* Finally, wait for the driver to reclaim all descriptors. */
 	while (eq->cidx != eq->pidx)
 		pause("dquiesce", 1);
 }
 
 static void
 quiesce_wrq(struct adapter *sc, struct sge_wrq *wrq)
 {
 
 	/* XXXTX */
 }
 
 static void
 quiesce_iq(struct adapter *sc, struct sge_iq *iq)
 {
 	(void) sc;	/* unused */
 
 	/* Synchronize with the interrupt handler */
 	while (!atomic_cmpset_int(&iq->state, IQS_IDLE, IQS_DISABLED))
 		pause("iqfree", 1);
 }
 
 static void
 quiesce_fl(struct adapter *sc, struct sge_fl *fl)
 {
 	mtx_lock(&sc->sfl_lock);
 	FL_LOCK(fl);
 	fl->flags |= FL_DOOMED;
 	FL_UNLOCK(fl);
 	callout_stop(&sc->sfl_callout);
 	mtx_unlock(&sc->sfl_lock);
 
 	KASSERT((fl->flags & FL_STARVING) == 0,
 	    ("%s: still starving", __func__));
 }
 
 static int
 t4_alloc_irq(struct adapter *sc, struct irq *irq, int rid,
     driver_intr_t *handler, void *arg, char *name)
 {
 	int rc;
 
 	irq->rid = rid;
 	irq->res = bus_alloc_resource_any(sc->dev, SYS_RES_IRQ, &irq->rid,
 	    RF_SHAREABLE | RF_ACTIVE);
 	if (irq->res == NULL) {
 		device_printf(sc->dev,
 		    "failed to allocate IRQ for rid %d, name %s.\n", rid, name);
 		return (ENOMEM);
 	}
 
 	rc = bus_setup_intr(sc->dev, irq->res, INTR_MPSAFE | INTR_TYPE_NET,
 	    NULL, handler, arg, &irq->tag);
 	if (rc != 0) {
 		device_printf(sc->dev,
 		    "failed to setup interrupt for rid %d, name %s: %d\n",
 		    rid, name, rc);
 	} else if (name)
-		bus_describe_intr(sc->dev, irq->res, irq->tag, name);
+		bus_describe_intr(sc->dev, irq->res, irq->tag, "%s", name);
 
 	return (rc);
 }
 
 static int
 t4_free_irq(struct adapter *sc, struct irq *irq)
 {
 	if (irq->tag)
 		bus_teardown_intr(sc->dev, irq->res, irq->tag);
 	if (irq->res)
 		bus_release_resource(sc->dev, SYS_RES_IRQ, irq->rid, irq->res);
 
 	bzero(irq, sizeof(*irq));
 
 	return (0);
 }
 
 static void
 get_regs(struct adapter *sc, struct t4_regdump *regs, uint8_t *buf)
 {
 
 	regs->version = chip_id(sc) | chip_rev(sc) << 10;
 	t4_get_regs(sc, buf, regs->len);
 }
 
 #define	A_PL_INDIR_CMD	0x1f8
 
 #define	S_PL_AUTOINC	31
 #define	M_PL_AUTOINC	0x1U
 #define	V_PL_AUTOINC(x)	((x) << S_PL_AUTOINC)
 #define	G_PL_AUTOINC(x)	(((x) >> S_PL_AUTOINC) & M_PL_AUTOINC)
 
 #define	S_PL_VFID	20
 #define	M_PL_VFID	0xffU
 #define	V_PL_VFID(x)	((x) << S_PL_VFID)
 #define	G_PL_VFID(x)	(((x) >> S_PL_VFID) & M_PL_VFID)
 
 #define	S_PL_ADDR	0
 #define	M_PL_ADDR	0xfffffU
 #define	V_PL_ADDR(x)	((x) << S_PL_ADDR)
 #define	G_PL_ADDR(x)	(((x) >> S_PL_ADDR) & M_PL_ADDR)
 
 #define	A_PL_INDIR_DATA	0x1fc
 
 static uint64_t
 read_vf_stat(struct adapter *sc, unsigned int viid, int reg)
 {
 	u32 stats[2];
 
 	mtx_assert(&sc->reg_lock, MA_OWNED);
 	t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
 	    V_PL_VFID(G_FW_VIID_VIN(viid)) | V_PL_ADDR(VF_MPS_REG(reg)));
 	stats[0] = t4_read_reg(sc, A_PL_INDIR_DATA);
 	stats[1] = t4_read_reg(sc, A_PL_INDIR_DATA);
 	return (((uint64_t)stats[1]) << 32 | stats[0]);
 }
 
 static void
 t4_get_vi_stats(struct adapter *sc, unsigned int viid,
     struct fw_vi_stats_vf *stats)
 {
 
 #define GET_STAT(name) \
 	read_vf_stat(sc, viid, A_MPS_VF_STAT_##name##_L)
 
 	stats->tx_bcast_bytes    = GET_STAT(TX_VF_BCAST_BYTES);
 	stats->tx_bcast_frames   = GET_STAT(TX_VF_BCAST_FRAMES);
 	stats->tx_mcast_bytes    = GET_STAT(TX_VF_MCAST_BYTES);
 	stats->tx_mcast_frames   = GET_STAT(TX_VF_MCAST_FRAMES);
 	stats->tx_ucast_bytes    = GET_STAT(TX_VF_UCAST_BYTES);
 	stats->tx_ucast_frames   = GET_STAT(TX_VF_UCAST_FRAMES);
 	stats->tx_drop_frames    = GET_STAT(TX_VF_DROP_FRAMES);
 	stats->tx_offload_bytes  = GET_STAT(TX_VF_OFFLOAD_BYTES);
 	stats->tx_offload_frames = GET_STAT(TX_VF_OFFLOAD_FRAMES);
 	stats->rx_bcast_bytes    = GET_STAT(RX_VF_BCAST_BYTES);
 	stats->rx_bcast_frames   = GET_STAT(RX_VF_BCAST_FRAMES);
 	stats->rx_mcast_bytes    = GET_STAT(RX_VF_MCAST_BYTES);
 	stats->rx_mcast_frames   = GET_STAT(RX_VF_MCAST_FRAMES);
 	stats->rx_ucast_bytes    = GET_STAT(RX_VF_UCAST_BYTES);
 	stats->rx_ucast_frames   = GET_STAT(RX_VF_UCAST_FRAMES);
 	stats->rx_err_frames     = GET_STAT(RX_VF_ERR_FRAMES);
 
 #undef GET_STAT
 }
 
 static void
 t4_clr_vi_stats(struct adapter *sc, unsigned int viid)
 {
 	int reg;
 
 	t4_write_reg(sc, A_PL_INDIR_CMD, V_PL_AUTOINC(1) |
 	    V_PL_VFID(G_FW_VIID_VIN(viid)) |
 	    V_PL_ADDR(VF_MPS_REG(A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L)));
 	for (reg = A_MPS_VF_STAT_TX_VF_BCAST_BYTES_L;
 	     reg <= A_MPS_VF_STAT_RX_VF_ERR_FRAMES_H; reg += 4)
 		t4_write_reg(sc, A_PL_INDIR_DATA, 0);
 }
 
 static void
 vi_refresh_stats(struct adapter *sc, struct vi_info *vi)
 {
 	struct timeval tv;
 	const struct timeval interval = {0, 250000};	/* 250ms */
 
 	if (!(vi->flags & VI_INIT_DONE))
 		return;
 
 	getmicrotime(&tv);
 	timevalsub(&tv, &interval);
 	if (timevalcmp(&tv, &vi->last_refreshed, <))
 		return;
 
 	mtx_lock(&sc->reg_lock);
 	t4_get_vi_stats(sc, vi->viid, &vi->stats);
 	getmicrotime(&vi->last_refreshed);
 	mtx_unlock(&sc->reg_lock);
 }
 
 static void
 cxgbe_refresh_stats(struct adapter *sc, struct port_info *pi)
 {
 	int i;
 	u_int v, tnl_cong_drops;
 	struct timeval tv;
 	const struct timeval interval = {0, 250000};	/* 250ms */
 
 	getmicrotime(&tv);
 	timevalsub(&tv, &interval);
 	if (timevalcmp(&tv, &pi->last_refreshed, <))
 		return;
 
 	tnl_cong_drops = 0;
 	t4_get_port_stats(sc, pi->tx_chan, &pi->stats);
 	for (i = 0; i < sc->chip_params->nchan; i++) {
 		if (pi->rx_chan_map & (1 << i)) {
 			mtx_lock(&sc->reg_lock);
 			t4_read_indirect(sc, A_TP_MIB_INDEX, A_TP_MIB_DATA, &v,
 			    1, A_TP_MIB_TNL_CNG_DROP_0 + i);
 			mtx_unlock(&sc->reg_lock);
 			tnl_cong_drops += v;
 		}
 	}
 	pi->tnl_cong_drops = tnl_cong_drops;
 	getmicrotime(&pi->last_refreshed);
 }
 
 static void
 cxgbe_tick(void *arg)
 {
 	struct port_info *pi = arg;
 	struct adapter *sc = pi->adapter;
 
 	PORT_LOCK_ASSERT_OWNED(pi);
 	cxgbe_refresh_stats(sc, pi);
 
 	callout_schedule(&pi->tick, hz);
 }
 
 void
 vi_tick(void *arg)
 {
 	struct vi_info *vi = arg;
 	struct adapter *sc = vi->pi->adapter;
 
 	vi_refresh_stats(sc, vi);
 
 	callout_schedule(&vi->tick, hz);
 }
 
 static void
 cxgbe_vlan_config(void *arg, struct ifnet *ifp, uint16_t vid)
 {
 	struct ifnet *vlan;
 
 	if (arg != ifp || ifp->if_type != IFT_ETHER)
 		return;
 
 	vlan = VLAN_DEVAT(ifp, vid);
 	VLAN_SETCOOKIE(vlan, ifp);
 }
 
 /*
  * Should match fw_caps_config_<foo> enums in t4fw_interface.h
  */
 static char *caps_decoder[] = {
 	"\20\001IPMI\002NCSI",				/* 0: NBM */
 	"\20\001PPP\002QFC\003DCBX",			/* 1: link */
 	"\20\001INGRESS\002EGRESS",			/* 2: switch */
 	"\20\001NIC\002VM\003IDS\004UM\005UM_ISGL"	/* 3: NIC */
 	    "\006HASHFILTER\007ETHOFLD",
 	"\20\001TOE",					/* 4: TOE */
 	"\20\001RDDP\002RDMAC",				/* 5: RDMA */
 	"\20\001INITIATOR_PDU\002TARGET_PDU"		/* 6: iSCSI */
 	    "\003INITIATOR_CNXOFLD\004TARGET_CNXOFLD"
 	    "\005INITIATOR_SSNOFLD\006TARGET_SSNOFLD"
 	    "\007T10DIF"
 	    "\010INITIATOR_CMDOFLD\011TARGET_CMDOFLD",
 	"\20\001KEYS",					/* 7: TLS */
 	"\20\001INITIATOR\002TARGET\003CTRL_OFLD"	/* 8: FCoE */
 	    "\004PO_INITIATOR\005PO_TARGET",
 };
 
 static void
 t4_sysctls(struct adapter *sc)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
 	struct sysctl_oid_list *children, *c0;
 	static char *doorbells = {"\20\1UDB\2WCWR\3UDBWC\4KDB"};
 
 	ctx = device_get_sysctl_ctx(sc->dev);
 
 	/*
 	 * dev.t4nex.X.
 	 */
 	oid = device_get_sysctl_tree(sc->dev);
 	c0 = children = SYSCTL_CHILDREN(oid);
 
 	sc->sc_do_rxcopy = 1;
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "do_rx_copy", CTLFLAG_RW,
 	    &sc->sc_do_rxcopy, 1, "Do RX copy of small frames");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nports", CTLFLAG_RD, NULL,
 	    sc->params.nports, "# of ports");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "hw_revision", CTLFLAG_RD,
 	    NULL, chip_rev(sc), "chip hardware revision");
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "tp_version",
 	    CTLFLAG_RD, sc->tp_version, 0, "TP microcode version");
 
 	if (sc->params.exprom_vers != 0) {
 		SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "exprom_version",
 		    CTLFLAG_RD, sc->exprom_version, 0, "expansion ROM version");
 	}
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "firmware_version",
 	    CTLFLAG_RD, sc->fw_version, 0, "firmware version");
 
 	SYSCTL_ADD_STRING(ctx, children, OID_AUTO, "cf",
 	    CTLFLAG_RD, sc->cfg_file, 0, "configuration file");
 
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "cfcsum", CTLFLAG_RD, NULL,
 	    sc->cfcsum, "config file checksum");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "doorbells",
 	    CTLTYPE_STRING | CTLFLAG_RD, doorbells, sc->doorbells,
 	    sysctl_bitfield, "A", "available doorbells");
 
 #define SYSCTL_CAP(name, n, text) \
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, #name, \
 	    CTLTYPE_STRING | CTLFLAG_RD, caps_decoder[n], sc->name, \
 	    sysctl_bitfield, "A", "available " text " capabilities")
 
 	SYSCTL_CAP(nbmcaps, 0, "NBM");
 	SYSCTL_CAP(linkcaps, 1, "link");
 	SYSCTL_CAP(switchcaps, 2, "switch");
 	SYSCTL_CAP(niccaps, 3, "NIC");
 	SYSCTL_CAP(toecaps, 4, "TCP offload");
 	SYSCTL_CAP(rdmacaps, 5, "RDMA");
 	SYSCTL_CAP(iscsicaps, 6, "iSCSI");
 	SYSCTL_CAP(tlscaps, 7, "TLS");
 	SYSCTL_CAP(fcoecaps, 8, "FCoE");
 #undef SYSCTL_CAP
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "core_clock", CTLFLAG_RD, NULL,
 	    sc->params.vpd.cclk, "core clock frequency (in KHz)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_timers",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.timer_val,
 	    sizeof(sc->params.sge.timer_val), sysctl_int_array, "A",
 	    "interrupt holdoff timer values (us)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pkt_counts",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc->params.sge.counter_val,
 	    sizeof(sc->params.sge.counter_val), sysctl_int_array, "A",
 	    "interrupt holdoff packet counter values");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nfilters", CTLFLAG_RD,
 	    NULL, sc->tids.nftids, "number of filters");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature", CTLTYPE_INT |
 	    CTLFLAG_RD, sc, 0, sysctl_temperature, "I",
 	    "chip temperature (in Celsius)");
 
 	t4_sge_sysctls(sc, ctx, children);
 
 	sc->lro_timeout = 100;
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "lro_timeout", CTLFLAG_RW,
 	    &sc->lro_timeout, 0, "lro inactive-flush timeout (in us)");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "debug_flags", CTLFLAG_RW,
 	    &sc->debug_flags, 0, "flags to enable runtime debugging");
 
 #ifdef SBUF_DRAIN
 	/*
 	 * dev.t4nex.X.misc.  Marked CTLFLAG_SKIP to avoid information overload.
 	 */
 	oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "misc",
 	    CTLFLAG_RD | CTLFLAG_SKIP, NULL,
 	    "logs and miscellaneous information");
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cctrl",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cctrl, "A", "congestion control");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 0 (TP0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_tp1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 1,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 1 (TP1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ulp",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 2,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 2 (ULP)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 3,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 3 (SGE0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_sge1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 4,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 4 (SGE1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ibq_ncsi",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 5,
 	    sysctl_cim_ibq_obq, "A", "CIM IBQ 5 (NCSI)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    chip_id(sc) <= CHELSIO_T5 ? sysctl_cim_la : sysctl_cim_la_t6,
 	    "A", "CIM logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_ma_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_ma_la, "A", "CIM MA logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp0",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 0 (ULP0)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp1",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 1 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 1 (ULP1)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp2",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 2 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 2 (ULP2)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ulp3",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 3 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 3 (ULP3)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 4 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 4 (SGE)");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_ncsi",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 5 + CIM_NUM_IBQ,
 	    sysctl_cim_ibq_obq, "A", "CIM OBQ 5 (NCSI)");
 
 	if (chip_id(sc) > CHELSIO_T4) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge0_rx",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 6 + CIM_NUM_IBQ,
 		    sysctl_cim_ibq_obq, "A", "CIM OBQ 6 (SGE0-RX)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_obq_sge1_rx",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 7 + CIM_NUM_IBQ,
 		    sysctl_cim_ibq_obq, "A", "CIM OBQ 7 (SGE1-RX)");
 	}
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_pif_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_pif_la, "A", "CIM PIF logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cim_qcfg",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cim_qcfg, "A", "CIM queue configuration");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "cpl_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_cpl_stats, "A", "CPL statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ddp_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_ddp_stats, "A", "non-TCP DDP statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "devlog",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_devlog, "A", "firmware's device log");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fcoe_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_fcoe_stats, "A", "FCoE statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "hw_sched",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_hw_sched, "A", "hardware scheduler");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "l2t",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_l2t, "A", "hardware L2 table");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "lb_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_lb_stats, "A", "loopback statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "meminfo",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_meminfo, "A", "memory regions");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "mps_tcam",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    chip_id(sc) <= CHELSIO_T5 ? sysctl_mps_tcam : sysctl_mps_tcam_t6,
 	    "A", "MPS TCAM entries");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "path_mtus",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_path_mtus, "A", "path MTUs");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pm_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_pm_stats, "A", "PM statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rdma_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_rdma_stats, "A", "RDMA statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tcp_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tcp_stats, "A", "TCP statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tids",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tids, "A", "TID information");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_err_stats",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tp_err_stats, "A", "TP error statistics");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la_mask",
 	    CTLTYPE_INT | CTLFLAG_RW, sc, 0, sysctl_tp_la_mask, "I",
 	    "TP logic analyzer event capture mask");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tp_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tp_la, "A", "TP logic analyzer");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "tx_rate",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_tx_rate, "A", "Tx rate");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "ulprx_la",
 	    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 	    sysctl_ulprx_la, "A", "ULPRX logic analyzer");
 
 	if (is_t5(sc)) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "wcwr_stats",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 0,
 		    sysctl_wcwr_stats, "A", "write combined work requests");
 	}
 #endif
 
 #ifdef TCP_OFFLOAD
 	if (is_offload(sc)) {
 		/*
 		 * dev.t4nex.X.toe.
 		 */
 		oid = SYSCTL_ADD_NODE(ctx, c0, OID_AUTO, "toe", CTLFLAG_RD,
 		    NULL, "TOE parameters");
 		children = SYSCTL_CHILDREN(oid);
 
 		sc->tt.sndbuf = 256 * 1024;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "sndbuf", CTLFLAG_RW,
 		    &sc->tt.sndbuf, 0, "max hardware send buffer size");
 
 		sc->tt.ddp = 0;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ddp", CTLFLAG_RW,
 		    &sc->tt.ddp, 0, "DDP allowed");
 
 		sc->tt.rx_coalesce = 1;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "rx_coalesce",
 		    CTLFLAG_RW, &sc->tt.rx_coalesce, 0, "receive coalescing");
 
 		sc->tt.tx_align = 1;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_align",
 		    CTLFLAG_RW, &sc->tt.tx_align, 0, "chop and align payload");
 
 		sc->tt.tx_zcopy = 0;
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "tx_zcopy",
 		    CTLFLAG_RW, &sc->tt.tx_zcopy, 0,
 		    "Enable zero-copy aio_write(2)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timer_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 0, sysctl_tp_tick, "A",
 		    "TP timer tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "timestamp_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 1, sysctl_tp_tick, "A",
 		    "TCP timestamp tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_tick",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, 2, sysctl_tp_tick, "A",
 		    "DACK tick (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "dack_timer",
 		    CTLTYPE_UINT | CTLFLAG_RD, sc, 0, sysctl_tp_dack_timer,
 		    "IU", "DACK timer (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_min",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MIN,
 		    sysctl_tp_timer, "LU", "Retransmit min (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rexmt_max",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_RXT_MAX,
 		    sysctl_tp_timer, "LU", "Retransmit max (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_min",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MIN,
 		    sysctl_tp_timer, "LU", "Persist timer min (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "persist_max",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_PERS_MAX,
 		    sysctl_tp_timer, "LU", "Persist timer max (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_idle",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_IDLE,
 		    sysctl_tp_timer, "LU", "Keepalive idle timer (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "keepalive_intvl",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_KEEP_INTVL,
 		    sysctl_tp_timer, "LU", "Keepalive interval (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "initial_srtt",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_INIT_SRTT,
 		    sysctl_tp_timer, "LU", "Initial SRTT (us)");
 
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "finwait2_timer",
 		    CTLTYPE_ULONG | CTLFLAG_RD, sc, A_TP_FINWAIT2_TIMER,
 		    sysctl_tp_timer, "LU", "FINWAIT2 timer (us)");
 	}
 #endif
 }
 
 void
 vi_sysctls(struct vi_info *vi)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
 	struct sysctl_oid_list *children;
 
 	ctx = device_get_sysctl_ctx(vi->dev);
 
 	/*
 	 * dev.v?(cxgbe|cxl).X.
 	 */
 	oid = device_get_sysctl_tree(vi->dev);
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "viid", CTLFLAG_RD, NULL,
 	    vi->viid, "VI identifier");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nrxq", CTLFLAG_RD,
 	    &vi->nrxq, 0, "# of rx queues");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "ntxq", CTLFLAG_RD,
 	    &vi->ntxq, 0, "# of tx queues");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_rxq", CTLFLAG_RD,
 	    &vi->first_rxq, 0, "index of first rx queue");
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_txq", CTLFLAG_RD,
 	    &vi->first_txq, 0, "index of first tx queue");
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "rss_size", CTLFLAG_RD, NULL,
 	    vi->rss_size, "size of RSS indirection table");
 
 	if (IS_MAIN_VI(vi)) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "rsrv_noflowq",
 		    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_noflowq, "IU",
 		    "Reserve queue 0 for non-flowid packets");
 	}
 
 #ifdef TCP_OFFLOAD
 	if (vi->nofldrxq != 0) {
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldrxq", CTLFLAG_RD,
 		    &vi->nofldrxq, 0,
 		    "# of rx queues for offloaded TCP connections");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nofldtxq", CTLFLAG_RD,
 		    &vi->nofldtxq, 0,
 		    "# of tx queues for offloaded TCP connections");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_rxq",
 		    CTLFLAG_RD, &vi->first_ofld_rxq, 0,
 		    "index of first TOE rx queue");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_ofld_txq",
 		    CTLFLAG_RD, &vi->first_ofld_txq, 0,
 		    "index of first TOE tx queue");
 	}
 #endif
 #ifdef DEV_NETMAP
 	if (vi->nnmrxq != 0) {
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nnmrxq", CTLFLAG_RD,
 		    &vi->nnmrxq, 0, "# of netmap rx queues");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "nnmtxq", CTLFLAG_RD,
 		    &vi->nnmtxq, 0, "# of netmap tx queues");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_nm_rxq",
 		    CTLFLAG_RD, &vi->first_nm_rxq, 0,
 		    "index of first netmap rx queue");
 		SYSCTL_ADD_INT(ctx, children, OID_AUTO, "first_nm_txq",
 		    CTLFLAG_RD, &vi->first_nm_txq, 0,
 		    "index of first netmap tx queue");
 	}
 #endif
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_tmr_idx",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_tmr_idx, "I",
 	    "holdoff timer index");
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "holdoff_pktc_idx",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_holdoff_pktc_idx, "I",
 	    "holdoff packet counter index");
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_rxq",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_rxq, "I",
 	    "rx queue size");
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "qsize_txq",
 	    CTLTYPE_INT | CTLFLAG_RW, vi, 0, sysctl_qsize_txq, "I",
 	    "tx queue size");
 }
 
 static void
 cxgbe_sysctls(struct port_info *pi)
 {
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *oid;
 	struct sysctl_oid_list *children, *children2;
 	struct adapter *sc = pi->adapter;
 	int i;
 	char name[16];
 
 	ctx = device_get_sysctl_ctx(pi->dev);
 
 	/*
 	 * dev.cxgbe.X.
 	 */
 	oid = device_get_sysctl_tree(pi->dev);
 	children = SYSCTL_CHILDREN(oid);
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "linkdnrc",
 	    CTLTYPE_STRING | CTLFLAG_RD, pi, 0, sysctl_linkdnrc, "A",
 	    "reason why link is down");
 	if (pi->port_type == FW_PORT_TYPE_BT_XAUI) {
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "temperature",
 		    CTLTYPE_INT | CTLFLAG_RD, pi, 0, sysctl_btphy, "I",
 		    "PHY temperature (in Celsius)");
 		SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "fw_version",
 		    CTLTYPE_INT | CTLFLAG_RD, pi, 1, sysctl_btphy, "I",
 		    "PHY firmware version");
 	}
 
 	SYSCTL_ADD_PROC(ctx, children, OID_AUTO, "pause_settings",
 	    CTLTYPE_STRING | CTLFLAG_RW, pi, PAUSE_TX, sysctl_pause_settings,
 	    "A", "PAUSE settings (bit 0 = rx_pause, bit 1 = tx_pause)");
 
 	SYSCTL_ADD_INT(ctx, children, OID_AUTO, "max_speed", CTLFLAG_RD, NULL,
 	    port_top_speed(pi), "max speed (in Gbps)");
 
 	/*
 	 * dev.(cxgbe|cxl).X.tc.
 	 */
 	oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "tc", CTLFLAG_RD, NULL,
 	    "Tx scheduler traffic classes");
 	for (i = 0; i < sc->chip_params->nsched_cls; i++) {
 		struct tx_sched_class *tc = &pi->tc[i];
 
 		snprintf(name, sizeof(name), "%d", i);
 		children2 = SYSCTL_CHILDREN(SYSCTL_ADD_NODE(ctx,
 		    SYSCTL_CHILDREN(oid), OID_AUTO, name, CTLFLAG_RD, NULL,
 		    "traffic class"));
 		SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "flags", CTLFLAG_RD,
 		    &tc->flags, 0, "flags");
 		SYSCTL_ADD_UINT(ctx, children2, OID_AUTO, "refcount",
 		    CTLFLAG_RD, &tc->refcount, 0, "references to this class");
 #ifdef SBUF_DRAIN
 		SYSCTL_ADD_PROC(ctx, children2, OID_AUTO, "params",
 		    CTLTYPE_STRING | CTLFLAG_RD, sc, (pi->port_id << 16) | i,
 		    sysctl_tc_params, "A", "traffic class parameters");
 #endif
 	}
 
 	/*
 	 * dev.cxgbe.X.stats.
 	 */
 	oid = SYSCTL_ADD_NODE(ctx, children, OID_AUTO, "stats", CTLFLAG_RD,
 	    NULL, "port statistics");
 	children = SYSCTL_CHILDREN(oid);
 	SYSCTL_ADD_UINT(ctx, children, OID_AUTO, "tx_parse_error", CTLFLAG_RD,
 	    &pi->tx_parse_error, 0,
 	    "# of tx packets with invalid length or # of segments");
 
 #define SYSCTL_ADD_T4_REG64(pi, name, desc, reg) \
 	SYSCTL_ADD_OID(ctx, children, OID_AUTO, name, \
 	    CTLTYPE_U64 | CTLFLAG_RD, sc, reg, \
 	    sysctl_handle_t4_reg64, "QU", desc)
 
 	SYSCTL_ADD_T4_REG64(pi, "tx_octets", "# of octets in good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BYTES_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames", "total # of good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_FRAMES_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_bcast_frames", "# of broadcast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_BCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_mcast_frames", "# of multicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_MCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ucast_frames", "# of unicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_UCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_error_frames", "# of error frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_64",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_65_127",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_65B_127B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_128_255",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_128B_255B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_256_511",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_256B_511B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_512_1023",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_512B_1023B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_1024_1518",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1024B_1518B_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_frames_1519_max",
 	    "# of tx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_1519B_MAX_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_drop", "# of dropped tx frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_DROP_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_pause", "# of pause frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PAUSE_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp0", "# of PPP prio 0 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP0_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp1", "# of PPP prio 1 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP1_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp2", "# of PPP prio 2 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP2_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp3", "# of PPP prio 3 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP3_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp4", "# of PPP prio 4 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP4_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp5", "# of PPP prio 5 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP5_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp6", "# of PPP prio 6 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP6_L));
 	SYSCTL_ADD_T4_REG64(pi, "tx_ppp7", "# of PPP prio 7 frames transmitted",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_TX_PORT_PPP7_L));
 
 	SYSCTL_ADD_T4_REG64(pi, "rx_octets", "# of octets in good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BYTES_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames", "total # of good frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_FRAMES_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_bcast_frames", "# of broadcast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_BCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_mcast_frames", "# of multicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ucast_frames", "# of unicast frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_UCAST_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_too_long", "# of frames exceeding MTU",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_jabber", "# of jabber frames",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_MTU_CRC_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_fcs_err",
 	    "# of frames received with bad FCS",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_CRC_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_len_err",
 	    "# of frames received with length error",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LEN_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_symbol_err", "symbol errors",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_SYM_ERROR_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_runt", "# of short frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_LESS_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_64",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_64B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_65_127",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_65B_127B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_128_255",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_128B_255B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_256_511",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_256B_511B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_512_1023",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_512B_1023B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_1024_1518",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1024B_1518B_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_frames_1519_max",
 	    "# of rx frames in this range",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_1519B_MAX_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_pause", "# of pause frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PAUSE_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp0", "# of PPP prio 0 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP0_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp1", "# of PPP prio 1 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP1_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp2", "# of PPP prio 2 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP2_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp3", "# of PPP prio 3 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP3_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp4", "# of PPP prio 4 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP4_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp5", "# of PPP prio 5 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP5_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp6", "# of PPP prio 6 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP6_L));
 	SYSCTL_ADD_T4_REG64(pi, "rx_ppp7", "# of PPP prio 7 frames received",
 	    PORT_REG(pi->tx_chan, A_MPS_PORT_STAT_RX_PORT_PPP7_L));
 
 #undef SYSCTL_ADD_T4_REG64
 
 #define SYSCTL_ADD_T4_PORTSTAT(name, desc) \
 	SYSCTL_ADD_UQUAD(ctx, children, OID_AUTO, #name, CTLFLAG_RD, \
 	    &pi->stats.name, desc)
 
 	/* We get these from port_stats and they may be stale by up to 1s */
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow0,
 	    "# of drops due to buffer-group 0 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow1,
 	    "# of drops due to buffer-group 1 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow2,
 	    "# of drops due to buffer-group 2 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_ovflow3,
 	    "# of drops due to buffer-group 3 overflows");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc0,
 	    "# of buffer-group 0 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc1,
 	    "# of buffer-group 1 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc2,
 	    "# of buffer-group 2 truncated packets");
 	SYSCTL_ADD_T4_PORTSTAT(rx_trunc3,
 	    "# of buffer-group 3 truncated packets");
 
 #undef SYSCTL_ADD_T4_PORTSTAT
 }
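Each traffic class `params` node above hands its handler two values packed into the single integer argument: the port id in the upper 16 bits and the class index in the lower 16 (`(pi->port_id << 16) | i`). `sysctl_tc_params` itself is not part of this excerpt, so the unpacking below is an assumption that it simply reverses this encoding. A minimal user-space sketch; `tc_arg_pack` and `tc_arg_unpack` are hypothetical names.

```c
#include <stdint.h>

/*
 * Hypothetical helpers mirroring the arg2 encoding used for
 * dev.(cxgbe|cxl).X.tc.N.params: (port_id << 16) | class_index.
 * Assumes the port id and the class index each fit in 16 bits.
 */
static int
tc_arg_pack(int port_id, int cls)
{
	return ((port_id << 16) | (cls & 0xffff));
}

static void
tc_arg_unpack(int arg2, int *port_id, int *cls)
{
	*port_id = (int)((uint32_t)arg2 >> 16);	/* upper 16 bits */
	*cls = arg2 & 0xffff;			/* lower 16 bits */
}
```

Packing both values into `arg2` presumably spares the driver a per-class context structure just to tell the handler which port and class it serves.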
 
 static int
 sysctl_int_array(SYSCTL_HANDLER_ARGS)
 {
 	int rc, *i, space = 0;
 	struct sbuf sb;
 
 	sbuf_new_for_sysctl(&sb, NULL, 64, req);
 	for (i = arg1; arg2; arg2 -= sizeof(int), i++) {
 		if (space)
 			sbuf_printf(&sb, " ");
 		sbuf_printf(&sb, "%d", *i);
 		space = 1;
 	}
 	rc = sbuf_finish(&sb);
 	sbuf_delete(&sb);
 	return (rc);
 }
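`sysctl_int_array` treats `arg2` as a length in bytes and walks the array one `int` at a time, emitting space-separated decimal values. The sketch below restates that loop in user space with `snprintf` in place of `sbuf(9)`; `format_int_array` is a hypothetical name. It tests the remaining byte count with `>=` instead of a plain nonzero check, so a length that is not a multiple of `sizeof(int)` ends the loop rather than running past the array.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format nbytes worth of ints from a as "1 2 3" into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int
format_int_array(const int *a, size_t nbytes, char *buf, size_t bufsz)
{
	size_t off = 0;
	int n, space = 0;

	if (bufsz == 0)
		return (-1);
	buf[0] = '\0';
	/* Count down by sizeof(int), as the driver does with arg2. */
	for (; nbytes >= sizeof(int); nbytes -= sizeof(int), a++) {
		n = snprintf(buf + off, bufsz - off, "%s%d",
		    space ? " " : "", *a);
		if (n < 0 || (size_t)n >= bufsz - off)
			return (-1);	/* output would be truncated */
		off += (size_t)n;
		space = 1;
	}
	return ((int)off);
}
```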
 
 static int
 sysctl_bitfield(SYSCTL_HANDLER_ARGS)
 {
 	int rc;
 	struct sbuf *sb;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "%b", (int)arg2, (char *)arg1);
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
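`sysctl_bitfield` and `sysctl_pause_settings` rely on the kernel `printf(9)` `%b` conversion, which standard C lacks. In the decoding string the first byte selects the output base (`\20` is 16), and each following group is a 1-based bit number byte followed by that bit's name; the names of set bits are appended as `<NAME,NAME>`. Below is a user-space approximation of that behavior; `format_bitfield` is a hypothetical name, and only bases 8, 10 and 16 are handled.

```c
#include <stdio.h>
#include <string.h>

/*
 * Approximate kernel "%b": print v in the base given by bits[0], then
 * list the names of the set bits as <A,B>. Each name is terminated by
 * the next control byte (a bit number) or by the end of the string.
 */
static void
format_bitfield(char *out, size_t outsz, unsigned int v, const char *bits)
{
	const char *p = bits;
	int base = *p++;
	int any = 0;
	size_t off;

	snprintf(out, outsz, base == 8 ? "%o" : base == 10 ? "%u" : "%x", v);
	off = strlen(out);
	if (v == 0)
		return;
	while (*p != '\0') {
		int bit = *p++;		/* 1-based bit number */
		const char *name = p;

		while (*p > ' ')	/* skip to the end of this name */
			p++;
		if (v & (1u << (bit - 1))) {
			snprintf(out + off, outsz - off, "%c%.*s",
			    any ? ',' : '<', (int)(p - name), name);
			off = strlen(out);
			any = 1;
		}
	}
	if (any)
		snprintf(out + off, outsz - off, ">");
}
```

With the `bits` string from `sysctl_pause_settings`, a value of 3 formats as `3<PAUSE_RX,PAUSE_TX>` and a value of 2 as `2<PAUSE_TX>`.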
 
 static int
 sysctl_btphy(SYSCTL_HANDLER_ARGS)
 {
 	struct port_info *pi = arg1;
 	int op = arg2;
 	struct adapter *sc = pi->adapter;
 	u_int v;
 	int rc;
 
 	rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK, "t4btt");
 	if (rc)
 		return (rc);
 	/* XXX: magic numbers */
 	rc = -t4_mdio_rd(sc, sc->mbox, pi->mdio_addr, 0x1e, op ? 0x20 : 0xc820,
 	    &v);
 	end_synchronized_op(sc, 0);
 	if (rc)
 		return (rc);
 	if (op == 0)
 		v /= 256;
 
 	rc = sysctl_handle_int(oidp, &v, 0, req);
 	return (rc);
 }
 
 static int
 sysctl_noflowq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	int rc, val;
 
 	val = vi->rsrv_noflowq;
 	rc = sysctl_handle_int(oidp, &val, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if ((val >= 1) && (vi->ntxq > 1))
 		vi->rsrv_noflowq = 1;
 	else
 		vi->rsrv_noflowq = 0;
 
 	return (rc);
 }
 
 static int
 sysctl_holdoff_tmr_idx(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int idx, rc, i;
 	struct sge_rxq *rxq;
 #ifdef TCP_OFFLOAD
 	struct sge_ofld_rxq *ofld_rxq;
 #endif
 	uint8_t v;
 
 	idx = vi->tmr_idx;
 
 	rc = sysctl_handle_int(oidp, &idx, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (idx < 0 || idx >= SGE_NTIMERS)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4tmr");
 	if (rc)
 		return (rc);
 
 	v = V_QINTR_TIMER_IDX(idx) | V_QINTR_CNT_EN(vi->pktc_idx != -1);
 	for_each_rxq(vi, i, rxq) {
 #ifdef atomic_store_rel_8
 		atomic_store_rel_8(&rxq->iq.intr_params, v);
 #else
 		rxq->iq.intr_params = v;
 #endif
 	}
 #ifdef TCP_OFFLOAD
 	for_each_ofld_rxq(vi, i, ofld_rxq) {
 #ifdef atomic_store_rel_8
 		atomic_store_rel_8(&ofld_rxq->iq.intr_params, v);
 #else
 		ofld_rxq->iq.intr_params = v;
 #endif
 	}
 #endif
 	vi->tmr_idx = idx;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (0);
 }
 
 static int
 sysctl_holdoff_pktc_idx(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int idx, rc;
 
 	idx = vi->pktc_idx;
 
 	rc = sysctl_handle_int(oidp, &idx, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (idx < -1 || idx >= SGE_NCOUNTERS)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4pktc");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->pktc_idx = idx;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static int
 sysctl_qsize_rxq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int qsize, rc;
 
 	qsize = vi->qsize_rxq;
 
 	rc = sysctl_handle_int(oidp, &qsize, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (qsize < 128 || (qsize & 7))
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4rxqs");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->qsize_rxq = qsize;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static int
 sysctl_qsize_txq(SYSCTL_HANDLER_ARGS)
 {
 	struct vi_info *vi = arg1;
 	struct adapter *sc = vi->pi->adapter;
 	int qsize, rc;
 
 	qsize = vi->qsize_txq;
 
 	rc = sysctl_handle_int(oidp, &qsize, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 
 	if (qsize < 128 || qsize > 65536)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, vi, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4txqs");
 	if (rc)
 		return (rc);
 
 	if (vi->flags & VI_INIT_DONE)
 		rc = EBUSY; /* cannot be changed once the queues are created */
 	else
 		vi->qsize_txq = qsize;
 
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
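The queue-size handlers accept a new value only if it passes a range check (else `EINVAL`) and the VI has not yet set `VI_INIT_DONE` (else `EBUSY`). The range checks are easier to exercise when pulled out as pure functions, as in this sketch; `check_qsize_rxq` and `check_qsize_txq` are hypothetical names.

```c
#include <errno.h>

/* rx queues: at least 128 entries and a multiple of 8. */
static int
check_qsize_rxq(int qsize)
{
	if (qsize < 128 || (qsize & 7))
		return (EINVAL);
	return (0);
}

/* tx queues: between 128 and 65536 entries, inclusive. */
static int
check_qsize_txq(int qsize)
{
	if (qsize < 128 || qsize > 65536)
		return (EINVAL);
	return (0);
}
```

Note that tx sizes need not be a multiple of 8; only the rx handler imposes that alignment.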
 
 static int
 sysctl_pause_settings(SYSCTL_HANDLER_ARGS)
 {
 	struct port_info *pi = arg1;
 	struct adapter *sc = pi->adapter;
 	struct link_config *lc = &pi->link_cfg;
 	int rc;
 
 	if (req->newptr == NULL) {
 		struct sbuf *sb;
 		static char *bits = "\20\1PAUSE_RX\2PAUSE_TX";
 
 		rc = sysctl_wire_old_buffer(req, 0);
 		if (rc != 0)
 			return (rc);
 
 		sb = sbuf_new_for_sysctl(NULL, NULL, 128, req);
 		if (sb == NULL)
 			return (ENOMEM);
 
 		sbuf_printf(sb, "%b", lc->fc & (PAUSE_TX | PAUSE_RX), bits);
 		rc = sbuf_finish(sb);
 		sbuf_delete(sb);
 	} else {
 		char s[2];
 		int n;
 
 		s[0] = '0' + (lc->requested_fc & (PAUSE_TX | PAUSE_RX));
 		s[1] = 0;
 
 		rc = sysctl_handle_string(oidp, s, sizeof(s), req);
 		if (rc != 0)
 			return (rc);
 
 		if (s[1] != 0)
 			return (EINVAL);
 		if (s[0] < '0' || s[0] > '9')
 			return (EINVAL);	/* not a number */
 		n = s[0] - '0';
 		if (n & ~(PAUSE_TX | PAUSE_RX))
 			return (EINVAL);	/* some other bit is set too */
 
 		rc = begin_synchronized_op(sc, &pi->vi[0], SLEEP_OK | INTR_OK,
 		    "t4PAUSE");
 		if (rc)
 			return (rc);
 		if ((lc->requested_fc & (PAUSE_TX | PAUSE_RX)) != n) {
 			int link_ok = lc->link_ok;
 
 			lc->requested_fc &= ~(PAUSE_TX | PAUSE_RX);
 			lc->requested_fc |= n;
 			rc = -t4_link_l1cfg(sc, sc->mbox, pi->tx_chan, lc);
 			lc->link_ok = link_ok;	/* restore */
 		}
 		end_synchronized_op(sc, 0);
 	}
 
 	return (rc);
 }
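On write, `sysctl_pause_settings` expects a single decimal digit whose bits form a subset of `PAUSE_RX | PAUSE_TX`. The sketch below isolates that validation. The values `PAUSE_RX = 1` and `PAUSE_TX = 2` are inferred from the sysctl description ("bit 0 = rx_pause, bit 1 = tx_pause") and the `%b` string, not copied from the driver headers, and `parse_pause_setting` is hypothetical.

```c
#include <errno.h>

/* Assumed values, inferred from the pause_settings description. */
#define PAUSE_RX	(1 << 0)
#define PAUSE_TX	(1 << 1)

/*
 * Validate a new pause_settings value: exactly one decimal digit
 * whose bits are a subset of PAUSE_RX | PAUSE_TX.
 */
static int
parse_pause_setting(const char *s, int *fc)
{
	int n;

	if (s[0] == '\0' || s[1] != '\0')
		return (EINVAL);	/* exactly one character */
	if (s[0] < '0' || s[0] > '9')
		return (EINVAL);	/* not a number */
	n = s[0] - '0';
	if (n & ~(PAUSE_TX | PAUSE_RX))
		return (EINVAL);	/* some other bit is set too */
	*fc = n;
	return (0);
}
```

For example, writing `3` to `dev.cxgbe.0.pause_settings` with sysctl(8) requests both rx and tx PAUSE, while `4` is rejected with `EINVAL`.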
 
 static int
 sysctl_handle_t4_reg64(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int reg = arg2;
 	uint64_t val;
 
 	val = t4_read_reg64(sc, reg);
 
 	return (sysctl_handle_64(oidp, &val, 0, req));
 }
 
 static int
 sysctl_temperature(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int rc, t;
 	uint32_t param, val;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4temp");
 	if (rc)
 		return (rc);
 	param = V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DEV) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DEV_DIAG) |
 	    V_FW_PARAMS_PARAM_Y(FW_PARAM_DEV_DIAG_TMP);
 	rc = -t4_query_params(sc, sc->mbox, sc->pf, 0, 1, &param, &val);
 	end_synchronized_op(sc, 0);
 	if (rc)
 		return (rc);
 
 	/* unknown is returned as 0 but we display -1 in that case */
 	t = val == 0 ? -1 : val;
 
 	rc = sysctl_handle_int(oidp, &t, 0, req);
 	return (rc);
 }
 
 #ifdef SBUF_DRAIN
 static int
 sysctl_cctrl(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint16_t incr[NMTUS][NCCTRL_WIN];
 	static const char *dec_fac[] = {
 		"0.5", "0.5625", "0.625", "0.6875", "0.75", "0.8125", "0.875",
 		"0.9375"
 	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_read_cong_tbl(sc, incr);
 
 	for (i = 0; i < NCCTRL_WIN; ++i) {
 		sbuf_printf(sb, "%2d: %4u %4u %4u %4u %4u %4u %4u %4u\n", i,
 		    incr[0][i], incr[1][i], incr[2][i], incr[3][i], incr[4][i],
 		    incr[5][i], incr[6][i], incr[7][i]);
 		sbuf_printf(sb, "%8u %4u %4u %4u %4u %4u %4u %4u %5u %s\n",
 		    incr[8][i], incr[9][i], incr[10][i], incr[11][i],
 		    incr[12][i], incr[13][i], incr[14][i], incr[15][i],
 		    sc->params.a_wnd[i], dec_fac[sc->params.b_wnd[i]]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
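The `dec_fac[]` strings in `sysctl_cctrl` follow a regular pattern: entry i equals (8 + i) / 16, that is 0.5 plus i sixteenths. The hypothetical helper below computes the same values numerically, assuming the `b_wnd` index stays in 0..7 like the table.

```c
/*
 * Numeric form of the dec_fac[] table: entry i is (8 + i) / 16,
 * so 0 -> 0.5, 1 -> 0.5625, ..., 7 -> 0.9375.
 * b_wnd is assumed to be in 0..7, like the table index.
 */
static double
cctrl_dec_factor(unsigned int b_wnd)
{
	return ((8.0 + b_wnd) / 16.0);
}
```

All eight values are exact binary fractions, so they compare equal to the decimal strings in the table.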
 
 static const char *qname[CIM_NUM_IBQ + CIM_NUM_OBQ_T5] = {
 	"TP0", "TP1", "ULP", "SGE0", "SGE1", "NC-SI",	/* ibq's */
 	"ULP0", "ULP1", "ULP2", "ULP3", "SGE", "NC-SI",	/* obq's */
 	"SGE0-RX", "SGE1-RX"	/* additional obq's (T5 onwards) */
 };
 
 static int
 sysctl_cim_ibq_obq(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, n, qid = arg2;
 	uint32_t *buf, *p;
 	char *qtype;
 	u_int cim_num_obq = sc->chip_params->cim_num_obq;
 
 	KASSERT(qid >= 0 && qid < CIM_NUM_IBQ + cim_num_obq,
 	    ("%s: bad qid %d", __func__, qid));
 
 	if (qid < CIM_NUM_IBQ) {
 		/* inbound queue */
 		qtype = "IBQ";
 		n = 4 * CIM_IBQ_SIZE;
 		buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
 		rc = t4_read_cim_ibq(sc, qid, buf, n);
 	} else {
 		/* outbound queue */
 		qtype = "OBQ";
 		qid -= CIM_NUM_IBQ;
 		n = 4 * cim_num_obq * CIM_OBQ_SIZE;
 		buf = malloc(n * sizeof(uint32_t), M_CXGBE, M_ZERO | M_WAITOK);
 		rc = t4_read_cim_obq(sc, qid, buf, n);
 	}
 
 	if (rc < 0) {
 		rc = -rc;
 		goto done;
 	}
 	n = rc * sizeof(uint32_t);	/* rc has # of words actually read */
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		goto done;
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
 	if (sb == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 
 	sbuf_printf(sb, "%s%d %s", qtype, qid, qname[arg2]);
 	for (i = 0, p = buf; i < n; i += 16, p += 4)
 		sbuf_printf(sb, "\n%#06x: %08x %08x %08x %08x", i, p[0], p[1],
 		    p[2], p[3]);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
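`sysctl_cim_ibq_obq` prints the queue contents as a hex dump: one row per 16 bytes, a `%#06x` byte offset, then four 32-bit words. The sketch below reproduces that row format in user space with `snprintf`; `dump_words` is a hypothetical name. Unlike the driver loop, which always steps through whole 16-byte rows, it stops at the last word actually read instead of padding a final partial row with words past the end of the read.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write nwords 32-bit words as rows of "\n<offset>: w0 w1 w2 w3".
 * Returns the length of the complete items written; stops at the
 * first item that does not fit in out.
 */
static size_t
dump_words(char *out, size_t outsz, const uint32_t *w, size_t nwords)
{
	size_t i, j, off = 0;
	int n;

	if (outsz == 0)
		return (0);
	out[0] = '\0';
	for (i = 0; i < nwords; i += 4) {
		/* Byte offset of this row, formatted like the driver's %#06x. */
		n = snprintf(out + off, outsz - off, "\n%#06zx:",
		    i * sizeof(uint32_t));
		if (n < 0 || (size_t)n >= outsz - off)
			return (off);
		off += (size_t)n;
		/* Up to four words per row, never past nwords. */
		for (j = i; j < i + 4 && j < nwords; j++) {
			n = snprintf(out + off, outsz - off, " %08" PRIx32,
			    w[j]);
			if (n < 0 || (size_t)n >= outsz - off)
				return (off);
			off += (size_t)n;
		}
	}
	return (off);
}
```

In standard C the `#` flag adds `0x` only to nonzero values, so the first row's offset prints as `000000` and the second as `0x0010`.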
 
 static int
 sysctl_cim_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int cfg;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	MPASS(chip_id(sc) <= CHELSIO_T5);
 
 	rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
 	if (rc != 0)
 		return (rc);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	rc = -t4_cim_read_la(sc, buf, NULL);
 	if (rc != 0)
 		goto done;
 
 	sbuf_printf(sb, "Status   Data      PC%s",
 	    cfg & F_UPDBGLACAPTPCONLY ? "" :
 	    "     LS0Stat  LS0Addr             LS0Data");
 
 	for (p = buf; p <= &buf[sc->params.cim_la_size - 8]; p += 8) {
 		if (cfg & F_UPDBGLACAPTPCONLY) {
 			sbuf_printf(sb, "\n  %02x   %08x %08x", p[5] & 0xff,
 			    p[6], p[7]);
 			sbuf_printf(sb, "\n  %02x   %02x%06x %02x%06x",
 			    (p[3] >> 8) & 0xff, p[3] & 0xff, p[4] >> 8,
 			    p[4] & 0xff, p[5] >> 8);
 			sbuf_printf(sb, "\n  %02x   %x%07x %x%07x",
 			    (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
 			    p[1] & 0xf, p[2] >> 4);
 		} else {
 			sbuf_printf(sb,
 			    "\n  %02x   %x%07x %x%07x %08x %08x "
 			    "%08x%08x%08x%08x",
 			    (p[0] >> 4) & 0xff, p[0] & 0xf, p[1] >> 4,
 			    p[1] & 0xf, p[2] >> 4, p[2] & 0xf, p[3], p[4], p[5],
 			    p[6], p[7]);
 		}
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_la_t6(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int cfg;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	MPASS(chip_id(sc) > CHELSIO_T5);
 
 	rc = -t4_cim_read(sc, A_UP_UP_DBG_LA_CFG, 1, &cfg);
 	if (rc != 0)
 		return (rc);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(sc->params.cim_la_size * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	rc = -t4_cim_read_la(sc, buf, NULL);
 	if (rc != 0)
 		goto done;
 
 	sbuf_printf(sb, "Status   Inst    Data      PC%s",
 	    cfg & F_UPDBGLACAPTPCONLY ? "" :
 	    "     LS0Stat  LS0Addr  LS0Data  LS1Stat  LS1Addr  LS1Data");
 
 	for (p = buf; p <= &buf[sc->params.cim_la_size - 10]; p += 10) {
 		if (cfg & F_UPDBGLACAPTPCONLY) {
 			sbuf_printf(sb, "\n  %02x   %08x %08x %08x",
 			    p[3] & 0xff, p[2], p[1], p[0]);
 			sbuf_printf(sb, "\n  %02x   %02x%06x %02x%06x %02x%06x",
 			    (p[6] >> 8) & 0xff, p[6] & 0xff, p[5] >> 8,
 			    p[5] & 0xff, p[4] >> 8, p[4] & 0xff, p[3] >> 8);
 			sbuf_printf(sb, "\n  %02x   %04x%04x %04x%04x %04x%04x",
 			    (p[9] >> 16) & 0xff, p[9] & 0xffff, p[8] >> 16,
 			    p[8] & 0xffff, p[7] >> 16, p[7] & 0xffff,
 			    p[6] >> 16);
 		} else {
 			sbuf_printf(sb, "\n  %02x   %04x%04x %04x%04x %04x%04x "
 			    "%08x %08x %08x %08x %08x %08x",
 			    (p[9] >> 16) & 0xff,
 			    p[9] & 0xffff, p[8] >> 16,
 			    p[8] & 0xffff, p[7] >> 16,
 			    p[7] & 0xffff, p[6] >> 16,
 			    p[2], p[1], p[0], p[5], p[4], p[3]);
 		}
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_ma_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int i;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(2 * CIM_MALA_SIZE * 5 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_cim_read_ma_la(sc, buf, buf + 5 * CIM_MALA_SIZE);
 	p = buf;
 
 	for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
 		sbuf_printf(sb, "\n%02x%08x%08x%08x%08x", p[4], p[3], p[2],
 		    p[1], p[0]);
 	}
 
 	sbuf_printf(sb, "\n\nCnt ID Tag UE       Data       RDY VLD");
 	for (i = 0; i < CIM_MALA_SIZE; i++, p += 5) {
 		sbuf_printf(sb, "\n%3u %2u  %x   %u %08x%08x  %u   %u",
 		    (p[2] >> 10) & 0xff, (p[2] >> 7) & 7,
 		    (p[2] >> 3) & 0xf, (p[2] >> 2) & 1,
 		    (p[1] >> 2) | ((p[2] & 3) << 30),
 		    (p[0] >> 2) | ((p[1] & 3) << 30), (p[0] >> 1) & 1,
 		    p[0] & 1);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_pif_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int i;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(2 * CIM_PIFLA_SIZE * 6 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_cim_read_pif_la(sc, buf, buf + 6 * CIM_PIFLA_SIZE, NULL, NULL);
 	p = buf;
 
 	sbuf_printf(sb, "Cntl ID DataBE   Addr                 Data");
 	for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
 		sbuf_printf(sb, "\n %02x  %02x  %04x  %08x %08x%08x%08x%08x",
 		    (p[5] >> 22) & 0xff, (p[5] >> 16) & 0x3f, p[5] & 0xffff,
 		    p[4], p[3], p[2], p[1], p[0]);
 	}
 
 	sbuf_printf(sb, "\n\nCntl ID               Data");
 	for (i = 0; i < CIM_PIFLA_SIZE; i++, p += 6) {
 		sbuf_printf(sb, "\n %02x  %02x %08x%08x%08x%08x",
 		    (p[4] >> 6) & 0xff, p[4] & 0x3f, p[3], p[2], p[1], p[0]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_cim_qcfg(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint16_t base[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
 	uint16_t size[CIM_NUM_IBQ + CIM_NUM_OBQ_T5];
 	uint16_t thres[CIM_NUM_IBQ];
 	uint32_t obq_wr[2 * CIM_NUM_OBQ_T5], *wr = obq_wr;
 	uint32_t stat[4 * (CIM_NUM_IBQ + CIM_NUM_OBQ_T5)], *p = stat;
 	u_int cim_num_obq, ibq_rdaddr, obq_rdaddr, nq;
 
 	cim_num_obq = sc->chip_params->cim_num_obq;
 	if (is_t4(sc)) {
 		ibq_rdaddr = A_UP_IBQ_0_RDADDR;
 		obq_rdaddr = A_UP_OBQ_0_REALADDR;
 	} else {
 		ibq_rdaddr = A_UP_IBQ_0_SHADOW_RDADDR;
 		obq_rdaddr = A_UP_OBQ_0_SHADOW_REALADDR;
 	}
 	nq = CIM_NUM_IBQ + cim_num_obq;
 
 	rc = -t4_cim_read(sc, ibq_rdaddr, 4 * nq, stat);
 	if (rc == 0)
 		rc = -t4_cim_read(sc, obq_rdaddr, 2 * cim_num_obq, obq_wr);
 	if (rc != 0)
 		return (rc);
 
 	t4_read_cimq_cfg(sc, base, size, thres);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, PAGE_SIZE, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "Queue  Base  Size Thres RdPtr WrPtr  SOP  EOP Avail");
 
 	for (i = 0; i < CIM_NUM_IBQ; i++, p += 4)
 		sbuf_printf(sb, "\n%7s %5x %5u %5u %6x  %4x %4u %4u %5u",
 		    qname[i], base[i], size[i], thres[i], G_IBQRDADDR(p[0]),
 		    G_IBQWRADDR(p[1]), G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
 		    G_QUEREMFLITS(p[2]) * 16);
 	for ( ; i < nq; i++, p += 4, wr += 2)
 		sbuf_printf(sb, "\n%7s %5x %5u %12x  %4x %4u %4u %5u", qname[i],
 		    base[i], size[i], G_QUERDADDR(p[0]) & 0x3fff,
 		    wr[0] - base[i], G_QUESOPCNT(p[3]), G_QUEEOPCNT(p[3]),
 		    G_QUEREMFLITS(p[2]) * 16);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_cpl_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_cpl_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_cpl_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "                 channel 0  channel 1"
 		    "  channel 2  channel 3");
 		sbuf_printf(sb, "\nCPL requests:   %10u %10u %10u %10u",
 		    stats.req[0], stats.req[1], stats.req[2], stats.req[3]);
		sbuf_printf(sb, "\nCPL responses:  %10u %10u %10u %10u",
 		    stats.rsp[0], stats.rsp[1], stats.rsp[2], stats.rsp[3]);
 	} else {
 		sbuf_printf(sb, "                 channel 0  channel 1");
 		sbuf_printf(sb, "\nCPL requests:   %10u %10u",
 		    stats.req[0], stats.req[1]);
		sbuf_printf(sb, "\nCPL responses:  %10u %10u",
 		    stats.rsp[0], stats.rsp[1]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_ddp_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_usm_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_get_usm_stats(sc, &stats);
 
 	sbuf_printf(sb, "Frames: %u\n", stats.frames);
 	sbuf_printf(sb, "Octets: %ju\n", stats.octets);
 	sbuf_printf(sb, "Drops:  %u", stats.drops);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static const char * const devlog_level_strings[] = {
 	[FW_DEVLOG_LEVEL_EMERG]		= "EMERG",
 	[FW_DEVLOG_LEVEL_CRIT]		= "CRIT",
 	[FW_DEVLOG_LEVEL_ERR]		= "ERR",
 	[FW_DEVLOG_LEVEL_NOTICE]	= "NOTICE",
 	[FW_DEVLOG_LEVEL_INFO]		= "INFO",
 	[FW_DEVLOG_LEVEL_DEBUG]		= "DEBUG"
 };
 
 static const char * const devlog_facility_strings[] = {
 	[FW_DEVLOG_FACILITY_CORE]	= "CORE",
 	[FW_DEVLOG_FACILITY_CF]		= "CF",
 	[FW_DEVLOG_FACILITY_SCHED]	= "SCHED",
 	[FW_DEVLOG_FACILITY_TIMER]	= "TIMER",
 	[FW_DEVLOG_FACILITY_RES]	= "RES",
 	[FW_DEVLOG_FACILITY_HW]		= "HW",
 	[FW_DEVLOG_FACILITY_FLR]	= "FLR",
 	[FW_DEVLOG_FACILITY_DMAQ]	= "DMAQ",
 	[FW_DEVLOG_FACILITY_PHY]	= "PHY",
 	[FW_DEVLOG_FACILITY_MAC]	= "MAC",
 	[FW_DEVLOG_FACILITY_PORT]	= "PORT",
 	[FW_DEVLOG_FACILITY_VI]		= "VI",
 	[FW_DEVLOG_FACILITY_FILTER]	= "FILTER",
 	[FW_DEVLOG_FACILITY_ACL]	= "ACL",
 	[FW_DEVLOG_FACILITY_TM]		= "TM",
 	[FW_DEVLOG_FACILITY_QFC]	= "QFC",
 	[FW_DEVLOG_FACILITY_DCB]	= "DCB",
 	[FW_DEVLOG_FACILITY_ETH]	= "ETH",
 	[FW_DEVLOG_FACILITY_OFLD]	= "OFLD",
 	[FW_DEVLOG_FACILITY_RI]		= "RI",
 	[FW_DEVLOG_FACILITY_ISCSI]	= "ISCSI",
 	[FW_DEVLOG_FACILITY_FCOE]	= "FCOE",
 	[FW_DEVLOG_FACILITY_FOISCSI]	= "FOISCSI",
 	[FW_DEVLOG_FACILITY_FOFCOE]	= "FOFCOE",
 	[FW_DEVLOG_FACILITY_CHNET]	= "CHNET",
 };
 
 static int
 sysctl_devlog(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct devlog_params *dparams = &sc->params.devlog;
 	struct fw_devlog_e *buf, *e;
 	int i, j, rc, nentries, first = 0;
 	struct sbuf *sb;
 	uint64_t ftstamp = UINT64_MAX;
 
 	if (dparams->addr == 0)
 		return (ENXIO);
 
 	buf = malloc(dparams->size, M_CXGBE, M_NOWAIT);
 	if (buf == NULL)
 		return (ENOMEM);
 
 	rc = read_via_memwin(sc, 1, dparams->addr, (void *)buf, dparams->size);
 	if (rc != 0)
 		goto done;
 
 	nentries = dparams->size / sizeof(struct fw_devlog_e);
 	for (i = 0; i < nentries; i++) {
 		e = &buf[i];
 
 		if (e->timestamp == 0)
 			break;	/* end */
 
 		e->timestamp = be64toh(e->timestamp);
 		e->seqno = be32toh(e->seqno);
 		for (j = 0; j < 8; j++)
 			e->params[j] = be32toh(e->params[j]);
 
 		if (e->timestamp < ftstamp) {
 			ftstamp = e->timestamp;
 			first = i;
 		}
 	}
 
 	if (buf[first].timestamp == 0)
 		goto done;	/* nothing in the log */
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		goto done;
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 	sbuf_printf(sb, "%10s  %15s  %8s  %8s  %s\n",
 	    "Seq#", "Tstamp", "Level", "Facility", "Message");
 
 	i = first;
 	do {
 		e = &buf[i];
 		if (e->timestamp == 0)
 			break;	/* end */
 
		sbuf_printf(sb, "%10u  %15ju  %8s  %8s  ",
		    e->seqno, (uintmax_t)e->timestamp,
 		    (e->level < nitems(devlog_level_strings) ?
 			devlog_level_strings[e->level] : "UNKNOWN"),
 		    (e->facility < nitems(devlog_facility_strings) ?
 			devlog_facility_strings[e->facility] : "UNKNOWN"));
 		sbuf_printf(sb, e->fmt, e->params[0], e->params[1],
 		    e->params[2], e->params[3], e->params[4],
 		    e->params[5], e->params[6], e->params[7]);
 
 		if (++i == nentries)
 			i = 0;
 	} while (i != first);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 done:
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_fcoe_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_fcoe_stats stats[MAX_NCHAN];
 	int i, nchan = sc->chip_params->nchan;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	for (i = 0; i < nchan; i++)
 		t4_get_fcoe_stats(sc, i, &stats[i]);
 
 	if (nchan > 2) {
 		sbuf_printf(sb, "                   channel 0        channel 1"
 		    "        channel 2        channel 3");
 		sbuf_printf(sb, "\noctetsDDP:  %16ju %16ju %16ju %16ju",
 		    stats[0].octets_ddp, stats[1].octets_ddp,
 		    stats[2].octets_ddp, stats[3].octets_ddp);
 		sbuf_printf(sb, "\nframesDDP:  %16u %16u %16u %16u",
 		    stats[0].frames_ddp, stats[1].frames_ddp,
 		    stats[2].frames_ddp, stats[3].frames_ddp);
 		sbuf_printf(sb, "\nframesDrop: %16u %16u %16u %16u",
 		    stats[0].frames_drop, stats[1].frames_drop,
 		    stats[2].frames_drop, stats[3].frames_drop);
 	} else {
 		sbuf_printf(sb, "                   channel 0        channel 1");
 		sbuf_printf(sb, "\noctetsDDP:  %16ju %16ju",
 		    stats[0].octets_ddp, stats[1].octets_ddp);
 		sbuf_printf(sb, "\nframesDDP:  %16u %16u",
 		    stats[0].frames_ddp, stats[1].frames_ddp);
 		sbuf_printf(sb, "\nframesDrop: %16u %16u",
 		    stats[0].frames_drop, stats[1].frames_drop);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_hw_sched(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	unsigned int map, kbps, ipg, mode;
 	unsigned int pace_tab[NTX_SCHED];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	map = t4_read_reg(sc, A_TP_TX_MOD_QUEUE_REQ_MAP);
 	mode = G_TIMERMODE(t4_read_reg(sc, A_TP_MOD_CONFIG));
 	t4_read_pace_tbl(sc, pace_tab);
 
 	sbuf_printf(sb, "Scheduler  Mode   Channel  Rate (Kbps)   "
 	    "Class IPG (0.1 ns)   Flow IPG (us)");
 
 	for (i = 0; i < NTX_SCHED; ++i, map >>= 2) {
 		t4_get_tx_sched(sc, i, &kbps, &ipg);
 		sbuf_printf(sb, "\n    %u      %-5s     %u     ", i,
 		    (mode & (1 << i)) ? "flow" : "class", map & 3);
 		if (kbps)
 			sbuf_printf(sb, "%9u     ", kbps);
 		else
 			sbuf_printf(sb, " disabled     ");
 
 		if (ipg)
 			sbuf_printf(sb, "%13u        ", ipg);
 		else
 			sbuf_printf(sb, "     disabled        ");
 
 		if (pace_tab[i])
 			sbuf_printf(sb, "%10u", pace_tab[i]);
 		else
 			sbuf_printf(sb, "  disabled");
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_lb_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, j;
 	uint64_t *p0, *p1;
 	struct lb_port_stats s[2];
 	static const char *stat_name[] = {
 		"OctetsOK:", "FramesOK:", "BcastFrames:", "McastFrames:",
 		"UcastFrames:", "ErrorFrames:", "Frames64:", "Frames65To127:",
 		"Frames128To255:", "Frames256To511:", "Frames512To1023:",
 		"Frames1024To1518:", "Frames1519ToMax:", "FramesDropped:",
 		"BG0FramesDropped:", "BG1FramesDropped:", "BG2FramesDropped:",
 		"BG3FramesDropped:", "BG0FramesTrunc:", "BG1FramesTrunc:",
 		"BG2FramesTrunc:", "BG3FramesTrunc:"
 	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	memset(s, 0, sizeof(s));
 
 	for (i = 0; i < sc->chip_params->nchan; i += 2) {
 		t4_get_lb_stats(sc, i, &s[0]);
 		t4_get_lb_stats(sc, i + 1, &s[1]);
 
 		p0 = &s[0].octets;
 		p1 = &s[1].octets;
 		sbuf_printf(sb, "%s                       Loopback %u"
 		    "           Loopback %u", i == 0 ? "" : "\n", i, i + 1);
 
 		for (j = 0; j < nitems(stat_name); j++)
 			sbuf_printf(sb, "\n%-17s %20ju %20ju", stat_name[j],
			    *p0++, *p1++);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_linkdnrc(SYSCTL_HANDLER_ARGS)
 {
 	int rc = 0;
 	struct port_info *pi = arg1;
 	struct sbuf *sb;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
		return (rc);

 	sb = sbuf_new_for_sysctl(NULL, NULL, 64, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	if (pi->linkdnrc < 0)
 		sbuf_printf(sb, "n/a");
 	else
 		sbuf_printf(sb, "%s", t4_link_down_rc_str(pi->linkdnrc));
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 struct mem_desc {
 	unsigned int base;
 	unsigned int limit;
 	unsigned int idx;
 };
 
 static int
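mem_desc_cmp(const void *a, const void *b)
{
	const unsigned int v1 = ((const struct mem_desc *)a)->base;
	const unsigned int v2 = ((const struct mem_desc *)b)->base;

	/*
	 * Compare explicitly: the difference of two unsigned bases does not
	 * fit in an int in general and would yield the wrong sign.
	 */
	if (v1 < v2)
		return (-1);
	if (v1 > v2)
		return (1);
	return (0);
}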
 
 static void
 mem_region_show(struct sbuf *sb, const char *name, unsigned int from,
     unsigned int to)
 {
 	unsigned int size;
 
 	if (from == to)
 		return;
 
 	size = to - from + 1;
 	if (size == 0)
 		return;
 
 	/* XXX: need humanize_number(3) in libkern for a more readable 'size' */
 	sbuf_printf(sb, "%-15s %#x-%#x [%u]\n", name, from, to, size);
 }
 
 static int
 sysctl_meminfo(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i, n;
 	uint32_t lo, hi, used, alloc;
 	static const char *memory[] = {"EDC0:", "EDC1:", "MC:", "MC0:", "MC1:"};
 	static const char *region[] = {
 		"DBQ contexts:", "IMSG contexts:", "FLM cache:", "TCBs:",
 		"Pstructs:", "Timers:", "Rx FL:", "Tx FL:", "Pstruct FL:",
 		"Tx payload:", "Rx payload:", "LE hash:", "iSCSI region:",
 		"TDDP region:", "TPT region:", "STAG region:", "RQ region:",
 		"RQUDP region:", "PBL region:", "TXPBL region:",
 		"DBVFIFO region:", "ULPRX state:", "ULPTX state:",
 		"On-chip queues:"
 	};
 	struct mem_desc avail[4];
 	struct mem_desc mem[nitems(region) + 3];	/* up to 3 holes */
 	struct mem_desc *md = mem;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	for (i = 0; i < nitems(mem); i++) {
 		mem[i].limit = 0;
 		mem[i].idx = i;
 	}
 
 	/* Find and sort the populated memory ranges */
 	i = 0;
 	lo = t4_read_reg(sc, A_MA_TARGET_MEM_ENABLE);
 	if (lo & F_EDRAM0_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EDRAM0_BAR);
 		avail[i].base = G_EDRAM0_BASE(hi) << 20;
 		avail[i].limit = avail[i].base + (G_EDRAM0_SIZE(hi) << 20);
 		avail[i].idx = 0;
 		i++;
 	}
 	if (lo & F_EDRAM1_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EDRAM1_BAR);
 		avail[i].base = G_EDRAM1_BASE(hi) << 20;
 		avail[i].limit = avail[i].base + (G_EDRAM1_SIZE(hi) << 20);
 		avail[i].idx = 1;
 		i++;
 	}
 	if (lo & F_EXT_MEM_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EXT_MEMORY_BAR);
 		avail[i].base = G_EXT_MEM_BASE(hi) << 20;
 		avail[i].limit = avail[i].base +
 		    (G_EXT_MEM_SIZE(hi) << 20);
 		avail[i].idx = is_t5(sc) ? 3 : 2;	/* Call it MC0 for T5 */
 		i++;
 	}
 	if (is_t5(sc) && lo & F_EXT_MEM1_ENABLE) {
 		hi = t4_read_reg(sc, A_MA_EXT_MEMORY1_BAR);
 		avail[i].base = G_EXT_MEM1_BASE(hi) << 20;
 		avail[i].limit = avail[i].base +
 		    (G_EXT_MEM1_SIZE(hi) << 20);
 		avail[i].idx = 4;
 		i++;
 	}
	if (i == 0) {
		/* No memory available. */
		sbuf_delete(sb);
		return (0);
	}
 	qsort(avail, i, sizeof(struct mem_desc), mem_desc_cmp);
 
 	(md++)->base = t4_read_reg(sc, A_SGE_DBQ_CTXT_BADDR);
 	(md++)->base = t4_read_reg(sc, A_SGE_IMSG_CTXT_BADDR);
 	(md++)->base = t4_read_reg(sc, A_SGE_FLM_CACHE_BADDR);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_TIMER_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_RX_FLST_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_TX_FLST_BASE);
 	(md++)->base = t4_read_reg(sc, A_TP_CMM_MM_PS_FLST_BASE);
 
 	/* the next few have explicit upper bounds */
 	md->base = t4_read_reg(sc, A_TP_PMM_TX_BASE);
 	md->limit = md->base - 1 +
 		    t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE) *
 		    G_PMTXMAXPAGE(t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE));
 	md++;
 
 	md->base = t4_read_reg(sc, A_TP_PMM_RX_BASE);
 	md->limit = md->base - 1 +
 		    t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) *
 		    G_PMRXMAXPAGE(t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE));
 	md++;
 
 	if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
 		if (chip_id(sc) <= CHELSIO_T5)
 			md->base = t4_read_reg(sc, A_LE_DB_HASH_TID_BASE);
 		else
 			md->base = t4_read_reg(sc, A_LE_DB_HASH_TBL_BASE_ADDR);
 		md->limit = 0;
 	} else {
 		md->base = 0;
 		md->idx = nitems(region);  /* hide it */
 	}
 	md++;
 
 #define ulp_region(reg) \
 	md->base = t4_read_reg(sc, A_ULP_ ## reg ## _LLIMIT);\
 	(md++)->limit = t4_read_reg(sc, A_ULP_ ## reg ## _ULIMIT)
 
 	ulp_region(RX_ISCSI);
 	ulp_region(RX_TDDP);
 	ulp_region(TX_TPT);
 	ulp_region(RX_STAG);
 	ulp_region(RX_RQ);
 	ulp_region(RX_RQUDP);
 	ulp_region(RX_PBL);
 	ulp_region(TX_PBL);
 #undef ulp_region
 
 	md->base = 0;
 	md->idx = nitems(region);
 	if (!is_t4(sc)) {
 		uint32_t size = 0;
 		uint32_t sge_ctrl = t4_read_reg(sc, A_SGE_CONTROL2);
 		uint32_t fifo_size = t4_read_reg(sc, A_SGE_DBVFIFO_SIZE);
 
 		if (is_t5(sc)) {
 			if (sge_ctrl & F_VFIFO_ENABLE)
 				size = G_DBVFIFO_SIZE(fifo_size);
 		} else
 			size = G_T6_DBVFIFO_SIZE(fifo_size);
 
 		if (size) {
 			md->base = G_BASEADDR(t4_read_reg(sc,
 			    A_SGE_DBVFIFO_BADDR));
 			md->limit = md->base + (size << 2) - 1;
 		}
 	}
 	md++;
 
 	md->base = t4_read_reg(sc, A_ULP_RX_CTX_BASE);
 	md->limit = 0;
 	md++;
 	md->base = t4_read_reg(sc, A_ULP_TX_ERR_TABLE_BASE);
 	md->limit = 0;
 	md++;
 
 	md->base = sc->vres.ocq.start;
 	if (sc->vres.ocq.size)
 		md->limit = md->base + sc->vres.ocq.size - 1;
 	else
 		md->idx = nitems(region);  /* hide it */
 	md++;
 
 	/* add any address-space holes, there can be up to 3 */
 	for (n = 0; n < i - 1; n++)
 		if (avail[n].limit < avail[n + 1].base)
 			(md++)->base = avail[n].limit;
 	if (avail[n].limit)
 		(md++)->base = avail[n].limit;
 
 	n = md - mem;
 	qsort(mem, n, sizeof(struct mem_desc), mem_desc_cmp);
 
 	for (lo = 0; lo < i; lo++)
 		mem_region_show(sb, memory[avail[lo].idx], avail[lo].base,
		    avail[lo].limit - 1);
 
 	sbuf_printf(sb, "\n");
 	for (i = 0; i < n; i++) {
 		if (mem[i].idx >= nitems(region))
 			continue;                        /* skip holes */
 		if (!mem[i].limit)
 			mem[i].limit = i < n - 1 ? mem[i + 1].base - 1 : ~0;
 		mem_region_show(sb, region[mem[i].idx], mem[i].base,
		    mem[i].limit);
 	}
 
 	sbuf_printf(sb, "\n");
 	lo = t4_read_reg(sc, A_CIM_SDRAM_BASE_ADDR);
 	hi = t4_read_reg(sc, A_CIM_SDRAM_ADDR_SIZE) + lo - 1;
 	mem_region_show(sb, "uP RAM:", lo, hi);
 
 	lo = t4_read_reg(sc, A_CIM_EXTMEM2_BASE_ADDR);
 	hi = t4_read_reg(sc, A_CIM_EXTMEM2_ADDR_SIZE) + lo - 1;
 	mem_region_show(sb, "uP Extmem2:", lo, hi);
 
 	lo = t4_read_reg(sc, A_TP_PMM_RX_MAX_PAGE);
 	sbuf_printf(sb, "\n%u Rx pages of size %uKiB for %u channels\n",
	    G_PMRXMAXPAGE(lo),
	    t4_read_reg(sc, A_TP_PMM_RX_PAGE_SIZE) >> 10,
	    (lo & F_PMRXNUMCHN) ? 2 : 1);
 
 	lo = t4_read_reg(sc, A_TP_PMM_TX_MAX_PAGE);
 	hi = t4_read_reg(sc, A_TP_PMM_TX_PAGE_SIZE);
 	sbuf_printf(sb, "%u Tx pages of size %u%ciB for %u channels\n",
	    G_PMTXMAXPAGE(lo),
	    hi >= (1 << 20) ? (hi >> 20) : (hi >> 10),
	    hi >= (1 << 20) ? 'M' : 'K', 1 << G_PMTXNUMCHN(lo));
	sbuf_printf(sb, "%u p-structs\n",
	    t4_read_reg(sc, A_TP_CMM_MM_MAX_PSTRUCT));
 
 	for (i = 0; i < 4; i++) {
 		if (chip_id(sc) > CHELSIO_T5)
 			lo = t4_read_reg(sc, A_MPS_RX_MAC_BG_PG_CNT0 + i * 4);
 		else
 			lo = t4_read_reg(sc, A_MPS_RX_PG_RSV0 + i * 4);
 		if (is_t5(sc)) {
 			used = G_T5_USED(lo);
 			alloc = G_T5_ALLOC(lo);
 		} else {
 			used = G_USED(lo);
 			alloc = G_ALLOC(lo);
 		}
 		/* For T6 these are MAC buffer groups */
 		sbuf_printf(sb, "\nPort %d using %u pages out of %u allocated",
 		    i, used, alloc);
 	}
 	for (i = 0; i < sc->chip_params->nchan; i++) {
 		if (chip_id(sc) > CHELSIO_T5)
 			lo = t4_read_reg(sc, A_MPS_RX_LPBK_BG_PG_CNT0 + i * 4);
 		else
 			lo = t4_read_reg(sc, A_MPS_RX_PG_RSV4 + i * 4);
 		if (is_t5(sc)) {
 			used = G_T5_USED(lo);
 			alloc = G_T5_ALLOC(lo);
 		} else {
 			used = G_USED(lo);
 			alloc = G_ALLOC(lo);
 		}
		/* For T6 these are loopback buffer groups */
 		sbuf_printf(sb,
 		    "\nLoopback %d using %u pages out of %u allocated",
 		    i, used, alloc);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
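/*
 * Convert a TCAM entry stored in X/Y form to an Ethernet address and a
 * mask.  A bit participates in the match if it is set in either X or Y,
 * and the address is the low 48 bits of Y in network byte order.
 */
static inline void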
 tcamxy2valmask(uint64_t x, uint64_t y, uint8_t *addr, uint64_t *mask)
 {
 	*mask = x | y;
 	y = htobe64(y);
 	memcpy(addr, (char *)&y + 2, ETHER_ADDR_LEN);
 }
 
 static int
 sysctl_mps_tcam(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 
 	MPASS(chip_id(sc) <= CHELSIO_T5);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb,
 	    "Idx  Ethernet address     Mask     Vld Ports PF"
 	    "  VF              Replication             P0 P1 P2 P3  ML");
 	for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
 		uint64_t tcamx, tcamy, mask;
 		uint32_t cls_lo, cls_hi;
 		uint8_t addr[ETHER_ADDR_LEN];
 
 		tcamy = t4_read_reg64(sc, MPS_CLS_TCAM_Y_L(i));
 		tcamx = t4_read_reg64(sc, MPS_CLS_TCAM_X_L(i));
 		if (tcamx & tcamy)
 			continue;
 		tcamxy2valmask(tcamx, tcamy, addr, &mask);
 		cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
 		cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
		sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x %012jx"
		    "  %c   %#x%4u%4d", i, addr[0], addr[1], addr[2],
		    addr[3], addr[4], addr[5], (uintmax_t)mask,
		    (cls_lo & F_SRAM_VLD) ? 'Y' : 'N',
		    G_PORTMAP(cls_hi), G_PF(cls_lo),
		    (cls_lo & F_VF_VALID) ? G_VF(cls_lo) : -1);
 
 		if (cls_lo & F_REPLICATE) {
 			struct fw_ldst_cmd ldst_cmd;
 
 			memset(&ldst_cmd, 0, sizeof(ldst_cmd));
 			ldst_cmd.op_to_addrspace =
 			    htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
 				F_FW_CMD_REQUEST | F_FW_CMD_READ |
 				V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
 			ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
 			ldst_cmd.u.mps.rplc.fid_idx =
 			    htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
 				V_FW_LDST_CMD_IDX(i));
 
 			rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
 			    "t4mps");
 			if (rc)
 				break;
 			rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
 			    sizeof(ldst_cmd), &ldst_cmd);
 			end_synchronized_op(sc, 0);
 
 			if (rc != 0) {
 				sbuf_printf(sb, "%36d", rc);
 				rc = 0;
 			} else {
 				sbuf_printf(sb, " %08x %08x %08x %08x",
 				    be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
 			}
 		} else
 			sbuf_printf(sb, "%36s", "");
 
 		sbuf_printf(sb, "%4u%3u%3u%3u %#3x", G_SRAM_PRIO0(cls_lo),
 		    G_SRAM_PRIO1(cls_lo), G_SRAM_PRIO2(cls_lo),
 		    G_SRAM_PRIO3(cls_lo), (cls_lo >> S_MULTILISTEN0) & 0xf);
 	}
 
 	if (rc)
 		(void) sbuf_finish(sb);
 	else
 		rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_mps_tcam_t6(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 
 	MPASS(chip_id(sc) > CHELSIO_T5);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	sbuf_printf(sb, "Idx  Ethernet address     Mask       VNI   Mask"
 	    "   IVLAN Vld DIP_Hit   Lookup  Port Vld Ports PF  VF"
 	    "                           Replication"
 	    "                                    P0 P1 P2 P3  ML\n");
 
 	for (i = 0; i < sc->chip_params->mps_tcam_size; i++) {
 		uint8_t dip_hit, vlan_vld, lookup_type, port_num;
 		uint16_t ivlan;
 		uint64_t tcamx, tcamy, val, mask;
 		uint32_t cls_lo, cls_hi, ctl, data2, vnix, vniy;
 		uint8_t addr[ETHER_ADDR_LEN];
 
 		ctl = V_CTLREQID(1) | V_CTLCMDTYPE(0) | V_CTLXYBITSEL(0);
 		if (i < 256)
 			ctl |= V_CTLTCAMINDEX(i) | V_CTLTCAMSEL(0);
 		else
 			ctl |= V_CTLTCAMINDEX(i - 256) | V_CTLTCAMSEL(1);
 		t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
 		val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
 		tcamy = G_DMACH(val) << 32;
 		tcamy |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
 		data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
 		lookup_type = G_DATALKPTYPE(data2);
 		port_num = G_DATAPORTNUM(data2);
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			/* Inner header VNI */
 			vniy = ((data2 & F_DATAVIDH2) << 23) |
			    (G_DATAVIDH1(data2) << 16) | G_VIDL(val);
 			dip_hit = data2 & F_DATADIPHIT;
 			vlan_vld = 0;
 		} else {
 			vniy = 0;
 			dip_hit = 0;
 			vlan_vld = data2 & F_DATAVIDH2;
 			ivlan = G_VIDL(val);
 		}
 
 		ctl |= V_CTLXYBITSEL(1);
 		t4_write_reg(sc, A_MPS_CLS_TCAM_DATA2_CTL, ctl);
 		val = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA1_REQ_ID1);
 		tcamx = G_DMACH(val) << 32;
 		tcamx |= t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA0_REQ_ID1);
 		data2 = t4_read_reg(sc, A_MPS_CLS_TCAM_RDATA2_REQ_ID1);
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			/* Inner header VNI mask */
 			vnix = ((data2 & F_DATAVIDH2) << 23) |
			    (G_DATAVIDH1(data2) << 16) | G_VIDL(val);
 		} else
 			vnix = 0;
 
 		if (tcamx & tcamy)
 			continue;
 		tcamxy2valmask(tcamx, tcamy, addr, &mask);
 
 		cls_lo = t4_read_reg(sc, MPS_CLS_SRAM_L(i));
 		cls_hi = t4_read_reg(sc, MPS_CLS_SRAM_H(i));
 
 		if (lookup_type && lookup_type != M_DATALKPTYPE) {
 			sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
 			    "%012jx %06x %06x    -    -   %3c"
 			    "      'I'  %4x   %3c   %#x%4u%4d", i, addr[0],
 			    addr[1], addr[2], addr[3], addr[4], addr[5],
 			    (uintmax_t)mask, vniy, vnix, dip_hit ? 'Y' : 'N',
 			    port_num, cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
 			    G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
 			    cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
 		} else {
 			sbuf_printf(sb, "\n%3u %02x:%02x:%02x:%02x:%02x:%02x "
 			    "%012jx    -       -   ", i, addr[0], addr[1],
 			    addr[2], addr[3], addr[4], addr[5],
 			    (uintmax_t)mask);
 
 			if (vlan_vld)
 				sbuf_printf(sb, "%4u   Y     ", ivlan);
 			else
 				sbuf_printf(sb, "  -    N     ");
 
 			sbuf_printf(sb, "-      %3c  %4x   %3c   %#x%4u%4d",
 			    lookup_type ? 'I' : 'O', port_num,
 			    cls_lo & F_T6_SRAM_VLD ? 'Y' : 'N',
 			    G_PORTMAP(cls_hi), G_T6_PF(cls_lo),
 			    cls_lo & F_T6_VF_VALID ? G_T6_VF(cls_lo) : -1);
 		}
 
 		if (cls_lo & F_T6_REPLICATE) {
 			struct fw_ldst_cmd ldst_cmd;
 
 			memset(&ldst_cmd, 0, sizeof(ldst_cmd));
 			ldst_cmd.op_to_addrspace =
 			    htobe32(V_FW_CMD_OP(FW_LDST_CMD) |
 				F_FW_CMD_REQUEST | F_FW_CMD_READ |
 				V_FW_LDST_CMD_ADDRSPACE(FW_LDST_ADDRSPC_MPS));
 			ldst_cmd.cycles_to_len16 = htobe32(FW_LEN16(ldst_cmd));
 			ldst_cmd.u.mps.rplc.fid_idx =
 			    htobe16(V_FW_LDST_CMD_FID(FW_LDST_MPS_RPLC) |
 				V_FW_LDST_CMD_IDX(i));
 
 			rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK,
 			    "t6mps");
 			if (rc)
 				break;
 			rc = -t4_wr_mbox(sc, sc->mbox, &ldst_cmd,
 			    sizeof(ldst_cmd), &ldst_cmd);
 			end_synchronized_op(sc, 0);
 
 			if (rc != 0) {
 				sbuf_printf(sb, "%72d", rc);
 				rc = 0;
 			} else {
 				sbuf_printf(sb, " %08x %08x %08x %08x"
 				    " %08x %08x %08x %08x",
 				    be32toh(ldst_cmd.u.mps.rplc.rplc255_224),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc223_192),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc191_160),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc159_128),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc127_96),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc95_64),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc63_32),
 				    be32toh(ldst_cmd.u.mps.rplc.rplc31_0));
 			}
 		} else
 			sbuf_printf(sb, "%72s", "");
 
 		sbuf_printf(sb, "%4u%3u%3u%3u %#x",
 		    G_T6_SRAM_PRIO0(cls_lo), G_T6_SRAM_PRIO1(cls_lo),
 		    G_T6_SRAM_PRIO2(cls_lo), G_T6_SRAM_PRIO3(cls_lo),
 		    (cls_lo >> S_T6_MULTILISTEN0) & 0xf);
 	}
 
 	if (rc)
 		(void) sbuf_finish(sb);
 	else
 		rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_path_mtus(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	uint16_t mtus[NMTUS];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_read_mtu_tbl(sc, mtus, NULL);
 
 	sbuf_printf(sb, "%u %u %u %u %u %u %u %u %u %u %u %u %u %u %u %u",
 	    mtus[0], mtus[1], mtus[2], mtus[3], mtus[4], mtus[5], mtus[6],
 	    mtus[7], mtus[8], mtus[9], mtus[10], mtus[11], mtus[12], mtus[13],
 	    mtus[14], mtus[15]);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_pm_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, i;
 	uint32_t tx_cnt[MAX_PM_NSTATS], rx_cnt[MAX_PM_NSTATS];
 	uint64_t tx_cyc[MAX_PM_NSTATS], rx_cyc[MAX_PM_NSTATS];
	static const char *tx_stats[MAX_PM_NSTATS] = {
		"Read:", "Write bypass:", "Write mem:", "Bypass + mem:",
		"Tx FIFO wait:", NULL, "Tx latency:"
	};
	static const char *rx_stats[MAX_PM_NSTATS] = {
		"Read:", "Write bypass:", "Write mem:", "Flush:",
		"Rx FIFO wait:", NULL, "Rx latency:"
	};
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_pmtx_get_stats(sc, tx_cnt, tx_cyc);
 	t4_pmrx_get_stats(sc, rx_cnt, rx_cyc);
 
 	sbuf_printf(sb, "                Tx pcmds             Tx bytes");
 	for (i = 0; i < 4; i++) {
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 	}
 
 	sbuf_printf(sb, "\n                Rx pcmds             Rx bytes");
 	for (i = 0; i < 4; i++) {
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 	}
 
 	if (chip_id(sc) > CHELSIO_T5) {
 		sbuf_printf(sb,
 		    "\n              Total wait      Total occupancy");
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 
 		i += 2;
 		MPASS(i < nitems(tx_stats));
 
 		sbuf_printf(sb,
 		    "\n                   Reads           Total wait");
 		sbuf_printf(sb, "\n%-13s %10u %20ju", tx_stats[i], tx_cnt[i],
 		    tx_cyc[i]);
 		sbuf_printf(sb, "\n%-13s %10u %20ju", rx_stats[i], rx_cnt[i],
 		    rx_cyc[i]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_rdma_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_rdma_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_rdma_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
	sbuf_printf(sb, "NoRQEModDeferrals: %u\n", stats.rqe_dfr_mod);
	sbuf_printf(sb, "NoRQEPktDeferrals: %u", stats.rqe_dfr_pkt);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tcp_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_tcp_stats v4, v6;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_tcp_stats(sc, &v4, &v6);
 	mtx_unlock(&sc->reg_lock);
 
 	sbuf_printf(sb,
 	    "                                IP                 IPv6\n");
 	sbuf_printf(sb, "OutRsts:      %20u %20u\n",
 	    v4.tcp_out_rsts, v6.tcp_out_rsts);
 	sbuf_printf(sb, "InSegs:       %20ju %20ju\n",
 	    v4.tcp_in_segs, v6.tcp_in_segs);
 	sbuf_printf(sb, "OutSegs:      %20ju %20ju\n",
 	    v4.tcp_out_segs, v6.tcp_out_segs);
 	sbuf_printf(sb, "RetransSegs:  %20ju %20ju",
 	    v4.tcp_retrans_segs, v6.tcp_retrans_segs);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tids(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tid_info *t = &sc->tids;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	if (t->natids) {
 		sbuf_printf(sb, "ATID range: 0-%u, in use: %u\n", t->natids - 1,
 		    t->atids_in_use);
 	}
 
 	if (t->ntids) {
 		if (t4_read_reg(sc, A_LE_DB_CONFIG) & F_HASHEN) {
 			uint32_t b = t4_read_reg(sc, A_LE_DB_SERVER_INDEX) / 4;
 
 			if (b) {
 				sbuf_printf(sb, "TID range: 0-%u, %u-%u", b - 1,
 				    t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
 				    t->ntids - 1);
 			} else {
 				sbuf_printf(sb, "TID range: %u-%u",
 				    t4_read_reg(sc, A_LE_DB_TID_HASHBASE) / 4,
 				    t->ntids - 1);
 			}
 		} else
 			sbuf_printf(sb, "TID range: 0-%u", t->ntids - 1);
 		sbuf_printf(sb, ", in use: %u\n",
 		    atomic_load_acq_int(&t->tids_in_use));
 	}
 
 	if (t->nstids) {
 		sbuf_printf(sb, "STID range: %u-%u, in use: %u\n", t->stid_base,
 		    t->stid_base + t->nstids - 1, t->stids_in_use);
 	}
 
 	if (t->nftids) {
 		sbuf_printf(sb, "FTID range: %u-%u\n", t->ftid_base,
 		    t->ftid_base + t->nftids - 1);
 	}
 
 	if (t->netids) {
 		sbuf_printf(sb, "ETID range: %u-%u\n", t->etid_base,
 		    t->etid_base + t->netids - 1);
 	}
 
 	sbuf_printf(sb, "HW TID usage: %u IP users, %u IPv6 users",
 	    t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV4),
 	    t4_read_reg(sc, A_LE_DB_ACT_CNT_IPV6));
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tp_err_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	struct tp_err_stats stats;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	mtx_lock(&sc->reg_lock);
 	t4_tp_get_err_stats(sc, &stats);
 	mtx_unlock(&sc->reg_lock);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "                 channel 0  channel 1"
 		    "  channel 2  channel 3\n");
 		sbuf_printf(sb, "macInErrs:      %10u %10u %10u %10u\n",
 		    stats.mac_in_errs[0], stats.mac_in_errs[1],
 		    stats.mac_in_errs[2], stats.mac_in_errs[3]);
 		sbuf_printf(sb, "hdrInErrs:      %10u %10u %10u %10u\n",
 		    stats.hdr_in_errs[0], stats.hdr_in_errs[1],
 		    stats.hdr_in_errs[2], stats.hdr_in_errs[3]);
 		sbuf_printf(sb, "tcpInErrs:      %10u %10u %10u %10u\n",
 		    stats.tcp_in_errs[0], stats.tcp_in_errs[1],
 		    stats.tcp_in_errs[2], stats.tcp_in_errs[3]);
 		sbuf_printf(sb, "tcp6InErrs:     %10u %10u %10u %10u\n",
 		    stats.tcp6_in_errs[0], stats.tcp6_in_errs[1],
 		    stats.tcp6_in_errs[2], stats.tcp6_in_errs[3]);
 		sbuf_printf(sb, "tnlCongDrops:   %10u %10u %10u %10u\n",
 		    stats.tnl_cong_drops[0], stats.tnl_cong_drops[1],
 		    stats.tnl_cong_drops[2], stats.tnl_cong_drops[3]);
 		sbuf_printf(sb, "tnlTxDrops:     %10u %10u %10u %10u\n",
 		    stats.tnl_tx_drops[0], stats.tnl_tx_drops[1],
 		    stats.tnl_tx_drops[2], stats.tnl_tx_drops[3]);
 		sbuf_printf(sb, "ofldVlanDrops:  %10u %10u %10u %10u\n",
 		    stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1],
 		    stats.ofld_vlan_drops[2], stats.ofld_vlan_drops[3]);
 		sbuf_printf(sb, "ofldChanDrops:  %10u %10u %10u %10u\n\n",
 		    stats.ofld_chan_drops[0], stats.ofld_chan_drops[1],
 		    stats.ofld_chan_drops[2], stats.ofld_chan_drops[3]);
 	} else {
 		sbuf_printf(sb, "                 channel 0  channel 1\n");
 		sbuf_printf(sb, "macInErrs:      %10u %10u\n",
 		    stats.mac_in_errs[0], stats.mac_in_errs[1]);
 		sbuf_printf(sb, "hdrInErrs:      %10u %10u\n",
 		    stats.hdr_in_errs[0], stats.hdr_in_errs[1]);
 		sbuf_printf(sb, "tcpInErrs:      %10u %10u\n",
 		    stats.tcp_in_errs[0], stats.tcp_in_errs[1]);
 		sbuf_printf(sb, "tcp6InErrs:     %10u %10u\n",
 		    stats.tcp6_in_errs[0], stats.tcp6_in_errs[1]);
 		sbuf_printf(sb, "tnlCongDrops:   %10u %10u\n",
 		    stats.tnl_cong_drops[0], stats.tnl_cong_drops[1]);
 		sbuf_printf(sb, "tnlTxDrops:     %10u %10u\n",
 		    stats.tnl_tx_drops[0], stats.tnl_tx_drops[1]);
 		sbuf_printf(sb, "ofldVlanDrops:  %10u %10u\n",
 		    stats.ofld_vlan_drops[0], stats.ofld_vlan_drops[1]);
 		sbuf_printf(sb, "ofldChanDrops:  %10u %10u\n\n",
 		    stats.ofld_chan_drops[0], stats.ofld_chan_drops[1]);
 	}
 
 	sbuf_printf(sb, "ofldNoNeigh:    %u\nofldCongDefer:  %u",
 	    stats.ofld_no_neigh, stats.ofld_cong_defer);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
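/*
 * Illustrative sketch (hypothetical helper, not used by the driver): the
 * nchan > 2 and nchan <= 2 branches of sysctl_tp_err_stats() differ only
 * in how many "%10u" columns each row prints.  A row could instead be
 * built with a loop over the channel count.  This sketch writes into a
 * plain char buffer rather than an sbuf so it can be exercised on its
 * own.  The label is left-justified in 15 columns, which reproduces the
 * 16-column "macInErrs:      " prefix once the separating space is added.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static inline int
ex_format_chan_row(char *buf, size_t len, const char *label,
    const uint32_t *vals, unsigned int nchan)
{
	size_t off;
	unsigned int c;
	int n;

	n = snprintf(buf, len, "%-15s", label);
	if (n < 0 || (size_t)n >= len)
		return (-1);
	off = n;
	for (c = 0; c < nchan; c++) {
		/* One space separator plus a right-aligned 10-wide column. */
		n = snprintf(buf + off, len - off, " %10u", vals[c]);
		if (n < 0 || (size_t)n >= len - off)
			return (-1);
		off += n;
	}
	return ((int)off);
}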
 
 static int
 sysctl_tp_la_mask(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct tp_params *tpp = &sc->params.tp;
 	u_int mask;
 	int rc;
 
 	mask = tpp->la_mask >> 16;
 	rc = sysctl_handle_int(oidp, &mask, 0, req);
 	if (rc != 0 || req->newptr == NULL)
 		return (rc);
 	if (mask > 0xffff)
 		return (EINVAL);
 	tpp->la_mask = mask << 16;
 	t4_set_reg_field(sc, A_TP_DBG_LA_CONFIG, 0xffff0000U, tpp->la_mask);
 
 	return (0);
 }
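/*
 * Illustrative sketch (hypothetical helper): sysctl_tp_la_mask() exposes
 * only bits 31:16 of A_TP_DBG_LA_CONFIG.  The 16-bit value from userland
 * is shifted into place and merged with a read-modify-write that leaves
 * the low 16 bits alone, which is the kind of update t4_set_reg_field()
 * is used for above.  Here the "register" is just a value passed in and
 * returned, so the merge can be checked without hardware.
 */
static inline uint32_t
ex_merge_la_mask(uint32_t regval, unsigned int mask16)
{
	const uint32_t field = 0xffff0000U;

	/* Clear the field, then OR in the new value shifted to bit 16. */
	return ((regval & ~field) | ((uint32_t)(mask16 << 16) & field));
}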
 
 struct field_desc {
 	const char *name;
 	u_int start;
 	u_int width;
 };
 
 static void
 field_desc_show(struct sbuf *sb, uint64_t v, const struct field_desc *f)
 {
 	char buf[32];
 	int line_size = 0;
 
 	while (f->name) {
 		uint64_t mask = (1ULL << f->width) - 1;
 		int len = snprintf(buf, sizeof(buf), "%s: %ju", f->name,
 		    ((uintmax_t)v >> f->start) & mask);
 
 		if (line_size + len >= 79) {
 			line_size = 8;
 			sbuf_printf(sb, "\n        ");
 		}
 		sbuf_printf(sb, "%s ", buf);
 		line_size += len + 1;
 		f++;
 	}
 	sbuf_printf(sb, "\n");
 }
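/*
 * Illustrative sketch (hypothetical helper and struct): each field_desc
 * entry names the bit range [start, start + width) of a 64-bit
 * logic-analyzer word.  field_desc_show() extracts that range as
 * (v >> start) & ((1 << width) - 1).  The helper below performs the same
 * extraction for one named field, so a decoded value can be checked
 * programmatically.  It uses its own ex_field struct so that it stands
 * alone.  It assumes width < 64, which holds for every table in this file.
 * Returns 0 and stores the value on success, or -1 if the name is absent.
 */
struct ex_field {
	const char *name;
	unsigned int start;
	unsigned int width;
};

static inline int
ex_field_get(uint64_t v, const struct ex_field *f, const char *name,
    uint64_t *out)
{

	/* Tables are terminated by an entry with a NULL name. */
	for (; f->name != NULL; f++) {
		if (strcmp(f->name, name) == 0) {
			*out = (v >> f->start) & ((1ULL << f->width) - 1);
			return (0);
		}
	}
	return (-1);
}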
 
 static const struct field_desc tp_la0[] = {
 	{ "RcfOpCodeOut", 60, 4 },
 	{ "State", 56, 4 },
 	{ "WcfState", 52, 4 },
 	{ "RcfOpcSrcOut", 50, 2 },
 	{ "CRxError", 49, 1 },
 	{ "ERxError", 48, 1 },
 	{ "SanityFailed", 47, 1 },
 	{ "SpuriousMsg", 46, 1 },
 	{ "FlushInputMsg", 45, 1 },
 	{ "FlushInputCpl", 44, 1 },
 	{ "RssUpBit", 43, 1 },
 	{ "RssFilterHit", 42, 1 },
 	{ "Tid", 32, 10 },
 	{ "InitTcb", 31, 1 },
 	{ "LineNumber", 24, 7 },
 	{ "Emsg", 23, 1 },
 	{ "EdataOut", 22, 1 },
 	{ "Cmsg", 21, 1 },
 	{ "CdataOut", 20, 1 },
 	{ "EreadPdu", 19, 1 },
 	{ "CreadPdu", 18, 1 },
 	{ "TunnelPkt", 17, 1 },
 	{ "RcfPeerFin", 16, 1 },
 	{ "RcfReasonOut", 12, 4 },
 	{ "TxCchannel", 10, 2 },
 	{ "RcfTxChannel", 8, 2 },
 	{ "RxEchannel", 6, 2 },
 	{ "RcfRxChannel", 5, 1 },
 	{ "RcfDataOutSrdy", 4, 1 },
 	{ "RxDvld", 3, 1 },
 	{ "RxOoDvld", 2, 1 },
 	{ "RxCongestion", 1, 1 },
 	{ "TxCongestion", 0, 1 },
 	{ NULL }
 };
 
 static const struct field_desc tp_la1[] = {
 	{ "CplCmdIn", 56, 8 },
 	{ "CplCmdOut", 48, 8 },
 	{ "ESynOut", 47, 1 },
 	{ "EAckOut", 46, 1 },
 	{ "EFinOut", 45, 1 },
 	{ "ERstOut", 44, 1 },
 	{ "SynIn", 43, 1 },
 	{ "AckIn", 42, 1 },
 	{ "FinIn", 41, 1 },
 	{ "RstIn", 40, 1 },
 	{ "DataIn", 39, 1 },
 	{ "DataInVld", 38, 1 },
 	{ "PadIn", 37, 1 },
 	{ "RxBufEmpty", 36, 1 },
 	{ "RxDdp", 35, 1 },
 	{ "RxFbCongestion", 34, 1 },
 	{ "TxFbCongestion", 33, 1 },
 	{ "TxPktSumSrdy", 32, 1 },
 	{ "RcfUlpType", 28, 4 },
 	{ "Eread", 27, 1 },
 	{ "Ebypass", 26, 1 },
 	{ "Esave", 25, 1 },
 	{ "Static0", 24, 1 },
 	{ "Cread", 23, 1 },
 	{ "Cbypass", 22, 1 },
 	{ "Csave", 21, 1 },
 	{ "CPktOut", 20, 1 },
 	{ "RxPagePoolFull", 18, 2 },
 	{ "RxLpbkPkt", 17, 1 },
 	{ "TxLpbkPkt", 16, 1 },
 	{ "RxVfValid", 15, 1 },
 	{ "SynLearned", 14, 1 },
 	{ "SetDelEntry", 13, 1 },
 	{ "SetInvEntry", 12, 1 },
 	{ "CpcmdDvld", 11, 1 },
 	{ "CpcmdSave", 10, 1 },
 	{ "RxPstructsFull", 8, 2 },
 	{ "EpcmdDvld", 7, 1 },
 	{ "EpcmdFlush", 6, 1 },
 	{ "EpcmdTrimPrefix", 5, 1 },
 	{ "EpcmdTrimPostfix", 4, 1 },
 	{ "ERssIp4Pkt", 3, 1 },
 	{ "ERssIp6Pkt", 2, 1 },
 	{ "ERssTcpUdpPkt", 1, 1 },
 	{ "ERssFceFipPkt", 0, 1 },
 	{ NULL }
 };
 
 static const struct field_desc tp_la2[] = {
 	{ "CplCmdIn", 56, 8 },
 	{ "MpsVfVld", 55, 1 },
 	{ "MpsPf", 52, 3 },
 	{ "MpsVf", 44, 8 },
 	{ "SynIn", 43, 1 },
 	{ "AckIn", 42, 1 },
 	{ "FinIn", 41, 1 },
 	{ "RstIn", 40, 1 },
 	{ "DataIn", 39, 1 },
 	{ "DataInVld", 38, 1 },
 	{ "PadIn", 37, 1 },
 	{ "RxBufEmpty", 36, 1 },
 	{ "RxDdp", 35, 1 },
 	{ "RxFbCongestion", 34, 1 },
 	{ "TxFbCongestion", 33, 1 },
 	{ "TxPktSumSrdy", 32, 1 },
 	{ "RcfUlpType", 28, 4 },
 	{ "Eread", 27, 1 },
 	{ "Ebypass", 26, 1 },
 	{ "Esave", 25, 1 },
 	{ "Static0", 24, 1 },
 	{ "Cread", 23, 1 },
 	{ "Cbypass", 22, 1 },
 	{ "Csave", 21, 1 },
 	{ "CPktOut", 20, 1 },
 	{ "RxPagePoolFull", 18, 2 },
 	{ "RxLpbkPkt", 17, 1 },
 	{ "TxLpbkPkt", 16, 1 },
 	{ "RxVfValid", 15, 1 },
 	{ "SynLearned", 14, 1 },
 	{ "SetDelEntry", 13, 1 },
 	{ "SetInvEntry", 12, 1 },
 	{ "CpcmdDvld", 11, 1 },
 	{ "CpcmdSave", 10, 1 },
 	{ "RxPstructsFull", 8, 2 },
 	{ "EpcmdDvld", 7, 1 },
 	{ "EpcmdFlush", 6, 1 },
 	{ "EpcmdTrimPrefix", 5, 1 },
 	{ "EpcmdTrimPostfix", 4, 1 },
 	{ "ERssIp4Pkt", 3, 1 },
 	{ "ERssIp6Pkt", 2, 1 },
 	{ "ERssTcpUdpPkt", 1, 1 },
 	{ "ERssFceFipPkt", 0, 1 },
 	{ NULL }
 };
 
 static void
 tp_la_show(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	field_desc_show(sb, *p, tp_la0);
 }
 
 static void
 tp_la_show2(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	if (idx)
 		sbuf_printf(sb, "\n");
 	field_desc_show(sb, p[0], tp_la0);
 	if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
 		field_desc_show(sb, p[1], tp_la0);
 }
 
 static void
 tp_la_show3(struct sbuf *sb, uint64_t *p, int idx)
 {
 
 	if (idx)
 		sbuf_printf(sb, "\n");
 	field_desc_show(sb, p[0], tp_la0);
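	/*
	 * Bit 17 of the first word is TunnelPkt (see tp_la0); it selects
	 * the layout used to decode the second word.
	 */
	if (idx < (TPLA_SIZE / 2 - 1) || p[1] != ~0ULL)
		field_desc_show(sb, p[1], (p[0] & (1 << 17)) ? tp_la2 : tp_la1);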
 }
 
 static int
 sysctl_tp_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	uint64_t *buf, *p;
 	int rc;
 	u_int i, inc;
 	void (*show_func)(struct sbuf *, uint64_t *, int);
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(TPLA_SIZE * sizeof(uint64_t), M_CXGBE, M_ZERO | M_WAITOK);
 
 	t4_tp_read_la(sc, buf, NULL);
 	p = buf;
 
 	switch (G_DBGLAMODE(t4_read_reg(sc, A_TP_DBG_LA_CONFIG))) {
 	case 2:
 		inc = 2;
 		show_func = tp_la_show2;
 		break;
 	case 3:
 		inc = 2;
 		show_func = tp_la_show3;
 		break;
 	default:
 		inc = 1;
 		show_func = tp_la_show;
 	}
 
 	for (i = 0; i < TPLA_SIZE / inc; i++, p += inc)
 		(*show_func)(sb, p, i);
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_tx_rate(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc;
 	u64 nrate[MAX_NCHAN], orate[MAX_NCHAN];
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 256, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	t4_get_chan_txrate(sc, nrate, orate);
 
 	if (sc->chip_params->nchan > 2) {
 		sbuf_printf(sb, "              channel 0   channel 1"
 		    "   channel 2   channel 3\n");
 		sbuf_printf(sb, "NIC B/s:     %10ju  %10ju  %10ju  %10ju\n",
 		    nrate[0], nrate[1], nrate[2], nrate[3]);
 		sbuf_printf(sb, "Offload B/s: %10ju  %10ju  %10ju  %10ju",
 		    orate[0], orate[1], orate[2], orate[3]);
 	} else {
 		sbuf_printf(sb, "              channel 0   channel 1\n");
 		sbuf_printf(sb, "NIC B/s:     %10ju  %10ju\n",
 		    nrate[0], nrate[1]);
 		sbuf_printf(sb, "Offload B/s: %10ju  %10ju",
 		    orate[0], orate[1]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_ulprx_la(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	uint32_t *buf, *p;
 	int rc, i;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	buf = malloc(ULPRX_LA_SIZE * 8 * sizeof(uint32_t), M_CXGBE,
 	    M_ZERO | M_WAITOK);
 
 	t4_ulprx_read_la(sc, buf);
 	p = buf;
 
 	sbuf_printf(sb, "      Pcmd        Type   Message"
 	    "                Data");
 	for (i = 0; i < ULPRX_LA_SIZE; i++, p += 8) {
 		sbuf_printf(sb, "\n%08x%08x  %4x  %08x  %08x%08x%08x%08x",
 		    p[1], p[0], p[2], p[3], p[7], p[6], p[5], p[4]);
 	}
 
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	free(buf, M_CXGBE);
 	return (rc);
 }
 
 static int
 sysctl_wcwr_stats(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct sbuf *sb;
 	int rc, v;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	v = t4_read_reg(sc, A_SGE_STAT_CFG);
 	if (G_STATSOURCE_T5(v) == 7) {
 		if (G_STATMODE(v) == 0) {
			sbuf_printf(sb, "total %u, incomplete %u",
			    t4_read_reg(sc, A_SGE_STAT_TOTAL),
			    t4_read_reg(sc, A_SGE_STAT_MATCH));
		} else if (G_STATMODE(v) == 1) {
			sbuf_printf(sb, "total %u, data overflow %u",
			    t4_read_reg(sc, A_SGE_STAT_TOTAL),
			    t4_read_reg(sc, A_SGE_STAT_MATCH));
 		}
 	}
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
 
 static int
 sysctl_tc_params(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	struct tx_sched_class *tc;
 	struct t4_sched_class_params p;
 	struct sbuf *sb;
 	int i, rc, port_id, flags, mbps, gbps;
 
 	rc = sysctl_wire_old_buffer(req, 0);
 	if (rc != 0)
 		return (rc);
 
 	sb = sbuf_new_for_sysctl(NULL, NULL, 4096, req);
 	if (sb == NULL)
 		return (ENOMEM);
 
 	port_id = arg2 >> 16;
 	MPASS(port_id < sc->params.nports);
 	MPASS(sc->port[port_id] != NULL);
 	i = arg2 & 0xffff;
 	MPASS(i < sc->chip_params->nsched_cls);
 	tc = &sc->port[port_id]->tc[i];
 
 	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4tc_p");
 	if (rc)
 		goto done;
 	flags = tc->flags;
 	p = tc->params;
 	end_synchronized_op(sc, LOCK_HELD);
 
 	if ((flags & TX_SC_OK) == 0) {
 		sbuf_printf(sb, "none");
 		goto done;
 	}
 
 	if (p.level == SCHED_CLASS_LEVEL_CL_WRR) {
 		sbuf_printf(sb, "cl-wrr weight %u", p.weight);
 		goto done;
 	} else if (p.level == SCHED_CLASS_LEVEL_CL_RL)
 		sbuf_printf(sb, "cl-rl");
 	else if (p.level == SCHED_CLASS_LEVEL_CH_RL)
 		sbuf_printf(sb, "ch-rl");
 	else {
 		rc = ENXIO;
 		goto done;
 	}
 
 	if (p.ratemode == SCHED_CLASS_RATEMODE_REL) {
 		/* XXX: top speed or actual link speed? */
 		gbps = port_top_speed(sc->port[port_id]);
 		sbuf_printf(sb, " %u%% of %uGbps", p.maxrate, gbps);
	} else if (p.ratemode == SCHED_CLASS_RATEMODE_ABS) {
 		switch (p.rateunit) {
 		case SCHED_CLASS_RATEUNIT_BITS:
 			mbps = p.maxrate / 1000;
 			gbps = p.maxrate / 1000000;
 			if (p.maxrate == gbps * 1000000)
 				sbuf_printf(sb, " %uGbps", gbps);
 			else if (p.maxrate == mbps * 1000)
 				sbuf_printf(sb, " %uMbps", mbps);
 			else
 				sbuf_printf(sb, " %uKbps", p.maxrate);
 			break;
 		case SCHED_CLASS_RATEUNIT_PKTS:
 			sbuf_printf(sb, " %upps", p.maxrate);
 			break;
 		default:
 			rc = ENXIO;
 			goto done;
 		}
 	}
 
 	switch (p.mode) {
 	case SCHED_CLASS_MODE_CLASS:
 		sbuf_printf(sb, " aggregate");
 		break;
 	case SCHED_CLASS_MODE_FLOW:
 		sbuf_printf(sb, " per-flow");
 		break;
 	default:
 		rc = ENXIO;
 		goto done;
 	}
 
 done:
 	if (rc == 0)
 		rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 
 	return (rc);
 }
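/*
 * Illustrative sketch (hypothetical helper): for an absolute bit rate,
 * sysctl_tc_params() prints the largest unit that represents maxrate
 * exactly.  Since the fallback case prints maxrate itself with a "Kbps"
 * suffix, maxrate is in Kbps.  So 1000000 is shown as "1Gbps", 3000 as
 * "3Mbps" and 2500 as "2500Kbps".  The same selection, writing into a
 * plain buffer and without the leading space:
 */
static inline void
ex_format_kbps(char *buf, size_t len, unsigned int kbps)
{

	if (kbps % 1000000 == 0)
		snprintf(buf, len, "%uGbps", kbps / 1000000);
	else if (kbps % 1000 == 0)
		snprintf(buf, len, "%uMbps", kbps / 1000);
	else
		snprintf(buf, len, "%uKbps", kbps);
}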
 #endif
 
 #ifdef TCP_OFFLOAD
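static void
unit_conv(char *buf, size_t len, u_int val, u_int factor)
{
	u_int rem = val % factor;
	u_int f;
	int digits;

	if (rem == 0)
		snprintf(buf, len, "%u", val / factor);
	else {
		/*
		 * factor is assumed to be a power of 10.  Print the remainder
		 * zero-padded to as many digits as factor has zeros, so that
		 * 1050 / 1000 is "1.05" rather than "1.5", and drop trailing
		 * zeros.
		 */
		for (digits = 0, f = factor; f > 1; f /= 10)
			digits++;
		while (rem % 10 == 0) {
			rem /= 10;
			digits--;
		}
		snprintf(buf, len, "%u.%0*u", val / factor, digits, rem);
	}
}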
 
 static int
 sysctl_tp_tick(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	char buf[16];
 	u_int res, re;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
 	res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
 	switch (arg2) {
 	case 0:
 		/* timer_tick */
 		re = G_TIMERRESOLUTION(res);
 		break;
 	case 1:
 		/* TCP timestamp tick */
 		re = G_TIMESTAMPRESOLUTION(res);
 		break;
 	case 2:
 		/* DACK tick */
 		re = G_DELAYEDACKRESOLUTION(res);
 		break;
 	default:
 		return (EDOOFUS);
 	}
 
 	unit_conv(buf, sizeof(buf), (cclk_ps << re), 1000000);
 
 	return (sysctl_handle_string(oidp, buf, sizeof(buf), req));
 }
 
 static int
 sysctl_tp_dack_timer(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	u_int res, dack_re, v;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
 	res = t4_read_reg(sc, A_TP_TIMER_RESOLUTION);
 	dack_re = G_DELAYEDACKRESOLUTION(res);
 	v = ((cclk_ps << dack_re) / 1000000) * t4_read_reg(sc, A_TP_DACK_TIMER);
 
 	return (sysctl_handle_int(oidp, &v, 0, req));
 }
 
 static int
 sysctl_tp_timer(SYSCTL_HANDLER_ARGS)
 {
 	struct adapter *sc = arg1;
 	int reg = arg2;
 	u_int tre;
 	u_long tp_tick_us, v;
 	u_int cclk_ps = 1000000000 / sc->params.vpd.cclk;
 
	MPASS(reg == A_TP_RXT_MIN || reg == A_TP_RXT_MAX ||
	    reg == A_TP_PERS_MIN || reg == A_TP_PERS_MAX ||
	    reg == A_TP_KEEP_IDLE || reg == A_TP_KEEP_INTVL ||
	    reg == A_TP_INIT_SRTT || reg == A_TP_FINWAIT2_TIMER);
 
 	tre = G_TIMERRESOLUTION(t4_read_reg(sc, A_TP_TIMER_RESOLUTION));
 	tp_tick_us = (cclk_ps << tre) / 1000000;
 
 	if (reg == A_TP_INIT_SRTT)
 		v = tp_tick_us * G_INITSRTT(t4_read_reg(sc, reg));
 	else
 		v = tp_tick_us * t4_read_reg(sc, reg);
 
 	return (sysctl_handle_long(oidp, &v, 0, req));
 }
 #endif
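/*
 * Illustrative sketch (hypothetical helpers): the TP timer handlers above
 * derive time from the core clock.  This sketch assumes vpd.cclk is in kHz,
 * so that 1000000000 / cclk yields picoseconds, matching the name cclk_ps.
 * On that basis, one core-clock period is 10^9 / cclk ps.  A timer tick is
 * that period shifted left by the resolution field.  A register that
 * counts ticks converts to microseconds by multiplication, as in
 * sysctl_tp_timer().  Like the driver, the tick is truncated to whole
 * microseconds before multiplying.  cclk_khz must be nonzero.
 */
static inline unsigned int
ex_tick_ps(unsigned int cclk_khz, unsigned int re)
{

	return ((1000000000U / cclk_khz) << re);
}

static inline unsigned long
ex_ticks_to_us(unsigned int cclk_khz, unsigned int re, unsigned int ticks)
{
	unsigned long tick_us = ex_tick_ps(cclk_khz, re) / 1000000;

	return (tick_us * ticks);
}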
 
 static uint32_t
 fconf_iconf_to_mode(uint32_t fconf, uint32_t iconf)
 {
 	uint32_t mode;
 
 	mode = T4_FILTER_IPv4 | T4_FILTER_IPv6 | T4_FILTER_IP_SADDR |
 	    T4_FILTER_IP_DADDR | T4_FILTER_IP_SPORT | T4_FILTER_IP_DPORT;
 
 	if (fconf & F_FRAGMENTATION)
 		mode |= T4_FILTER_IP_FRAGMENT;
 
 	if (fconf & F_MPSHITTYPE)
 		mode |= T4_FILTER_MPS_HIT_TYPE;
 
 	if (fconf & F_MACMATCH)
 		mode |= T4_FILTER_MAC_IDX;
 
 	if (fconf & F_ETHERTYPE)
 		mode |= T4_FILTER_ETH_TYPE;
 
 	if (fconf & F_PROTOCOL)
 		mode |= T4_FILTER_IP_PROTO;
 
 	if (fconf & F_TOS)
 		mode |= T4_FILTER_IP_TOS;
 
 	if (fconf & F_VLAN)
 		mode |= T4_FILTER_VLAN;
 
 	if (fconf & F_VNIC_ID) {
 		mode |= T4_FILTER_VNIC;
 		if (iconf & F_VNIC)
 			mode |= T4_FILTER_IC_VNIC;
 	}
 
 	if (fconf & F_PORT)
 		mode |= T4_FILTER_PORT;
 
 	if (fconf & F_FCOE)
 		mode |= T4_FILTER_FCoE;
 
 	return (mode);
 }
 
 static uint32_t
 mode_to_fconf(uint32_t mode)
 {
 	uint32_t fconf = 0;
 
 	if (mode & T4_FILTER_IP_FRAGMENT)
 		fconf |= F_FRAGMENTATION;
 
 	if (mode & T4_FILTER_MPS_HIT_TYPE)
 		fconf |= F_MPSHITTYPE;
 
 	if (mode & T4_FILTER_MAC_IDX)
 		fconf |= F_MACMATCH;
 
 	if (mode & T4_FILTER_ETH_TYPE)
 		fconf |= F_ETHERTYPE;
 
 	if (mode & T4_FILTER_IP_PROTO)
 		fconf |= F_PROTOCOL;
 
 	if (mode & T4_FILTER_IP_TOS)
 		fconf |= F_TOS;
 
 	if (mode & T4_FILTER_VLAN)
 		fconf |= F_VLAN;
 
 	if (mode & T4_FILTER_VNIC)
 		fconf |= F_VNIC_ID;
 
 	if (mode & T4_FILTER_PORT)
 		fconf |= F_PORT;
 
 	if (mode & T4_FILTER_FCoE)
 		fconf |= F_FCOE;
 
 	return (fconf);
 }
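/*
 * Illustrative sketch: fconf_iconf_to_mode() and mode_to_fconf() mirror
 * each other, with one if-statement per field.  The same mapping could be
 * driven from a single table so that the two directions cannot drift
 * apart.  The EX_* bit values below are made up for the example and are
 * not the real F_* or T4_FILTER_* values.  The sketch also omits the
 * always-on IP fields and the iconf-dependent VNIC handling.
 */
#define EX_F_PROTO	(1U << 3)	/* hypothetical hardware bit */
#define EX_F_VLAN	(1U << 5)	/* hypothetical hardware bit */
#define EX_M_PROTO	(1U << 10)	/* hypothetical ioctl mode bit */
#define EX_M_VLAN	(1U << 12)	/* hypothetical ioctl mode bit */

static inline uint32_t
ex_flag_xlate(uint32_t in, int to_mode)
{
	static const struct {
		uint32_t hw;
		uint32_t mode;
	} tbl[] = {
		{ EX_F_PROTO, EX_M_PROTO },
		{ EX_F_VLAN, EX_M_VLAN },
	};
	uint32_t out = 0;
	size_t i;

	/* Bits with no table entry are silently dropped. */
	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++) {
		if (to_mode && (in & tbl[i].hw))
			out |= tbl[i].mode;
		else if (!to_mode && (in & tbl[i].mode))
			out |= tbl[i].hw;
	}
	return (out);
}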
 
 static uint32_t
 mode_to_iconf(uint32_t mode)
 {
 
 	if (mode & T4_FILTER_IC_VNIC)
 		return (F_VNIC);
 	return (0);
 }
 
static int
check_fspec_against_fconf_iconf(struct adapter *sc,
    struct t4_filter_specification *fs)
 {
 	struct tp_params *tpp = &sc->params.tp;
 	uint32_t fconf = 0;
 
 	if (fs->val.frag || fs->mask.frag)
 		fconf |= F_FRAGMENTATION;
 
 	if (fs->val.matchtype || fs->mask.matchtype)
 		fconf |= F_MPSHITTYPE;
 
 	if (fs->val.macidx || fs->mask.macidx)
 		fconf |= F_MACMATCH;
 
 	if (fs->val.ethtype || fs->mask.ethtype)
 		fconf |= F_ETHERTYPE;
 
 	if (fs->val.proto || fs->mask.proto)
 		fconf |= F_PROTOCOL;
 
 	if (fs->val.tos || fs->mask.tos)
 		fconf |= F_TOS;
 
 	if (fs->val.vlan_vld || fs->mask.vlan_vld)
 		fconf |= F_VLAN;
 
 	if (fs->val.ovlan_vld || fs->mask.ovlan_vld) {
 		fconf |= F_VNIC_ID;
 		if (tpp->ingress_config & F_VNIC)
 			return (EINVAL);
 	}
 
 	if (fs->val.pfvf_vld || fs->mask.pfvf_vld) {
 		fconf |= F_VNIC_ID;
 		if ((tpp->ingress_config & F_VNIC) == 0)
 			return (EINVAL);
 	}
 
 	if (fs->val.iport || fs->mask.iport)
 		fconf |= F_PORT;
 
 	if (fs->val.fcoe || fs->mask.fcoe)
 		fconf |= F_FCOE;
 
 	if ((tpp->vlan_pri_map | fconf) != tpp->vlan_pri_map)
 		return (E2BIG);
 
 	return (0);
 }
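/*
 * Illustrative sketch (hypothetical helper): the final test in
 * check_fspec_against_fconf_iconf() asks whether every field the filter
 * uses is enabled in the global filter mode, that is, whether fconf is a
 * subset of vlan_pri_map.  The expression (map | need) != map is
 * equivalent to (need & ~map) != 0.  The helper below returns the
 * offending bits, which could be useful for diagnostics.
 */
static inline uint32_t
ex_missing_fields(uint32_t enabled, uint32_t needed)
{

	return (needed & ~enabled);
}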
 
 static int
 get_filter_mode(struct adapter *sc, uint32_t *mode)
 {
 	struct tp_params *tpp = &sc->params.tp;
 
 	/*
 	 * We trust the cached values of the relevant TP registers.  This means
 	 * things work reliably only if writes to those registers are always via
 	 * t4_set_filter_mode.
 	 */
 	*mode = fconf_iconf_to_mode(tpp->vlan_pri_map, tpp->ingress_config);
 
 	return (0);
 }
 
 static int
 set_filter_mode(struct adapter *sc, uint32_t mode)
 {
 	struct tp_params *tpp = &sc->params.tp;
 	uint32_t fconf, iconf;
 	int rc;
 
 	iconf = mode_to_iconf(mode);
 	if ((iconf ^ tpp->ingress_config) & F_VNIC) {
 		/*
 		 * For now we just complain if A_TP_INGRESS_CONFIG is not
 		 * already set to the correct value for the requested filter
 		 * mode.  It's not clear if it's safe to write to this register
 		 * on the fly.  (And we trust the cached value of the register).
 		 */
 		return (EBUSY);
 	}
 
 	fconf = mode_to_fconf(mode);
 
 	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4setfm");
 	if (rc)
 		return (rc);
 
 	if (sc->tids.ftids_in_use > 0) {
 		rc = EBUSY;
 		goto done;
 	}
 
 #ifdef TCP_OFFLOAD
 	if (uld_active(sc, ULD_TOM)) {
 		rc = EBUSY;
 		goto done;
 	}
 #endif
 
 	rc = -t4_set_filter_mode(sc, fconf);
 done:
 	end_synchronized_op(sc, LOCK_HELD);
 	return (rc);
 }
 
 static inline uint64_t
 get_filter_hits(struct adapter *sc, uint32_t fid)
 {
 	uint32_t tcb_addr;
 
 	tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE) +
 	    (fid + sc->tids.ftid_base) * TCB_SIZE;
 
 	if (is_t4(sc)) {
 		uint64_t hits;
 
 		read_via_memwin(sc, 0, tcb_addr + 16, (uint32_t *)&hits, 8);
 		return (be64toh(hits));
 	} else {
 		uint32_t hits;
 
 		read_via_memwin(sc, 0, tcb_addr + 24, &hits, 4);
 		return (be32toh(hits));
 	}
 }
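/*
 * Illustrative sketch (hypothetical helper): get_filter_hits() reads the
 * hit counter directly from the filter's TCB, where it is stored
 * big-endian.  On T4 it is a 64-bit count at byte offset 16; on later
 * chips it is a 32-bit count at byte offset 24.  Given a copy of the TCB
 * bytes, the same value can be decoded portably by assembling the bytes
 * most-significant first, instead of calling be64toh()/be32toh().
 */
static inline uint64_t
ex_tcb_hits(const uint8_t *tcb, int is_t4)
{
	const uint8_t *p = tcb + (is_t4 ? 16 : 24);
	unsigned int i, n = is_t4 ? 8 : 4;
	uint64_t v = 0;

	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return (v);
}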
 
 static int
 get_filter(struct adapter *sc, struct t4_filter *t)
 {
 	int i, rc, nfilters = sc->tids.nftids;
 	struct filter_entry *f;
 
 	rc = begin_synchronized_op(sc, NULL, HOLD_LOCK | SLEEP_OK | INTR_OK,
 	    "t4getf");
 	if (rc)
 		return (rc);
 
 	if (sc->tids.ftids_in_use == 0 || sc->tids.ftid_tab == NULL ||
 	    t->idx >= nfilters) {
 		t->idx = 0xffffffff;
 		goto done;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 	for (i = t->idx; i < nfilters; i++, f++) {
 		if (f->valid) {
 			t->idx = i;
 			t->l2tidx = f->l2t ? f->l2t->idx : 0;
 			t->smtidx = f->smtidx;
 			if (f->fs.hitcnts)
 				t->hits = get_filter_hits(sc, t->idx);
 			else
 				t->hits = UINT64_MAX;
 			t->fs = f->fs;
 
 			goto done;
 		}
 	}
 
 	t->idx = 0xffffffff;
 done:
 	end_synchronized_op(sc, LOCK_HELD);
 	return (0);
 }
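/*
 * Illustrative sketch (hypothetical helper): get_filter() behaves as a
 * "find next" iterator.  Starting at the caller's index, it returns the
 * first valid slot at or after that index, and reports 0xffffffff when
 * there is none.  A caller can therefore presumably walk all filters by
 * passing back the returned index plus one.  The same scan over a plain
 * array of valid flags:
 */
static inline uint32_t
ex_next_valid(const unsigned char *valid, uint32_t n, uint32_t start)
{
	uint32_t i;

	for (i = start; i < n; i++)
		if (valid[i])
			return (i);
	return (0xffffffff);	/* no valid entry at or after start */
}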
 
 static int
 set_filter(struct adapter *sc, struct t4_filter *t)
 {
 	unsigned int nfilters, nports;
 	struct filter_entry *f;
 	int i, rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setf");
 	if (rc)
 		return (rc);
 
 	nfilters = sc->tids.nftids;
 	nports = sc->params.nports;
 
 	if (nfilters == 0) {
 		rc = ENOTSUP;
 		goto done;
 	}
 
 	if (t->idx >= nfilters) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* Validate against the global filter mode and ingress config */
 	rc = check_fspec_against_fconf_iconf(sc, &t->fs);
 	if (rc != 0)
 		goto done;
 
 	if (t->fs.action == FILTER_SWITCH && t->fs.eport >= nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (t->fs.val.iport >= nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* Can't specify an iq if not steering to it */
 	if (!t->fs.dirsteer && t->fs.iq) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* IPv6 filter idx must be 4 aligned */
 	if (t->fs.type == 1 &&
 	    ((t->idx & 0x3) || t->idx + 4 >= nfilters)) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (!(sc->flags & FULL_INIT_DONE) &&
 	    ((rc = adapter_full_init(sc)) != 0))
 		goto done;
 
 	if (sc->tids.ftid_tab == NULL) {
 		KASSERT(sc->tids.ftids_in_use == 0,
		    ("%s: no memory allocated but ftids_in_use > 0",
 		    __func__));
 
 		sc->tids.ftid_tab = malloc(sizeof (struct filter_entry) *
 		    nfilters, M_CXGBE, M_NOWAIT | M_ZERO);
 		if (sc->tids.ftid_tab == NULL) {
 			rc = ENOMEM;
 			goto done;
 		}
 		mtx_init(&sc->tids.ftid_lock, "T4 filters", 0, MTX_DEF);
 	}
 
 	for (i = 0; i < 4; i++) {
 		f = &sc->tids.ftid_tab[t->idx + i];
 
 		if (f->pending || f->valid) {
 			rc = EBUSY;
 			goto done;
 		}
 		if (f->locked) {
 			rc = EPERM;
 			goto done;
 		}
 
 		if (t->fs.type == 0)
 			break;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 	f->fs = t->fs;
 
 	rc = set_filter_wr(sc, t->idx);
 done:
 	end_synchronized_op(sc, 0);
 
 	if (rc == 0) {
 		mtx_lock(&sc->tids.ftid_lock);
 		for (;;) {
 			if (f->pending == 0) {
 				rc = f->valid ? 0 : EIO;
 				break;
 			}
 
 			if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
 			    PCATCH, "t4setfw", 0)) {
 				rc = EINPROGRESS;
 				break;
 			}
 		}
 		mtx_unlock(&sc->tids.ftid_lock);
 	}
 	return (rc);
 }
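/*
 * Illustrative sketch (hypothetical helper): an IPv6 filter (fs.type ==
 * 1) occupies four consecutive filter slots.  set_filter() therefore
 * requires a 4-aligned index and checks that all four slots are free.
 * This helper mirrors the index test exactly as written in set_filter().
 * With that ">=" comparison, the last aligned group of four is rejected
 * as well.
 */
static inline int
ex_ipv6_idx_ok(uint32_t idx, uint32_t nfilters)
{

	return ((idx & 0x3) == 0 && idx + 4 < nfilters);
}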
 
 static int
 del_filter(struct adapter *sc, struct t4_filter *t)
 {
 	unsigned int nfilters;
 	struct filter_entry *f;
 	int rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4delf");
 	if (rc)
 		return (rc);
 
 	nfilters = sc->tids.nftids;
 
 	if (nfilters == 0) {
 		rc = ENOTSUP;
 		goto done;
 	}
 
 	if (sc->tids.ftid_tab == NULL || sc->tids.ftids_in_use == 0 ||
 	    t->idx >= nfilters) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	if (!(sc->flags & FULL_INIT_DONE)) {
 		rc = EAGAIN;
 		goto done;
 	}
 
 	f = &sc->tids.ftid_tab[t->idx];
 
 	if (f->pending) {
 		rc = EBUSY;
 		goto done;
 	}
 	if (f->locked) {
 		rc = EPERM;
 		goto done;
 	}
 
 	if (f->valid) {
 		t->fs = f->fs;	/* extra info for the caller */
 		rc = del_filter_wr(sc, t->idx);
 	}
 
 done:
 	end_synchronized_op(sc, 0);
 
 	if (rc == 0) {
 		mtx_lock(&sc->tids.ftid_lock);
 		for (;;) {
 			if (f->pending == 0) {
 				rc = f->valid ? EIO : 0;
 				break;
 			}
 
 			if (mtx_sleep(&sc->tids.ftid_tab, &sc->tids.ftid_lock,
 			    PCATCH, "t4delfw", 0)) {
 				rc = EINPROGRESS;
 				break;
 			}
 		}
 		mtx_unlock(&sc->tids.ftid_lock);
 	}
 
 	return (rc);
 }
 
 static void
 clear_filter(struct filter_entry *f)
 {
 	if (f->l2t)
 		t4_l2t_release(f->l2t);
 
 	bzero(f, sizeof (*f));
 }
 
 static int
 set_filter_wr(struct adapter *sc, int fidx)
 {
 	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
 	struct fw_filter_wr *fwr;
 	unsigned int ftid, vnic_vld, vnic_vld_mask;
 	struct wrq_cookie cookie;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (f->fs.newdmac || f->fs.newvlan) {
 		/* This filter needs an L2T entry; allocate one. */
 		f->l2t = t4_l2t_alloc_switching(sc->l2t);
 		if (f->l2t == NULL)
 			return (EAGAIN);
 		if (t4_l2t_set_switching(sc, f->l2t, f->fs.vlan, f->fs.eport,
 		    f->fs.dmac)) {
 			t4_l2t_release(f->l2t);
 			f->l2t = NULL;
 			return (ENOMEM);
 		}
 	}
 
 	/* Already validated against fconf, iconf */
 	MPASS((f->fs.val.pfvf_vld & f->fs.val.ovlan_vld) == 0);
 	MPASS((f->fs.mask.pfvf_vld & f->fs.mask.ovlan_vld) == 0);
 	if (f->fs.val.pfvf_vld || f->fs.val.ovlan_vld)
 		vnic_vld = 1;
 	else
 		vnic_vld = 0;
 	if (f->fs.mask.pfvf_vld || f->fs.mask.ovlan_vld)
 		vnic_vld_mask = 1;
 	else
 		vnic_vld_mask = 0;
 
 	ftid = sc->tids.ftid_base + fidx;
 
 	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
 	if (fwr == NULL)
 		return (ENOMEM);
 	bzero(fwr, sizeof(*fwr));
 
 	fwr->op_pkd = htobe32(V_FW_WR_OP(FW_FILTER_WR));
 	fwr->len16_pkd = htobe32(FW_LEN16(*fwr));
 	fwr->tid_to_iq =
 	    htobe32(V_FW_FILTER_WR_TID(ftid) |
 		V_FW_FILTER_WR_RQTYPE(f->fs.type) |
 		V_FW_FILTER_WR_NOREPLY(0) |
 		V_FW_FILTER_WR_IQ(f->fs.iq));
 	fwr->del_filter_to_l2tix =
 	    htobe32(V_FW_FILTER_WR_RPTTID(f->fs.rpttid) |
 		V_FW_FILTER_WR_DROP(f->fs.action == FILTER_DROP) |
 		V_FW_FILTER_WR_DIRSTEER(f->fs.dirsteer) |
 		V_FW_FILTER_WR_MASKHASH(f->fs.maskhash) |
 		V_FW_FILTER_WR_DIRSTEERHASH(f->fs.dirsteerhash) |
 		V_FW_FILTER_WR_LPBK(f->fs.action == FILTER_SWITCH) |
 		V_FW_FILTER_WR_DMAC(f->fs.newdmac) |
 		V_FW_FILTER_WR_SMAC(f->fs.newsmac) |
 		V_FW_FILTER_WR_INSVLAN(f->fs.newvlan == VLAN_INSERT ||
 		    f->fs.newvlan == VLAN_REWRITE) |
 		V_FW_FILTER_WR_RMVLAN(f->fs.newvlan == VLAN_REMOVE ||
 		    f->fs.newvlan == VLAN_REWRITE) |
 		V_FW_FILTER_WR_HITCNTS(f->fs.hitcnts) |
 		V_FW_FILTER_WR_TXCHAN(f->fs.eport) |
 		V_FW_FILTER_WR_PRIO(f->fs.prio) |
 		V_FW_FILTER_WR_L2TIX(f->l2t ? f->l2t->idx : 0));
 	fwr->ethtype = htobe16(f->fs.val.ethtype);
 	fwr->ethtypem = htobe16(f->fs.mask.ethtype);
 	fwr->frag_to_ovlan_vldm =
 	    (V_FW_FILTER_WR_FRAG(f->fs.val.frag) |
 		V_FW_FILTER_WR_FRAGM(f->fs.mask.frag) |
 		V_FW_FILTER_WR_IVLAN_VLD(f->fs.val.vlan_vld) |
 		V_FW_FILTER_WR_OVLAN_VLD(vnic_vld) |
 		V_FW_FILTER_WR_IVLAN_VLDM(f->fs.mask.vlan_vld) |
 		V_FW_FILTER_WR_OVLAN_VLDM(vnic_vld_mask));
 	fwr->smac_sel = 0;
 	fwr->rx_chan_rx_rpl_iq = htobe16(V_FW_FILTER_WR_RX_CHAN(0) |
 	    V_FW_FILTER_WR_RX_RPL_IQ(sc->sge.fwq.abs_id));
 	fwr->maci_to_matchtypem =
 	    htobe32(V_FW_FILTER_WR_MACI(f->fs.val.macidx) |
 		V_FW_FILTER_WR_MACIM(f->fs.mask.macidx) |
 		V_FW_FILTER_WR_FCOE(f->fs.val.fcoe) |
 		V_FW_FILTER_WR_FCOEM(f->fs.mask.fcoe) |
 		V_FW_FILTER_WR_PORT(f->fs.val.iport) |
 		V_FW_FILTER_WR_PORTM(f->fs.mask.iport) |
 		V_FW_FILTER_WR_MATCHTYPE(f->fs.val.matchtype) |
 		V_FW_FILTER_WR_MATCHTYPEM(f->fs.mask.matchtype));
 	fwr->ptcl = f->fs.val.proto;
 	fwr->ptclm = f->fs.mask.proto;
 	fwr->ttyp = f->fs.val.tos;
 	fwr->ttypm = f->fs.mask.tos;
 	fwr->ivlan = htobe16(f->fs.val.vlan);
 	fwr->ivlanm = htobe16(f->fs.mask.vlan);
 	fwr->ovlan = htobe16(f->fs.val.vnic);
 	fwr->ovlanm = htobe16(f->fs.mask.vnic);
 	bcopy(f->fs.val.dip, fwr->lip, sizeof (fwr->lip));
 	bcopy(f->fs.mask.dip, fwr->lipm, sizeof (fwr->lipm));
 	bcopy(f->fs.val.sip, fwr->fip, sizeof (fwr->fip));
 	bcopy(f->fs.mask.sip, fwr->fipm, sizeof (fwr->fipm));
 	fwr->lp = htobe16(f->fs.val.dport);
 	fwr->lpm = htobe16(f->fs.mask.dport);
 	fwr->fp = htobe16(f->fs.val.sport);
 	fwr->fpm = htobe16(f->fs.mask.sport);
 	if (f->fs.newsmac)
 		bcopy(f->fs.smac, fwr->sma, sizeof (fwr->sma));
 
 	f->pending = 1;
 	sc->tids.ftids_in_use++;
 
 	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
 	return (0);
 }
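/*
 * Illustrative sketch: set_filter_wr() builds the work request from
 * V_FW_FILTER_WR_*() macros.  Each macro shifts a value into its field;
 * the fields are OR'ed together and converted to big-endian with
 * htobe32().  The shared register headers follow the naming pattern
 * S_ (shift), M_ (mask), V_ (value in place), G_ (get) and F_ (single-bit
 * flag).  The EX_ fields below are made up to show how such a set fits
 * together.
 */
#define S_EX_PRIO	10
#define M_EX_PRIO	0x7U
#define V_EX_PRIO(x)	((uint32_t)(x) << S_EX_PRIO)
#define G_EX_PRIO(x)	(((x) >> S_EX_PRIO) & M_EX_PRIO)

#define S_EX_DROP	2
#define V_EX_DROP(x)	((uint32_t)(x) << S_EX_DROP)
#define F_EX_DROP	V_EX_DROP(1U)

static inline uint32_t
ex_pack(unsigned int prio, int drop)
{

	/* Mask before shifting so an out-of-range value cannot spill over. */
	return (V_EX_PRIO(prio & M_EX_PRIO) | V_EX_DROP(drop != 0));
}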
 
 static int
 del_filter_wr(struct adapter *sc, int fidx)
 {
 	struct filter_entry *f = &sc->tids.ftid_tab[fidx];
 	struct fw_filter_wr *fwr;
 	unsigned int ftid;
 	struct wrq_cookie cookie;
 
 	ftid = sc->tids.ftid_base + fidx;
 
 	fwr = start_wrq_wr(&sc->sge.mgmtq, howmany(sizeof(*fwr), 16), &cookie);
 	if (fwr == NULL)
 		return (ENOMEM);
 	bzero(fwr, sizeof (*fwr));
 
 	t4_mk_filtdelwr(ftid, fwr, sc->sge.fwq.abs_id);
 
 	f->pending = 1;
 	commit_wrq_wr(&sc->sge.mgmtq, fwr, &cookie);
 	return (0);
 }
 
 int
 t4_filter_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 	struct adapter *sc = iq->adapter;
 	const struct cpl_set_tcb_rpl *rpl = (const void *)(rss + 1);
 	unsigned int idx = GET_TID(rpl);
 	unsigned int rc;
 	struct filter_entry *f;
 
 	KASSERT(m == NULL, ("%s: payload with opcode %02x", __func__,
 	    rss->opcode));
 	MPASS(iq == &sc->sge.fwq);
 	MPASS(is_ftid(sc, idx));
 
 	idx -= sc->tids.ftid_base;
 	f = &sc->tids.ftid_tab[idx];
 	rc = G_COOKIE(rpl->cookie);
 
 	mtx_lock(&sc->tids.ftid_lock);
 	if (rc == FW_FILTER_WR_FLT_ADDED) {
 		KASSERT(f->pending, ("%s: filter[%u] isn't pending.",
 		    __func__, idx));
 		f->smtidx = (be64toh(rpl->oldval) >> 24) & 0xff;
 		f->pending = 0;  /* asynchronous setup completed */
 		f->valid = 1;
 	} else {
 		if (rc != FW_FILTER_WR_FLT_DELETED) {
 			/* Add or delete failed, display an error */
 			log(LOG_ERR,
 			    "filter %u setup failed with error %u\n",
 			    idx, rc);
 		}
 
 		clear_filter(f);
 		sc->tids.ftids_in_use--;
 	}
 	wakeup(&sc->tids.ftid_tab);
 	mtx_unlock(&sc->tids.ftid_lock);
 
 	return (0);
 }
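t4_filter_rpl() completes an asynchronous filter add or delete. An "added" reply clears f->pending and marks the entry valid. Any other reply (a successful delete, or an add/delete failure, which is also logged) clears the entry and decrements ftids_in_use. The userspace sketch below models only that state transition. The ex_ names and reply codes are hypothetical stand-ins for the FW_FILTER_WR_* values, and locking and wakeup() are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for FW_FILTER_WR_FLT_ADDED/DELETED. */
enum { EX_FLT_ADDED = 1, EX_FLT_DELETED = 2 };

struct ex_filter {
	bool pending;	/* work request sent, reply not seen yet */
	bool valid;	/* filter is live in hardware */
};

/*
 * Apply one firmware reply to a filter slot.  Returns true when the reply
 * reports an error (neither "added" nor "deleted").
 */
static bool
ex_filter_reply(struct ex_filter *f, unsigned int *in_use, unsigned int rc)
{
	if (rc == EX_FLT_ADDED) {
		assert(f->pending);	/* mirrors the driver's KASSERT */
		f->pending = false;
		f->valid = true;
		return (false);
	}
	/* Deleted or failed: either way the slot is released. */
	memset(f, 0, sizeof(*f));
	assert(*in_use > 0);
	(*in_use)--;
	return (rc != EX_FLT_DELETED);
}
```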
 
 static int
 set_tcb_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 
 	MPASS(iq->set_tcb_rpl != NULL);
 	return (iq->set_tcb_rpl(iq, rss, m));
 }
 
 static int
 l2t_write_rpl(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 
 	MPASS(iq->l2t_write_rpl != NULL);
 	return (iq->l2t_write_rpl(iq, rss, m));
 }
 
 static int
 get_sge_context(struct adapter *sc, struct t4_sge_context *cntxt)
 {
 	int rc;
 
 	if (cntxt->cid > M_CTXTQID)
 		return (EINVAL);
 
 	if (cntxt->mem_id != CTXT_EGRESS && cntxt->mem_id != CTXT_INGRESS &&
 	    cntxt->mem_id != CTXT_FLM && cntxt->mem_id != CTXT_CNM)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ctxt");
 	if (rc)
 		return (rc);
 
 	if (sc->flags & FW_OK) {
 		rc = -t4_sge_ctxt_rd(sc, sc->mbox, cntxt->cid, cntxt->mem_id,
 		    &cntxt->data[0]);
 		if (rc == 0)
 			goto done;
 	}
 
 	/*
 	 * Read via firmware failed or wasn't even attempted.  Read directly via
 	 * the backdoor.
 	 */
 	rc = -t4_sge_ctxt_rd_bd(sc, cntxt->cid, cntxt->mem_id, &cntxt->data[0]);
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 static int
 load_fw(struct adapter *sc, struct t4_data *fw)
 {
 	int rc;
 	uint8_t *fw_data;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4ldfw");
 	if (rc)
 		return (rc);
 
 	if (sc->flags & FULL_INIT_DONE) {
 		rc = EBUSY;
 		goto done;
 	}
 
 	fw_data = malloc(fw->len, M_CXGBE, M_WAITOK);
 	if (fw_data == NULL) {
 		rc = ENOMEM;
 		goto done;
 	}
 
 	rc = copyin(fw->data, fw_data, fw->len);
 	if (rc == 0)
 		rc = -t4_load_fw(sc, fw_data, fw->len);
 
 	free(fw_data, M_CXGBE);
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 #define MAX_READ_BUF_SIZE (128 * 1024)
 static int
 read_card_mem(struct adapter *sc, int win, struct t4_mem_range *mr)
 {
 	uint32_t addr, remaining, n;
 	uint32_t *buf;
 	int rc;
 	uint8_t *dst;
 
 	rc = validate_mem_range(sc, mr->addr, mr->len);
 	if (rc != 0)
 		return (rc);
 
 	buf = malloc(min(mr->len, MAX_READ_BUF_SIZE), M_CXGBE, M_WAITOK);
 	addr = mr->addr;
 	remaining = mr->len;
 	dst = (void *)mr->data;
 
 	while (remaining) {
 		n = min(remaining, MAX_READ_BUF_SIZE);
 		read_via_memwin(sc, win, addr, buf, n);
 
 		rc = copyout(buf, dst, n);
 		if (rc != 0)
 			break;
 
 		dst += n;
 		remaining -= n;
 		addr += n;
 	}
 
 	free(buf, M_CXGBE);
 	return (rc);
 }
 #undef MAX_READ_BUF_SIZE
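read_card_mem() never allocates more than MAX_READ_BUF_SIZE bytes. It reads the requested range through a memory window into a bounce buffer, one chunk at a time, and copies each chunk out before advancing. The sketch below shows the same loop in userspace. memcpy() stands in for both read_via_memwin() and copyout(), and EX_CHUNK_MAX is a deliberately small hypothetical chunk size so the chunking is visible.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define EX_CHUNK_MAX 16	/* stands in for MAX_READ_BUF_SIZE (128 KB) */

/*
 * Copy len bytes from src to dst through a bounce buffer of at most
 * EX_CHUNK_MAX bytes.  Returns the number of chunks used, or -1 if the
 * bounce buffer cannot be allocated.
 */
static int
ex_chunked_copy(uint8_t *dst, const uint8_t *src, uint32_t len)
{
	uint32_t n, remaining = len;
	uint8_t *buf;
	int chunks = 0;

	assert(dst != NULL && src != NULL);
	if (len == 0)
		return (0);
	buf = malloc(len < EX_CHUNK_MAX ? len : EX_CHUNK_MAX);
	if (buf == NULL)
		return (-1);
	while (remaining > 0) {
		n = remaining < EX_CHUNK_MAX ? remaining : EX_CHUNK_MAX;
		memcpy(buf, src, n);	/* "read from card memory" */
		memcpy(dst, buf, n);	/* "copy out to the caller" */
		src += n;
		dst += n;
		remaining -= n;
		chunks++;
	}
	free(buf);
	return (chunks);
}
```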
 
 static int
 read_i2c(struct adapter *sc, struct t4_i2c_data *i2cd)
 {
 	int rc;
 
 	if (i2cd->len == 0 || i2cd->port_id >= sc->params.nports)
 		return (EINVAL);
 
 	if (i2cd->len > sizeof(i2cd->data))
 		return (EFBIG);
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4i2crd");
 	if (rc)
 		return (rc);
 	rc = -t4_i2c_rd(sc, sc->mbox, i2cd->port_id, i2cd->dev_addr,
 	    i2cd->offset, i2cd->len, &i2cd->data[0]);
 	end_synchronized_op(sc, 0);
 
 	return (rc);
 }
 
 static int
 in_range(int val, int lo, int hi)
 {
 
 	return (val < 0 || (val <= hi && val >= lo));
 }
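in_range() treats every negative value as "parameter not set" and accepts it. Only non-negative values are checked against [lo, hi]. Callers therefore still have to reject required parameters that were left unset, as set_sched_class_params() does later. The function below restates that rule under a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Same rule as the driver's in_range(): negative values mean "unset" and
 * always pass; anything else must lie in the inclusive range [lo, hi].
 */
static bool
ex_in_range(int val, int lo, int hi)
{
	assert(lo <= hi);
	return (val < 0 || (val >= lo && val <= hi));
}
```

Because hi is inclusive, checking an index into an array of n entries needs hi = n - 1. For example, with 16 scheduling classes, ex_in_range(16, 0, 16 - 1) is false.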
 
 static int
 set_sched_class_config(struct adapter *sc, int minmax)
 {
 	int rc;
 
 	if (minmax < 0)
 		return (EINVAL);
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4sscc");
 	if (rc)
 		return (rc);
 	rc = -t4_sched_config(sc, FW_SCHED_TYPE_PKTSCHED, minmax, 1);
 	end_synchronized_op(sc, 0);
 
 	return (rc);
 }
 
 static int
 set_sched_class_params(struct adapter *sc, struct t4_sched_class_params *p,
     int sleep_ok)
 {
 	int rc, top_speed, fw_level, fw_mode, fw_rateunit, fw_ratemode;
 	struct port_info *pi;
 	struct tx_sched_class *tc;
 
 	if (p->level == SCHED_CLASS_LEVEL_CL_RL)
 		fw_level = FW_SCHED_PARAMS_LEVEL_CL_RL;
 	else if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
 		fw_level = FW_SCHED_PARAMS_LEVEL_CL_WRR;
 	else if (p->level == SCHED_CLASS_LEVEL_CH_RL)
 		fw_level = FW_SCHED_PARAMS_LEVEL_CH_RL;
 	else
 		return (EINVAL);
 
 	if (p->mode == SCHED_CLASS_MODE_CLASS)
 		fw_mode = FW_SCHED_PARAMS_MODE_CLASS;
 	else if (p->mode == SCHED_CLASS_MODE_FLOW)
 		fw_mode = FW_SCHED_PARAMS_MODE_FLOW;
 	else
 		return (EINVAL);
 
 	if (p->rateunit == SCHED_CLASS_RATEUNIT_BITS)
 		fw_rateunit = FW_SCHED_PARAMS_UNIT_BITRATE;
 	else if (p->rateunit == SCHED_CLASS_RATEUNIT_PKTS)
 		fw_rateunit = FW_SCHED_PARAMS_UNIT_PKTRATE;
 	else
 		return (EINVAL);
 
 	if (p->ratemode == SCHED_CLASS_RATEMODE_REL)
 		fw_ratemode = FW_SCHED_PARAMS_RATE_REL;
 	else if (p->ratemode == SCHED_CLASS_RATEMODE_ABS)
 		fw_ratemode = FW_SCHED_PARAMS_RATE_ABS;
 	else
 		return (EINVAL);
 
 	/* Vet our parameters ... */
 	if (!in_range(p->channel, 0, sc->chip_params->nchan - 1))
 		return (ERANGE);
 
 	pi = sc->port[sc->chan_map[p->channel]];
 	if (pi == NULL)
 		return (ENXIO);
 	MPASS(pi->tx_chan == p->channel);
 	top_speed = port_top_speed(pi) * 1000000; /* Gbps -> Kbps */
 
 	if (!in_range(p->cl, 0, sc->chip_params->nsched_cls - 1) ||
 	    !in_range(p->minrate, 0, top_speed) ||
 	    !in_range(p->maxrate, 0, top_speed) ||
 	    !in_range(p->weight, 0, 100))
 		return (ERANGE);
 
 	/*
 	 * Translate any unset parameters into the firmware's
 	 * nomenclature and/or fail the call if the parameters
 	 * are required ...
 	 */
 	if (p->rateunit < 0 || p->ratemode < 0 || p->channel < 0 || p->cl < 0)
 		return (EINVAL);
 
 	if (p->minrate < 0)
 		p->minrate = 0;
 	if (p->maxrate < 0) {
 		if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
 		    p->level == SCHED_CLASS_LEVEL_CH_RL)
 			return (EINVAL);
 		else
 			p->maxrate = 0;
 	}
 	if (p->weight < 0) {
 		if (p->level == SCHED_CLASS_LEVEL_CL_WRR)
 			return (EINVAL);
 		else
 			p->weight = 0;
 	}
 	if (p->pktsize < 0) {
 		if (p->level == SCHED_CLASS_LEVEL_CL_RL ||
 		    p->level == SCHED_CLASS_LEVEL_CH_RL)
 			return (EINVAL);
 		else
 			p->pktsize = 0;
 	}
 
 	rc = begin_synchronized_op(sc, NULL,
 	    sleep_ok ? (SLEEP_OK | INTR_OK) : HOLD_LOCK, "t4sscp");
 	if (rc)
 		return (rc);
 	tc = &pi->tc[p->cl];
 	tc->params = *p;
 	rc = -t4_sched_params(sc, FW_SCHED_TYPE_PKTSCHED, fw_level, fw_mode,
 	    fw_rateunit, fw_ratemode, p->channel, p->cl, p->minrate, p->maxrate,
 	    p->weight, p->pktsize, sleep_ok);
 	if (rc == 0)
 		tc->flags |= TX_SC_OK;
 	else {
 		/*
 		 * Unknown state at this point, see tc->params for what was
 		 * attempted.
 		 */
 		tc->flags &= ~TX_SC_OK;
 	}
 	end_synchronized_op(sc, sleep_ok ? 0 : LOCK_HELD);
 
 	return (rc);
 }
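After validating ranges (rates in Kbps, with port_top_speed() in Gbps scaled by 1,000,000), set_sched_class_params() fills unset fields. An unset minrate becomes 0. An unset maxrate or pktsize is an error for the rate-limiter levels and 0 otherwise. An unset weight is an error for the WRR level and 0 otherwise. The sketch below isolates those rules. The EX_ level names and struct ex_sched_params are hypothetical local copies, not the driver's types.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical local copies of the SCHED_CLASS_LEVEL_* values. */
enum ex_level { EX_CL_RL, EX_CL_WRR, EX_CH_RL };

struct ex_sched_params {
	enum ex_level level;
	int minrate, maxrate, weight, pktsize;	/* negative means "unset" */
};

/*
 * Replace unset fields with 0, or fail with EINVAL when the level needs
 * the field: rate limiters need maxrate and pktsize, WRR needs weight.
 */
static int
ex_default_sched_params(struct ex_sched_params *p)
{
	int is_rl = (p->level == EX_CL_RL || p->level == EX_CH_RL);

	assert(is_rl || p->level == EX_CL_WRR);
	if (p->minrate < 0)
		p->minrate = 0;
	if (p->maxrate < 0) {
		if (is_rl)
			return (EINVAL);
		p->maxrate = 0;
	}
	if (p->weight < 0) {
		if (p->level == EX_CL_WRR)
			return (EINVAL);
		p->weight = 0;
	}
	if (p->pktsize < 0) {
		if (is_rl)
			return (EINVAL);
		p->pktsize = 0;
	}
	return (0);
}
```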
 
 static int
 set_sched_class(struct adapter *sc, struct t4_sched_params *p)
 {
 
 	if (p->type != SCHED_CLASS_TYPE_PACKET)
 		return (EINVAL);
 
 	if (p->subcmd == SCHED_CLASS_SUBCMD_CONFIG)
 		return (set_sched_class_config(sc, p->u.config.minmax));
 
 	if (p->subcmd == SCHED_CLASS_SUBCMD_PARAMS)
 		return (set_sched_class_params(sc, &p->u.params, 1));
 
 	return (EINVAL);
 }
 
 static int
 set_sched_queue(struct adapter *sc, struct t4_sched_queue *p)
 {
 	struct port_info *pi = NULL;
 	struct vi_info *vi;
 	struct sge_txq *txq;
 	uint32_t fw_mnem, fw_queue, fw_class;
 	int i, rc;
 
 	rc = begin_synchronized_op(sc, NULL, SLEEP_OK | INTR_OK, "t4setsq");
 	if (rc)
 		return (rc);
 
 	if (p->port >= sc->params.nports) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/* XXX: Only supported for the main VI. */
 	pi = sc->port[p->port];
 	vi = &pi->vi[0];
 	if (!(vi->flags & VI_INIT_DONE)) {
 		/* tx queues not set up yet */
 		rc = EAGAIN;
 		goto done;
 	}
 
 	if (!in_range(p->queue, 0, vi->ntxq - 1) ||
 	    !in_range(p->cl, 0, sc->chip_params->nsched_cls - 1)) {
 		rc = EINVAL;
 		goto done;
 	}
 
 	/*
 	 * Create a template for the FW_PARAMS_CMD mnemonic and value (TX
 	 * Scheduling Class in this case).
 	 */
 	fw_mnem = (V_FW_PARAMS_MNEM(FW_PARAMS_MNEM_DMAQ) |
 	    V_FW_PARAMS_PARAM_X(FW_PARAMS_PARAM_DMAQ_EQ_SCHEDCLASS_ETH));
 	fw_class = p->cl < 0 ? 0xffffffff : p->cl;
 
 	/*
 	 * If p->queue is non-negative, then we're only changing the scheduling
 	 * on a single specified TX queue.
 	 */
 	if (p->queue >= 0) {
 		txq = &sc->sge.txq[vi->first_txq + p->queue];
 		fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
 		rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
 		    &fw_class);
 		goto done;
 	}
 
 	/*
 	 * Change the scheduling on all the TX queues for the
 	 * interface.
 	 */
 	for_each_txq(vi, i, txq) {
 		fw_queue = (fw_mnem | V_FW_PARAMS_PARAM_YZ(txq->eq.cntxt_id));
 		rc = -t4_set_params(sc, sc->mbox, sc->pf, 0, 1, &fw_queue,
 		    &fw_class);
 		if (rc)
 			goto done;
 	}
 
 	rc = 0;
 done:
 	end_synchronized_op(sc, 0);
 	return (rc);
 }
 
 int
 t4_os_find_pci_capability(struct adapter *sc, int cap)
 {
 	int i;
 
 	return (pci_find_cap(sc->dev, cap, &i) == 0 ? i : 0);
 }
 
 int
 t4_os_pci_save_state(struct adapter *sc)
 {
 	device_t dev;
 	struct pci_devinfo *dinfo;
 
 	dev = sc->dev;
 	dinfo = device_get_ivars(dev);
 
 	pci_cfg_save(dev, dinfo, 0);
 	return (0);
 }
 
 int
 t4_os_pci_restore_state(struct adapter *sc)
 {
 	device_t dev;
 	struct pci_devinfo *dinfo;
 
 	dev = sc->dev;
 	dinfo = device_get_ivars(dev);
 
 	pci_cfg_restore(dev, dinfo);
 	return (0);
 }
 
 void
 t4_os_portmod_changed(const struct adapter *sc, int idx)
 {
 	struct port_info *pi = sc->port[idx];
 	struct vi_info *vi;
 	struct ifnet *ifp;
 	int v;
 	static const char *mod_str[] = {
 		NULL, "LR", "SR", "ER", "TWINAX", "active TWINAX", "LRM"
 	};
 
 	for_each_vi(pi, v, vi) {
 		build_medialist(pi, &vi->media);
 	}
 
 	ifp = pi->vi[0].ifp;
 	if (pi->mod_type == FW_PORT_MOD_TYPE_NONE)
 		if_printf(ifp, "transceiver unplugged.\n");
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_UNKNOWN)
 		if_printf(ifp, "unknown transceiver inserted.\n");
 	else if (pi->mod_type == FW_PORT_MOD_TYPE_NOTSUPPORTED)
 		if_printf(ifp, "unsupported transceiver inserted.\n");
 	else if (pi->mod_type > 0 && pi->mod_type < nitems(mod_str)) {
 		if_printf(ifp, "%s transceiver inserted.\n",
 		    mod_str[pi->mod_type]);
 	} else {
 		if_printf(ifp, "transceiver (type %d) inserted.\n",
 		    pi->mod_type);
 	}
 }
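For module types that are not one of the special NONE/UNKNOWN/NOTSUPPORTED values, t4_os_portmod_changed() looks up a name in mod_str[]. Slot 0 is NULL, so only indices 1 through nitems(mod_str) - 1 are used, and any other type is printed numerically. The sketch below formats the same two messages into a buffer. ex_mod_message and EX_NITEMS are hypothetical names, and the special types are assumed to be handled by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define EX_NITEMS(x) (sizeof(x) / sizeof((x)[0]))

static const char *ex_mod_str[] = {
	NULL, "LR", "SR", "ER", "TWINAX", "active TWINAX", "LRM"
};

/* Build the "transceiver inserted" message for a known module type. */
static void
ex_mod_message(char *buf, size_t len, int mod_type)
{
	assert(buf != NULL && len > 0);
	if (mod_type > 0 && (size_t)mod_type < EX_NITEMS(ex_mod_str))
		snprintf(buf, len, "%s transceiver inserted.",
		    ex_mod_str[mod_type]);
	else
		snprintf(buf, len, "transceiver (type %d) inserted.", mod_type);
}
```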
 
 void
 t4_os_link_changed(struct adapter *sc, int idx, int link_stat, int reason)
 {
 	struct port_info *pi = sc->port[idx];
 	struct vi_info *vi;
 	struct ifnet *ifp;
 	int v;
 
 	if (link_stat)
 		pi->linkdnrc = -1;
 	else {
 		if (reason >= 0)
 			pi->linkdnrc = reason;
 	}
 	for_each_vi(pi, v, vi) {
 		ifp = vi->ifp;
 		if (ifp == NULL)
 			continue;
 
 		if (link_stat) {
 			ifp->if_baudrate = IF_Mbps(pi->link_cfg.speed);
 			if_link_state_change(ifp, LINK_STATE_UP);
 		} else {
 			if_link_state_change(ifp, LINK_STATE_DOWN);
 		}
 	}
 }
 
 void
 t4_iterate(void (*func)(struct adapter *, void *), void *arg)
 {
 	struct adapter *sc;
 
 	sx_slock(&t4_list_lock);
 	SLIST_FOREACH(sc, &t4_list, link) {
 		/*
 		 * func should not make any assumptions about what state sc is
 		 * in - the only guarantee is that sc->sc_lock is a valid lock.
 		 */
 		func(sc, arg);
 	}
 	sx_sunlock(&t4_list_lock);
 }
 
 static int
 t4_ioctl(struct cdev *dev, unsigned long cmd, caddr_t data, int fflag,
     struct thread *td)
 {
 	int rc;
 	struct adapter *sc = dev->si_drv1;
 
 	rc = priv_check(td, PRIV_DRIVER);
 	if (rc != 0)
 		return (rc);
 
 	switch (cmd) {
 	case CHELSIO_T4_GETREG: {
 		struct t4_reg *edata = (struct t4_reg *)data;
 
 		if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
 			return (EFAULT);
 
 		if (edata->size == 4)
 			edata->val = t4_read_reg(sc, edata->addr);
 		else if (edata->size == 8)
 			edata->val = t4_read_reg64(sc, edata->addr);
 		else
 			return (EINVAL);
 
 		break;
 	}
 	case CHELSIO_T4_SETREG: {
 		struct t4_reg *edata = (struct t4_reg *)data;
 
 		if ((edata->addr & 0x3) != 0 || edata->addr >= sc->mmio_len)
 			return (EFAULT);
 
 		if (edata->size == 4) {
 			if (edata->val & 0xffffffff00000000)
 				return (EINVAL);
 			t4_write_reg(sc, edata->addr, (uint32_t) edata->val);
 		} else if (edata->size == 8)
 			t4_write_reg64(sc, edata->addr, edata->val);
 		else
 			return (EINVAL);
 		break;
 	}
 	case CHELSIO_T4_REGDUMP: {
 		struct t4_regdump *regs = (struct t4_regdump *)data;
 		int reglen = is_t4(sc) ? T4_REGDUMP_SIZE : T5_REGDUMP_SIZE;
 		uint8_t *buf;
 
 		if (regs->len < reglen) {
 			regs->len = reglen; /* hint to the caller */
 			return (ENOBUFS);
 		}
 
 		regs->len = reglen;
 		buf = malloc(reglen, M_CXGBE, M_WAITOK | M_ZERO);
 		get_regs(sc, regs, buf);
 		rc = copyout(buf, regs->data, reglen);
 		free(buf, M_CXGBE);
 		break;
 	}
 	case CHELSIO_T4_GET_FILTER_MODE:
 		rc = get_filter_mode(sc, (uint32_t *)data);
 		break;
 	case CHELSIO_T4_SET_FILTER_MODE:
 		rc = set_filter_mode(sc, *(uint32_t *)data);
 		break;
 	case CHELSIO_T4_GET_FILTER:
 		rc = get_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_SET_FILTER:
 		rc = set_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_DEL_FILTER:
 		rc = del_filter(sc, (struct t4_filter *)data);
 		break;
 	case CHELSIO_T4_GET_SGE_CONTEXT:
 		rc = get_sge_context(sc, (struct t4_sge_context *)data);
 		break;
 	case CHELSIO_T4_LOAD_FW:
 		rc = load_fw(sc, (struct t4_data *)data);
 		break;
 	case CHELSIO_T4_GET_MEM:
 		rc = read_card_mem(sc, 2, (struct t4_mem_range *)data);
 		break;
 	case CHELSIO_T4_GET_I2C:
 		rc = read_i2c(sc, (struct t4_i2c_data *)data);
 		break;
 	case CHELSIO_T4_CLEAR_STATS: {
 		int i, v;
 		u_int port_id = *(uint32_t *)data;
 		struct port_info *pi;
 		struct vi_info *vi;
 
 		if (port_id >= sc->params.nports)
 			return (EINVAL);
 		pi = sc->port[port_id];
 
 		/* MAC stats */
 		t4_clr_port_stats(sc, pi->tx_chan);
 		pi->tx_parse_error = 0;
 		mtx_lock(&sc->reg_lock);
 		for_each_vi(pi, v, vi) {
 			if (vi->flags & VI_INIT_DONE)
 				t4_clr_vi_stats(sc, vi->viid);
 		}
 		mtx_unlock(&sc->reg_lock);
 
 		/*
 		 * Since this command accepts a port, clear stats for
 		 * all VIs on this port.
 		 */
 		for_each_vi(pi, v, vi) {
 			if (vi->flags & VI_INIT_DONE) {
 				struct sge_rxq *rxq;
 				struct sge_txq *txq;
 				struct sge_wrq *wrq;
 
 				for_each_rxq(vi, i, rxq) {
 #if defined(INET) || defined(INET6)
 					rxq->lro.lro_queued = 0;
 					rxq->lro.lro_flushed = 0;
 #endif
 					rxq->rxcsum = 0;
 					rxq->vlan_extraction = 0;
 				}
 
 				for_each_txq(vi, i, txq) {
 					txq->txcsum = 0;
 					txq->tso_wrs = 0;
 					txq->vlan_insertion = 0;
 					txq->imm_wrs = 0;
 					txq->sgl_wrs = 0;
 					txq->txpkt_wrs = 0;
 					txq->txpkts0_wrs = 0;
 					txq->txpkts1_wrs = 0;
 					txq->txpkts0_pkts = 0;
 					txq->txpkts1_pkts = 0;
 					mp_ring_reset_stats(txq->r);
 				}
 
 #ifdef TCP_OFFLOAD
 				/* nothing to clear for each ofld_rxq */
 
 				for_each_ofld_txq(vi, i, wrq) {
 					wrq->tx_wrs_direct = 0;
 					wrq->tx_wrs_copied = 0;
 				}
 #endif
 
 				if (IS_MAIN_VI(vi)) {
 					wrq = &sc->sge.ctrlq[pi->port_id];
 					wrq->tx_wrs_direct = 0;
 					wrq->tx_wrs_copied = 0;
 				}
 			}
 		}
 		break;
 	}
 	case CHELSIO_T4_SCHED_CLASS:
 		rc = set_sched_class(sc, (struct t4_sched_params *)data);
 		break;
 	case CHELSIO_T4_SCHED_QUEUE:
 		rc = set_sched_queue(sc, (struct t4_sched_queue *)data);
 		break;
 	case CHELSIO_T4_GET_TRACER:
 		rc = t4_get_tracer(sc, (struct t4_tracer *)data);
 		break;
 	case CHELSIO_T4_SET_TRACER:
 		rc = t4_set_tracer(sc, (struct t4_tracer *)data);
 		break;
 	default:
 		rc = ENOTTY;
 	}
 
 	return (rc);
 }
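CHELSIO_T4_GETREG and CHELSIO_T4_SETREG share the same address check. The offset must be 4-byte aligned and below sc->mmio_len (EFAULT otherwise), and only 4- and 8-byte accesses are accepted (EINVAL otherwise). A 4-byte write must also fit in 32 bits. The sketch below restates the SETREG checks as a pure function. ex_check_setreg and its parameter list are hypothetical. As in the driver, only the start address is compared with mmio_len.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Validation performed before a CHELSIO_T4_SETREG write (sketch). */
static int
ex_check_setreg(uint32_t addr, uint32_t size, uint64_t val, uint32_t mmio_len)
{
	if ((addr & 0x3) != 0 || addr >= mmio_len)
		return (EFAULT);
	if (size == 4)
		return ((val & 0xffffffff00000000ULL) != 0 ? EINVAL : 0);
	if (size == 8)
		return (0);
	return (EINVAL);
}
```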
 
 void
 t4_db_full(struct adapter *sc)
 {
 
 	CXGBE_UNIMPLEMENTED(__func__);
 }
 
 void
 t4_db_dropped(struct adapter *sc)
 {
 
 	CXGBE_UNIMPLEMENTED(__func__);
 }
 
 #ifdef TCP_OFFLOAD
 void
 t4_iscsi_init(struct adapter *sc, u_int tag_mask, const u_int *pgsz_order)
 {
 
 	t4_write_reg(sc, A_ULP_RX_ISCSI_TAGMASK, tag_mask);
 	t4_write_reg(sc, A_ULP_RX_ISCSI_PSZ, V_HPZ0(pgsz_order[0]) |
 	    V_HPZ1(pgsz_order[1]) | V_HPZ2(pgsz_order[2]) |
 	    V_HPZ3(pgsz_order[3]));
 }
 
 static int
 toe_capability(struct vi_info *vi, int enable)
 {
 	int rc;
 	struct port_info *pi = vi->pi;
 	struct adapter *sc = pi->adapter;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (!is_offload(sc))
 		return (ENODEV);
 
 	if (enable) {
 		if ((vi->ifp->if_capenable & IFCAP_TOE) != 0) {
 			/* TOE is already enabled. */
 			return (0);
 		}
 
 		/*
 		 * We need the port's queues around so that we're able to send
 		 * and receive CPLs to/from the TOE even if the ifnet for this
 		 * port has never been UP'd administratively.
 		 */
 		if (!(vi->flags & VI_INIT_DONE)) {
 			rc = vi_full_init(vi);
 			if (rc)
 				return (rc);
 		}
 		if (!(pi->vi[0].flags & VI_INIT_DONE)) {
 			rc = vi_full_init(&pi->vi[0]);
 			if (rc)
 				return (rc);
 		}
 
 		if (isset(&sc->offload_map, pi->port_id)) {
 			/* TOE is enabled on another VI of this port. */
 			pi->uld_vis++;
 			return (0);
 		}
 
 		if (!uld_active(sc, ULD_TOM)) {
 			rc = t4_activate_uld(sc, ULD_TOM);
 			if (rc == EAGAIN) {
 				log(LOG_WARNING,
 				    "You must kldload t4_tom.ko before trying "
 				    "to enable TOE on a cxgbe interface.\n");
 			}
 			if (rc != 0)
 				return (rc);
 			KASSERT(sc->tom_softc != NULL,
 			    ("%s: TOM activated but softc NULL", __func__));
 			KASSERT(uld_active(sc, ULD_TOM),
 			    ("%s: TOM activated but flag not set", __func__));
 		}
 
 		/* Activate iWARP and iSCSI too, if the modules are loaded. */
 		if (!uld_active(sc, ULD_IWARP))
 			(void) t4_activate_uld(sc, ULD_IWARP);
 		if (!uld_active(sc, ULD_ISCSI))
 			(void) t4_activate_uld(sc, ULD_ISCSI);
 
 		pi->uld_vis++;
 		setbit(&sc->offload_map, pi->port_id);
 	} else {
 		pi->uld_vis--;
 
 		if (!isset(&sc->offload_map, pi->port_id) || pi->uld_vis > 0)
 			return (0);
 
 		KASSERT(uld_active(sc, ULD_TOM),
 		    ("%s: TOM never initialized?", __func__));
 		clrbit(&sc->offload_map, pi->port_id);
 	}
 
 	return (0);
 }
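toe_capability() keeps two pieces of per-port state. A bit in sc->offload_map stays set while any VI of the port has TOE enabled, and pi->uld_vis counts those VIs. Only the last disable clears the bit. The reduced sketch below models that bookkeeping. struct ex_toe_state and the 8-port limit are hypothetical, ULD activation and locking are left out, and callers are assumed to enable each VI only once (the driver returns early when IFCAP_TOE is already set).

```c
#include <assert.h>

#define EX_NPORTS 8	/* hypothetical port limit for the sketch */

struct ex_toe_state {
	unsigned int offload_map;	/* bit p set: TOE enabled on port p */
	int uld_vis[EX_NPORTS];		/* VIs of port p with TOE enabled */
};

static void
ex_toe_enable(struct ex_toe_state *s, int port)
{
	assert(port >= 0 && port < EX_NPORTS);
	/* The driver activates the TOM ULD before the first VI counts. */
	s->uld_vis[port]++;
	s->offload_map |= 1u << port;
}

static void
ex_toe_disable(struct ex_toe_state *s, int port)
{
	assert(port >= 0 && port < EX_NPORTS);
	s->uld_vis[port]--;
	if ((s->offload_map & (1u << port)) == 0 || s->uld_vis[port] > 0)
		return;
	s->offload_map &= ~(1u << port);	/* last VI on the port */
}
```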
 
 /*
  * Add an upper layer driver to the global list.
  */
 int
 t4_register_uld(struct uld_info *ui)
 {
 	int rc = 0;
 	struct uld_info *u;
 
 	sx_xlock(&t4_uld_list_lock);
 	SLIST_FOREACH(u, &t4_uld_list, link) {
 		if (u->uld_id == ui->uld_id) {
 			rc = EEXIST;
 			goto done;
 		}
 	}
 
 	SLIST_INSERT_HEAD(&t4_uld_list, ui, link);
 	ui->refcount = 0;
 done:
 	sx_xunlock(&t4_uld_list_lock);
 	return (rc);
 }
 
 int
 t4_unregister_uld(struct uld_info *ui)
 {
 	int rc = EINVAL;
 	struct uld_info *u;
 
 	sx_xlock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(u, &t4_uld_list, link) {
 		if (u == ui) {
 			if (ui->refcount > 0) {
 				rc = EBUSY;
 				goto done;
 			}
 
 			SLIST_REMOVE(&t4_uld_list, ui, uld_info, link);
 			rc = 0;
 			goto done;
 		}
 	}
 done:
 	sx_xunlock(&t4_uld_list_lock);
 	return (rc);
 }
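t4_register_uld() rejects a second ULD with the same uld_id (EEXIST) and starts the new entry with a zero refcount. t4_unregister_uld() returns EINVAL for an entry that is not on the list and EBUSY while any adapter still has it activated. The sketch below uses a hand-rolled singly linked list instead of the SLIST macros. struct ex_uld and the function names are hypothetical, and the sx lock is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_uld {
	int id;
	int refcount;		/* adapters that activated this ULD */
	struct ex_uld *next;
};

static struct ex_uld *ex_uld_list;	/* locking omitted in this sketch */

static int
ex_uld_register(struct ex_uld *u)
{
	for (struct ex_uld *p = ex_uld_list; p != NULL; p = p->next)
		if (p->id == u->id)
			return (EEXIST);
	u->refcount = 0;
	u->next = ex_uld_list;
	ex_uld_list = u;
	return (0);
}

static int
ex_uld_unregister(struct ex_uld *u)
{
	for (struct ex_uld **pp = &ex_uld_list; *pp != NULL;
	    pp = &(*pp)->next) {
		if (*pp != u)
			continue;
		if (u->refcount > 0)
			return (EBUSY);
		*pp = u->next;	/* unlink */
		return (0);
	}
	return (EINVAL);
}
```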
 
 int
 t4_activate_uld(struct adapter *sc, int id)
 {
 	int rc;
 	struct uld_info *ui;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (id < 0 || id > ULD_MAX)
 		return (EINVAL);
 	rc = EAGAIN;	/* kldload the module with this ULD and try again. */
 
 	sx_slock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(ui, &t4_uld_list, link) {
 		if (ui->uld_id == id) {
 			if (!(sc->flags & FULL_INIT_DONE)) {
 				rc = adapter_full_init(sc);
 				if (rc != 0)
 					break;
 			}
 
 			rc = ui->activate(sc);
 			if (rc == 0) {
 				setbit(&sc->active_ulds, id);
 				ui->refcount++;
 			}
 			break;
 		}
 	}
 
 	sx_sunlock(&t4_uld_list_lock);
 
 	return (rc);
 }
 
 int
 t4_deactivate_uld(struct adapter *sc, int id)
 {
 	int rc;
 	struct uld_info *ui;
 
 	ASSERT_SYNCHRONIZED_OP(sc);
 
 	if (id < 0 || id > ULD_MAX)
 		return (EINVAL);
 	rc = ENXIO;
 
 	sx_slock(&t4_uld_list_lock);
 
 	SLIST_FOREACH(ui, &t4_uld_list, link) {
 		if (ui->uld_id == id) {
 			rc = ui->deactivate(sc);
 			if (rc == 0) {
 				clrbit(&sc->active_ulds, id);
 				ui->refcount--;
 			}
 			break;
 		}
 	}
 
 	sx_sunlock(&t4_uld_list_lock);
 
 	return (rc);
 }
 
 int
 uld_active(struct adapter *sc, int uld_id)
 {
 
 	MPASS(uld_id >= 0 && uld_id <= ULD_MAX);
 
 	return (isset(&sc->active_ulds, uld_id));
 }
 #endif
 
 /*
  * Come up with reasonable defaults for some of the tunables, provided they're
  * not set by the user (in which case we'll use the values as is).
  */
 static void
 tweak_tunables(void)
 {
 	int nc = mp_ncpus;	/* our snapshot of the number of CPUs */
 
 	if (t4_ntxq10g < 1) {
 #ifdef RSS
 		t4_ntxq10g = rss_getnumbuckets();
 #else
 		t4_ntxq10g = min(nc, NTXQ_10G);
 #endif
 	}
 
 	if (t4_ntxq1g < 1) {
 #ifdef RSS
 		/* XXX: way too many for 1GbE? */
 		t4_ntxq1g = rss_getnumbuckets();
 #else
 		t4_ntxq1g = min(nc, NTXQ_1G);
 #endif
 	}
 
 	if (t4_ntxq_vi < 1)
 		t4_ntxq_vi = min(nc, NTXQ_VI);
 
 	if (t4_nrxq10g < 1) {
 #ifdef RSS
 		t4_nrxq10g = rss_getnumbuckets();
 #else
 		t4_nrxq10g = min(nc, NRXQ_10G);
 #endif
 	}
 
 	if (t4_nrxq1g < 1) {
 #ifdef RSS
 		/* XXX: way too many for 1GbE? */
 		t4_nrxq1g = rss_getnumbuckets();
 #else
 		t4_nrxq1g = min(nc, NRXQ_1G);
 #endif
 	}
 
 	if (t4_nrxq_vi < 1)
 		t4_nrxq_vi = min(nc, NRXQ_VI);
 
 #ifdef TCP_OFFLOAD
 	if (t4_nofldtxq10g < 1)
 		t4_nofldtxq10g = min(nc, NOFLDTXQ_10G);
 
 	if (t4_nofldtxq1g < 1)
 		t4_nofldtxq1g = min(nc, NOFLDTXQ_1G);
 
 	if (t4_nofldtxq_vi < 1)
 		t4_nofldtxq_vi = min(nc, NOFLDTXQ_VI);
 
 	if (t4_nofldrxq10g < 1)
 		t4_nofldrxq10g = min(nc, NOFLDRXQ_10G);
 
 	if (t4_nofldrxq1g < 1)
 		t4_nofldrxq1g = min(nc, NOFLDRXQ_1G);
 
 	if (t4_nofldrxq_vi < 1)
 		t4_nofldrxq_vi = min(nc, NOFLDRXQ_VI);
 
 	if (t4_toecaps_allowed == -1)
 		t4_toecaps_allowed = FW_CAPS_CONFIG_TOE;
 
 	if (t4_rdmacaps_allowed == -1) {
 		t4_rdmacaps_allowed = FW_CAPS_CONFIG_RDMA_RDDP |
 		    FW_CAPS_CONFIG_RDMA_RDMAC;
 	}
 
 	if (t4_iscsicaps_allowed == -1) {
 		t4_iscsicaps_allowed = FW_CAPS_CONFIG_ISCSI_INITIATOR_PDU |
 		    FW_CAPS_CONFIG_ISCSI_TARGET_PDU |
 		    FW_CAPS_CONFIG_ISCSI_T10DIF;
 	}
 #else
 	if (t4_toecaps_allowed == -1)
 		t4_toecaps_allowed = 0;
 
 	if (t4_rdmacaps_allowed == -1)
 		t4_rdmacaps_allowed = 0;
 
 	if (t4_iscsicaps_allowed == -1)
 		t4_iscsicaps_allowed = 0;
 #endif
 
 #ifdef DEV_NETMAP
 	if (t4_nnmtxq_vi < 1)
 		t4_nnmtxq_vi = min(nc, NNMTXQ_VI);
 
 	if (t4_nnmrxq_vi < 1)
 		t4_nnmrxq_vi = min(nc, NNMRXQ_VI);
 #endif
 
 	if (t4_tmr_idx_10g < 0 || t4_tmr_idx_10g >= SGE_NTIMERS)
 		t4_tmr_idx_10g = TMR_IDX_10G;
 
 	if (t4_pktc_idx_10g < -1 || t4_pktc_idx_10g >= SGE_NCOUNTERS)
 		t4_pktc_idx_10g = PKTC_IDX_10G;
 
 	if (t4_tmr_idx_1g < 0 || t4_tmr_idx_1g >= SGE_NTIMERS)
 		t4_tmr_idx_1g = TMR_IDX_1G;
 
 	if (t4_pktc_idx_1g < -1 || t4_pktc_idx_1g >= SGE_NCOUNTERS)
 		t4_pktc_idx_1g = PKTC_IDX_1G;
 
 	if (t4_qsize_txq < 128)
 		t4_qsize_txq = 128;
 
 	if (t4_qsize_rxq < 128)
 		t4_qsize_rxq = 128;
 	while (t4_qsize_rxq & 7)
 		t4_qsize_rxq++;
 
 	t4_intr_types &= INTR_MSIX | INTR_MSI | INTR_INTX;
 }
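The receive queue fix-up at the end of tweak_tunables() clamps t4_qsize_rxq to at least 128 and then increments it until it is a multiple of 8. The same result can be computed in closed form with (q + 7) & ~7, as in the hypothetical helper below.

```c
#include <assert.h>
#include <limits.h>

/* Clamp to >= 128 entries, then round up to a multiple of 8. */
static int
ex_fixup_qsize_rxq(int q)
{
	if (q < 128)
		q = 128;
	assert(q <= INT_MAX - 7);	/* keep q + 7 from overflowing */
	return ((q + 7) & ~7);
}
```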
 
 #ifdef DDB
 static void
 t4_dump_tcb(struct adapter *sc, int tid)
 {
 	uint32_t base, i, j, off, pf, reg, save, tcb_addr, win_pos;
 
 	reg = PCIE_MEM_ACCESS_REG(A_PCIE_MEM_ACCESS_OFFSET, 2);
 	save = t4_read_reg(sc, reg);
 	base = sc->memwin[2].mw_base;
 
 	/* Dump TCB for the tid */
 	tcb_addr = t4_read_reg(sc, A_TP_CMM_TCB_BASE);
 	tcb_addr += tid * TCB_SIZE;
 
 	if (is_t4(sc)) {
 		pf = 0;
 		win_pos = tcb_addr & ~0xf;	/* start must be 16B aligned */
 	} else {
 		pf = V_PFNUM(sc->pf);
 		win_pos = tcb_addr & ~0x7f;	/* start must be 128B aligned */
 	}
 	t4_write_reg(sc, reg, win_pos | pf);
 	t4_read_reg(sc, reg);
 
 	off = tcb_addr - win_pos;
 	for (i = 0; i < 4; i++) {
 		uint32_t buf[8];
 		for (j = 0; j < 8; j++, off += 4)
 			buf[j] = htonl(t4_read_reg(sc, base + off));
 
 		db_printf("%08x %08x %08x %08x %08x %08x %08x %08x\n",
 		    buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6],
 		    buf[7]);
 	}
 
 	t4_write_reg(sc, reg, save);
 	t4_read_reg(sc, reg);
 }
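t4_dump_tcb() locates a TCB at A_TP_CMM_TCB_BASE + tid * TCB_SIZE and aligns memory window 2 down to the boundary the chip requires: 16 bytes on T4, 128 bytes otherwise. It then reads from the remaining offset inside the window. The sketch below computes only the window position and offset. TCB size 128 is an assumption, consistent with the 4 rows of 8 32-bit words the function prints, and ex_tcb_window is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_TCB_SIZE 128	/* assumed: 4 rows x 8 words x 4 bytes */

/* Window start and in-window offset for the TCB of tid. */
static void
ex_tcb_window(uint32_t tcb_base, uint32_t tid, bool is_t4,
    uint32_t *win_pos, uint32_t *off)
{
	uint32_t tcb_addr = tcb_base + tid * EX_TCB_SIZE;

	*win_pos = tcb_addr & (is_t4 ? ~0xfu : ~0x7fu);
	*off = tcb_addr - *win_pos;
	assert(*off < (is_t4 ? 16u : 128u));
}
```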
 
 static void
 t4_dump_devlog(struct adapter *sc)
 {
 	struct devlog_params *dparams = &sc->params.devlog;
 	struct fw_devlog_e e;
 	int i, first, j, m, nentries, rc;
 	uint64_t ftstamp = UINT64_MAX;
 
 	if (dparams->start == 0) {
 		db_printf("devlog params not valid\n");
 		return;
 	}
 
 	nentries = dparams->size / sizeof(struct fw_devlog_e);
 	m = fwmtype_to_hwmtype(dparams->memtype);
 
 	/* Find the first entry. */
 	first = -1;
 	for (i = 0; i < nentries && !db_pager_quit; i++) {
 		rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
 		    sizeof(e), (void *)&e);
 		if (rc != 0)
 			break;
 
 		if (e.timestamp == 0)
 			break;
 
 		e.timestamp = be64toh(e.timestamp);
 		if (e.timestamp < ftstamp) {
 			ftstamp = e.timestamp;
 			first = i;
 		}
 	}
 
 	if (first == -1)
 		return;
 
 	i = first;
 	do {
 		rc = -t4_mem_read(sc, m, dparams->start + i * sizeof(e),
 		    sizeof(e), (void *)&e);
 		if (rc != 0)
 			return;
 
 		if (e.timestamp == 0)
 			return;
 
 		e.timestamp = be64toh(e.timestamp);
 		e.seqno = be32toh(e.seqno);
 		for (j = 0; j < 8; j++)
 			e.params[j] = be32toh(e.params[j]);
 
 		db_printf("%10d  %15ju  %8s  %8s  ",
 		    e.seqno, e.timestamp,
 		    (e.level < nitems(devlog_level_strings) ?
 			devlog_level_strings[e.level] : "UNKNOWN"),
 		    (e.facility < nitems(devlog_facility_strings) ?
 			devlog_facility_strings[e.facility] : "UNKNOWN"));
 		db_printf(e.fmt, e.params[0], e.params[1], e.params[2],
 		    e.params[3], e.params[4], e.params[5], e.params[6],
 		    e.params[7]);
 
 		if (++i == nentries)
 			i = 0;
 	} while (i != first && !db_pager_quit);
 }
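The firmware device log is a ring, so t4_dump_devlog() first scans for the entry with the smallest timestamp, stopping at the first empty (zero-timestamp) slot. It then prints from that entry onward, wrapping at nentries, until it returns to the start or hits an empty slot. The sketch below records the visiting order instead of printing. Timestamps are assumed to be already in host byte order, and ex_devlog_order is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Store the indices of a circular log in oldest-first order in order[]
 * (which must hold n entries) and return how many were stored.
 */
static int
ex_devlog_order(const uint64_t *ts, int n, int *order)
{
	int first = -1, i, cnt = 0;
	uint64_t min = UINT64_MAX;

	for (i = 0; i < n && ts[i] != 0; i++) {
		if (ts[i] < min) {
			min = ts[i];
			first = i;
		}
	}
	if (first == -1)
		return (0);
	i = first;
	do {
		if (ts[i] == 0)
			break;
		order[cnt++] = i;
		if (++i == n)
			i = 0;
	} while (i != first);
	assert(cnt <= n);
	return (cnt);
}
```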
 
 static struct command_table db_t4_table = LIST_HEAD_INITIALIZER(db_t4_table);
 _DB_SET(_show, t4, NULL, db_show_table, 0, &db_t4_table);
 
 DB_FUNC(devlog, db_show_devlog, db_t4_table, CS_OWN, NULL)
 {
 	device_t dev;
 	int t;
 	bool valid;
 
 	valid = false;
 	t = db_read_token();
 	if (t == tIDENT) {
 		dev = device_lookup_by_name(db_tok_string);
 		valid = true;
 	}
 	db_skip_to_eol();
 	if (!valid) {
 		db_printf("usage: show t4 devlog <nexus>\n");
 		return;
 	}
 
 	if (dev == NULL) {
 		db_printf("device not found\n");
 		return;
 	}
 
 	t4_dump_devlog(device_get_softc(dev));
 }
 
 DB_FUNC(tcb, db_show_t4tcb, db_t4_table, CS_OWN, NULL)
 {
 	device_t dev;
 	int radix, tid, t;
 	bool valid;
 
 	valid = false;
 	radix = db_radix;
 	db_radix = 10;
 	t = db_read_token();
 	if (t == tIDENT) {
 		dev = device_lookup_by_name(db_tok_string);
 		t = db_read_token();
 		if (t == tNUMBER) {
 			tid = db_tok_number;
 			valid = true;
 		}
 	}
 	db_radix = radix;
 	db_skip_to_eol();
 	if (!valid) {
 		db_printf("usage: show t4 tcb <nexus> <tid>\n");
 		return;
 	}
 
 	if (dev == NULL) {
 		db_printf("device not found\n");
 		return;
 	}
 	if (tid < 0) {
 		db_printf("invalid tid\n");
 		return;
 	}
 
 	t4_dump_tcb(device_get_softc(dev), tid);
 }
 #endif
 
 static struct sx mlu;	/* mod load unload */
 SX_SYSINIT(cxgbe_mlu, &mlu, "cxgbe mod load/unload");
 
 static int
 mod_event(module_t mod, int cmd, void *arg)
 {
 	int rc = 0;
 	static int loaded = 0;
 
 	switch (cmd) {
 	case MOD_LOAD:
 		sx_xlock(&mlu);
 		if (loaded++ == 0) {
 			t4_sge_modload();
 			t4_register_cpl_handler(CPL_SET_TCB_RPL, set_tcb_rpl);
 			t4_register_cpl_handler(CPL_L2T_WRITE_RPL, l2t_write_rpl);
 			t4_register_cpl_handler(CPL_TRACE_PKT, t4_trace_pkt);
 			t4_register_cpl_handler(CPL_T5_TRACE_PKT, t5_trace_pkt);
 			sx_init(&t4_list_lock, "T4/T5 adapters");
 			SLIST_INIT(&t4_list);
 #ifdef TCP_OFFLOAD
 			sx_init(&t4_uld_list_lock, "T4/T5 ULDs");
 			SLIST_INIT(&t4_uld_list);
 #endif
 			t4_tracer_modload();
 			tweak_tunables();
 		}
 		sx_xunlock(&mlu);
 		break;
 
 	case MOD_UNLOAD:
 		sx_xlock(&mlu);
 		if (--loaded == 0) {
 			int tries;
 
 			sx_slock(&t4_list_lock);
 			if (!SLIST_EMPTY(&t4_list)) {
 				rc = EBUSY;
 				sx_sunlock(&t4_list_lock);
 				goto done_unload;
 			}
 #ifdef TCP_OFFLOAD
 			sx_slock(&t4_uld_list_lock);
 			if (!SLIST_EMPTY(&t4_uld_list)) {
 				rc = EBUSY;
 				sx_sunlock(&t4_uld_list_lock);
 				sx_sunlock(&t4_list_lock);
 				goto done_unload;
 			}
 #endif
 			tries = 0;
 			while (tries++ < 5 && t4_sge_extfree_refs() != 0) {
 				uprintf("%ju clusters with custom free routine "
 				    "still in use.\n", t4_sge_extfree_refs());
 				pause("t4unload", 2 * hz);
 			}
 #ifdef TCP_OFFLOAD
 			sx_sunlock(&t4_uld_list_lock);
 #endif
 			sx_sunlock(&t4_list_lock);
 
 			if (t4_sge_extfree_refs() == 0) {
 				t4_tracer_modunload();
 #ifdef TCP_OFFLOAD
 				sx_destroy(&t4_uld_list_lock);
 #endif
 				sx_destroy(&t4_list_lock);
 				t4_sge_modunload();
 				loaded = 0;
 			} else {
 				rc = EBUSY;
 				loaded++;	/* undo earlier decrement */
 			}
 		}
 done_unload:
 		sx_xunlock(&mlu);
 		break;
 	}
 
 	return (rc);
 }
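mod_event() is shared by the t4nex and t5nex drivers, so a static counter ensures that global setup runs only on the first MOD_LOAD and teardown only on the last MOD_UNLOAD. When the final extfree check still finds clusters in use, the driver returns EBUSY and increments loaded again to undo the decrement. The sketch below models that counting. The hooks, the busy flag and all ex_ names are hypothetical stand-ins for t4_sge_modload()/t4_sge_modunload() and the driver's busy checks, and the sx lock is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int ex_loaded;			/* drivers sharing this module event */
static int ex_init_calls, ex_fini_calls;

static void ex_global_init(void) { ex_init_calls++; }
static void ex_global_fini(void) { ex_fini_calls++; }

static int
ex_mod_load(void)
{
	if (ex_loaded++ == 0)
		ex_global_init();
	return (0);
}

/* busy stands in for "resources still in use" at the last unload. */
static int
ex_mod_unload(bool busy)
{
	if (--ex_loaded == 0) {
		if (busy) {
			ex_loaded++;	/* undo the decrement; stay loaded */
			return (EBUSY);
		}
		ex_global_fini();
	}
	return (0);
}
```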
 
 static devclass_t t4_devclass, t5_devclass;
 static devclass_t cxgbe_devclass, cxl_devclass;
 static devclass_t vcxgbe_devclass, vcxl_devclass;
 
 DRIVER_MODULE(t4nex, pci, t4_driver, t4_devclass, mod_event, 0);
 MODULE_VERSION(t4nex, 1);
 MODULE_DEPEND(t4nex, firmware, 1, 1, 1);
 #ifdef DEV_NETMAP
 MODULE_DEPEND(t4nex, netmap, 1, 1, 1);
 #endif /* DEV_NETMAP */
 
 
 DRIVER_MODULE(t5nex, pci, t5_driver, t5_devclass, mod_event, 0);
 MODULE_VERSION(t5nex, 1);
 MODULE_DEPEND(t5nex, firmware, 1, 1, 1);
 #ifdef DEV_NETMAP
 MODULE_DEPEND(t5nex, netmap, 1, 1, 1);
 #endif /* DEV_NETMAP */
 
 DRIVER_MODULE(cxgbe, t4nex, cxgbe_driver, cxgbe_devclass, 0, 0);
 MODULE_VERSION(cxgbe, 1);
 
 DRIVER_MODULE(cxl, t5nex, cxl_driver, cxl_devclass, 0, 0);
 MODULE_VERSION(cxl, 1);
 
 DRIVER_MODULE(vcxgbe, cxgbe, vcxgbe_driver, vcxgbe_devclass, 0, 0);
 MODULE_VERSION(vcxgbe, 1);
 
 DRIVER_MODULE(vcxl, cxl, vcxl_driver, vcxl_devclass, 0, 0);
 MODULE_VERSION(vcxl, 1);
Index: user/alc/PQ_LAUNDRY/sys/dev/cxgbe/tom/t4_ddp.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/cxgbe/tom/t4_ddp.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/cxgbe/tom/t4_ddp.c	(revision 303775)
@@ -1,1794 +1,1794 @@
 /*-
  * Copyright (c) 2012 Chelsio Communications, Inc.
  * All rights reserved.
  * Written by: Navdeep Parhar <np@FreeBSD.org>
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet.h"
 
 #include <sys/param.h>
 #include <sys/aio.h>
 #include <sys/file.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/ktr.h>
 #include <sys/module.h>
 #include <sys/protosw.h>
 #include <sys/proc.h>
 #include <sys/domain.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
 #include <sys/taskqueue.h>
 #include <sys/uio.h>
 #include <netinet/in.h>
 #include <netinet/in_pcb.h>
 #include <netinet/ip.h>
 #include <netinet/tcp_var.h>
 #define TCPSTATES
 #include <netinet/tcp_fsm.h>
 #include <netinet/toecore.h>
 
 #include <vm/vm.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_param.h>
 #include <vm/pmap.h>
 #include <vm/vm_map.h>
 #include <vm/vm_page.h>
 #include <vm/vm_object.h>
 
 #ifdef TCP_OFFLOAD
 #include "common/common.h"
 #include "common/t4_msg.h"
 #include "common/t4_regs.h"
 #include "common/t4_tcb.h"
 #include "tom/t4_tom.h"
 
 VNET_DECLARE(int, tcp_do_autorcvbuf);
 #define V_tcp_do_autorcvbuf VNET(tcp_do_autorcvbuf)
 VNET_DECLARE(int, tcp_autorcvbuf_inc);
 #define V_tcp_autorcvbuf_inc VNET(tcp_autorcvbuf_inc)
 VNET_DECLARE(int, tcp_autorcvbuf_max);
 #define V_tcp_autorcvbuf_max VNET(tcp_autorcvbuf_max)
 
 /*
  * Use the 'backend3' field in AIO jobs to store the amount of data
  * received by the AIO job so far.
  */
 #define	aio_received	backend3
 
 static void aio_ddp_requeue_task(void *context, int pending);
 static void ddp_complete_all(struct toepcb *toep, int error);
 static void t4_aio_cancel_active(struct kaiocb *job);
 static void t4_aio_cancel_queued(struct kaiocb *job);
 
 #define PPOD_SZ(n)	((n) * sizeof(struct pagepod))
 #define PPOD_SIZE	(PPOD_SZ(1))
 
 static TAILQ_HEAD(, pageset) ddp_orphan_pagesets;
 static struct mtx ddp_orphan_pagesets_lock;
 static struct task ddp_orphan_task;
 
 #define MAX_DDP_BUFFER_SIZE		(M_TCB_RX_DDP_BUF0_LEN)
 static int
 alloc_ppods(struct tom_data *td, int n, u_int *ppod_addr)
 {
 	vmem_addr_t v;
 	int rc;
 
 	MPASS(n > 0);
 
 	rc = vmem_alloc(td->ppod_arena, PPOD_SZ(n), M_NOWAIT | M_FIRSTFIT, &v);
 	*ppod_addr = (u_int)v;
 
 	return (rc);
 }
 
 static void
 free_ppods(struct tom_data *td, u_int ppod_addr, int n)
 {
 
 	MPASS(n > 0);
 
 	vmem_free(td->ppod_arena, (vmem_addr_t)ppod_addr, PPOD_SZ(n));
 }
 
 static inline int
 pages_to_nppods(int npages, int ddp_pgsz)
 {
 	int nsegs = npages * PAGE_SIZE / ddp_pgsz;
 
 	return (howmany(nsegs, PPOD_PAGES));
 }
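 
 /*
  * Worked example, assuming 4K VM pages and PPOD_PAGES == 4: a 10-page
  * buffer mapped with a 4K DDP page size spans 10 DDP pages and needs
  * howmany(10, 4) = 3 page pods, while a 32-page buffer mapped with a
  * 64K DDP page size (if 64K is one of the sizes in ddp_pgsz[]) spans
  * 2 DDP pages and needs howmany(2, 4) = 1 page pod.
  */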
 
 /*
  * A page set holds information about a buffer used for DDP.  The page
  * set holds resources such as the VM pages backing the buffer (either
  * held or wired) and the page pods associated with the buffer.
  * Recently used page sets are cached to allow for efficient reuse of
  * buffers (avoiding the need to re-fault in pages, hold them, etc.).
  * Note that cached page sets keep the backing pages wired.  The
  * number of wired pages is capped by only allowing for two wired
  * pagesets per connection.  This is not a perfect cap, but is a
  * trade-off for performance.
  *
  * If an application ping-pongs two buffers for a connection via
  * aio_read(2) then those buffers should remain wired and expensive VM
  * fault lookups should be avoided after each buffer has been used
  * once.  If an application uses more than two buffers then this will
  * fall back to doing expensive VM fault lookups for each operation.
  */
 static void
 free_pageset(struct tom_data *td, struct pageset *ps)
 {
 	vm_page_t p;
 	int i;
 
 	if (ps->nppods > 0)
 		free_ppods(td, ps->ppod_addr, ps->nppods);
 
 	if (ps->flags & PS_WIRED) {
 		for (i = 0; i < ps->npages; i++) {
 			p = ps->pages[i];
 			vm_page_lock(p);
 			vm_page_unwire(p, PQ_INACTIVE);
 			vm_page_unlock(p);
 		}
 	} else
 		vm_page_unhold_pages(ps->pages, ps->npages);
 	mtx_lock(&ddp_orphan_pagesets_lock);
 	TAILQ_INSERT_TAIL(&ddp_orphan_pagesets, ps, link);
 	taskqueue_enqueue(taskqueue_thread, &ddp_orphan_task);
 	mtx_unlock(&ddp_orphan_pagesets_lock);
 }
 
 static void
 ddp_free_orphan_pagesets(void *context, int pending)
 {
 	struct pageset *ps;
 
 	mtx_lock(&ddp_orphan_pagesets_lock);
 	while (!TAILQ_EMPTY(&ddp_orphan_pagesets)) {
 		ps = TAILQ_FIRST(&ddp_orphan_pagesets);
 		TAILQ_REMOVE(&ddp_orphan_pagesets, ps, link);
 		mtx_unlock(&ddp_orphan_pagesets_lock);
 		if (ps->vm)
 			vmspace_free(ps->vm);
 		free(ps, M_CXGBE);
 		mtx_lock(&ddp_orphan_pagesets_lock);
 	}
 	mtx_unlock(&ddp_orphan_pagesets_lock);
 }
 
 static void
 recycle_pageset(struct toepcb *toep, struct pageset *ps)
 {
 
 	DDP_ASSERT_LOCKED(toep);
 	if (!(toep->ddp_flags & DDP_DEAD) && ps->flags & PS_WIRED) {
 		KASSERT(toep->ddp_cached_count + toep->ddp_active_count <
 		    nitems(toep->db), ("too many wired pagesets"));
 		TAILQ_INSERT_HEAD(&toep->ddp_cached_pagesets, ps, link);
 		toep->ddp_cached_count++;
 	} else
 		free_pageset(toep->td, ps);
 }
 
 static void
 ddp_complete_one(struct kaiocb *job, int error)
 {
 	long copied;
 
 	/*
 	 * If this job had copied data out of the socket buffer before
 	 * it was cancelled, report it as a short read rather than an
 	 * error.
 	 */
 	copied = job->aio_received;
 	if (copied != 0 || error == 0)
 		aio_complete(job, copied, 0);
 	else
 		aio_complete(job, -1, error);
 }
 
 static void
 free_ddp_buffer(struct tom_data *td, struct ddp_buffer *db)
 {
 
 	if (db->job) {
 		/*
 		 * XXX: If we are un-offloading the socket then we
 		 * should requeue these on the socket somehow.  If we
 		 * got a FIN from the remote end, then this completes
 		 * any remaining requests with an EOF read.
 		 */
 		if (!aio_clear_cancel_function(db->job))
 			ddp_complete_one(db->job, 0);
 	}
 
 	if (db->ps)
 		free_pageset(td, db->ps);
 }
 
 void
 ddp_init_toep(struct toepcb *toep)
 {
 
 	TAILQ_INIT(&toep->ddp_aiojobq);
 	TASK_INIT(&toep->ddp_requeue_task, 0, aio_ddp_requeue_task, toep);
 	toep->ddp_active_id = -1;
 	mtx_init(&toep->ddp_lock, "t4 ddp", NULL, MTX_DEF);
 }
 
 void
 ddp_uninit_toep(struct toepcb *toep)
 {
 
 	mtx_destroy(&toep->ddp_lock);
 }
 
 void
 release_ddp_resources(struct toepcb *toep)
 {
 	struct pageset *ps;
 	int i;
 
 	DDP_LOCK(toep);
 	toep->ddp_flags |= DDP_DEAD;
 	for (i = 0; i < nitems(toep->db); i++) {
 		free_ddp_buffer(toep->td, &toep->db[i]);
 	}
 	while ((ps = TAILQ_FIRST(&toep->ddp_cached_pagesets)) != NULL) {
 		TAILQ_REMOVE(&toep->ddp_cached_pagesets, ps, link);
 		free_pageset(toep->td, ps);
 	}
 	ddp_complete_all(toep, 0);
 	DDP_UNLOCK(toep);
 }
 
 #ifdef INVARIANTS
 void
 ddp_assert_empty(struct toepcb *toep)
 {
 	int i;
 
 	MPASS(!(toep->ddp_flags & DDP_TASK_ACTIVE));
 	for (i = 0; i < nitems(toep->db); i++) {
 		MPASS(toep->db[i].job == NULL);
 		MPASS(toep->db[i].ps == NULL);
 	}
 	MPASS(TAILQ_EMPTY(&toep->ddp_cached_pagesets));
 	MPASS(TAILQ_EMPTY(&toep->ddp_aiojobq));
 }
 #endif
 
 static void
 complete_ddp_buffer(struct toepcb *toep, struct ddp_buffer *db,
     unsigned int db_idx)
 {
 	unsigned int db_flag;
 
 	toep->ddp_active_count--;
 	if (toep->ddp_active_id == db_idx) {
 		if (toep->ddp_active_count == 0) {
 			KASSERT(toep->db[db_idx ^ 1].job == NULL,
 			    ("%s: active_count mismatch", __func__));
 			toep->ddp_active_id = -1;
 		} else
 			toep->ddp_active_id ^= 1;
 #ifdef VERBOSE_TRACES
 		CTR2(KTR_CXGBE, "%s: ddp_active_id = %d", __func__,
 		    toep->ddp_active_id);
 #endif
 	} else {
 		KASSERT(toep->ddp_active_count != 0 &&
 		    toep->ddp_active_id != -1,
 		    ("%s: active count mismatch", __func__));
 	}
 
 	db->cancel_pending = 0;
 	db->job = NULL;
 	recycle_pageset(toep, db->ps);
 	db->ps = NULL;
 
 	db_flag = db_idx == 1 ? DDP_BUF1_ACTIVE : DDP_BUF0_ACTIVE;
 	KASSERT(toep->ddp_flags & db_flag,
 	    ("%s: DDP buffer not active. toep %p, ddp_flags 0x%x",
 	    __func__, toep, toep->ddp_flags));
 	toep->ddp_flags &= ~db_flag;
 }
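 
 /*
  * Example of the ddp_active_id bookkeeping in complete_ddp_buffer():
  * with both buffers active (ddp_active_count == 2, ddp_active_id == 0),
  * completing buffer 0 drops the count to 1 and flips ddp_active_id to
  * 1 via "^= 1"; completing buffer 1 afterwards drops the count to 0
  * and resets ddp_active_id to -1.
  */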
 
 /* XXX: handle_ddp_data code duplication */
 void
 insert_ddp_data(struct toepcb *toep, uint32_t n)
 {
 	struct inpcb *inp = toep->inp;
 	struct tcpcb *tp = intotcpcb(inp);
 	struct ddp_buffer *db;
 	struct kaiocb *job;
 	size_t placed;
 	long copied;
 	unsigned int db_flag, db_idx;
 
 	INP_WLOCK_ASSERT(inp);
 	DDP_ASSERT_LOCKED(toep);
 
 	tp->rcv_nxt += n;
 #ifndef USE_DDP_RX_FLOW_CONTROL
 	KASSERT(tp->rcv_wnd >= n, ("%s: negative window size", __func__));
 	tp->rcv_wnd -= n;
 	toep->rx_credits += n;
 #endif
 	CTR2(KTR_CXGBE, "%s: placed %u bytes before falling out of DDP",
 	    __func__, n);
 	while (toep->ddp_active_count > 0) {
 		MPASS(toep->ddp_active_id != -1);
 		db_idx = toep->ddp_active_id;
 		db_flag = db_idx == 1 ? DDP_BUF1_ACTIVE : DDP_BUF0_ACTIVE;
 		MPASS((toep->ddp_flags & db_flag) != 0);
 		db = &toep->db[db_idx];
 		job = db->job;
 		copied = job->aio_received;
 		placed = n;
 		if (placed > job->uaiocb.aio_nbytes - copied)
 			placed = job->uaiocb.aio_nbytes - copied;
 		if (placed > 0)
 			job->msgrcv = 1;
 		if (!aio_clear_cancel_function(job)) {
 			/*
 			 * Update the copied length for when
 			 * t4_aio_cancel_active() completes this
 			 * request.
 			 */
 			job->aio_received += placed;
 		} else if (copied + placed != 0) {
 			CTR4(KTR_CXGBE,
 			    "%s: completing %p (copied %ld, placed %lu)",
 			    __func__, job, copied, placed);
 			/* XXX: This always completes if there is some data. */
 			aio_complete(job, copied + placed, 0);
 		} else if (aio_set_cancel_function(job, t4_aio_cancel_queued)) {
 			TAILQ_INSERT_HEAD(&toep->ddp_aiojobq, job, list);
 			toep->ddp_waiting_count++;
 		} else
 			aio_cancel(job);
 		n -= placed;
 		complete_ddp_buffer(toep, db, db_idx);
 	}
 
 	MPASS(n == 0);
 }
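 
 /*
  * Example for insert_ddp_data(), with hypothetical sizes: if both DDP
  * buffers are active, each backing an 8192-byte aio_read(2) with no
  * data copied yet, and n == 10000, the first active buffer is credited
  * with min(10000, 8192) = 8192 bytes and the second with the remaining
  * 1808, leaving n == 0 as the final MPASS() expects.
  */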
 
 /* SET_TCB_FIELD sent as a ULP command looks like this */
 #define LEN__SET_TCB_FIELD_ULP (sizeof(struct ulp_txpkt) + \
     sizeof(struct ulptx_idata) + sizeof(struct cpl_set_tcb_field_core))
 
 /* RX_DATA_ACK sent as a ULP command looks like this */
 #define LEN__RX_DATA_ACK_ULP (sizeof(struct ulp_txpkt) + \
     sizeof(struct ulptx_idata) + sizeof(struct cpl_rx_data_ack_core))
 
 static inline void *
 mk_set_tcb_field_ulp(struct ulp_txpkt *ulpmc, struct toepcb *toep,
     uint64_t word, uint64_t mask, uint64_t val)
 {
 	struct ulptx_idata *ulpsc;
 	struct cpl_set_tcb_field_core *req;
 
 	ulpmc->cmd_dest = htonl(V_ULPTX_CMD(ULP_TX_PKT) | V_ULP_TXPKT_DEST(0));
 	ulpmc->len = htobe32(howmany(LEN__SET_TCB_FIELD_ULP, 16));
 
 	ulpsc = (struct ulptx_idata *)(ulpmc + 1);
 	ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM));
 	ulpsc->len = htobe32(sizeof(*req));
 
 	req = (struct cpl_set_tcb_field_core *)(ulpsc + 1);
 	OPCODE_TID(req) = htobe32(MK_OPCODE_TID(CPL_SET_TCB_FIELD, toep->tid));
 	req->reply_ctrl = htobe16(V_NO_REPLY(1) |
 	    V_QUEUENO(toep->ofld_rxq->iq.abs_id));
 	req->word_cookie = htobe16(V_WORD(word) | V_COOKIE(0));
 	req->mask = htobe64(mask);
 	req->val = htobe64(val);
 
 	ulpsc = (struct ulptx_idata *)(req + 1);
 	if (LEN__SET_TCB_FIELD_ULP % 16) {
 		ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_NOOP));
 		ulpsc->len = htobe32(0);
 		return (ulpsc + 1);
 	}
 	return (ulpsc);
 }
 
 static inline void *
 mk_rx_data_ack_ulp(struct ulp_txpkt *ulpmc, struct toepcb *toep)
 {
 	struct ulptx_idata *ulpsc;
 	struct cpl_rx_data_ack_core *req;
 
 	ulpmc->cmd_dest = htonl(V_ULPTX_CMD(ULP_TX_PKT) | V_ULP_TXPKT_DEST(0));
 	ulpmc->len = htobe32(howmany(LEN__RX_DATA_ACK_ULP, 16));
 
 	ulpsc = (struct ulptx_idata *)(ulpmc + 1);
 	ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM));
 	ulpsc->len = htobe32(sizeof(*req));
 
 	req = (struct cpl_rx_data_ack_core *)(ulpsc + 1);
 	OPCODE_TID(req) = htobe32(MK_OPCODE_TID(CPL_RX_DATA_ACK, toep->tid));
 	req->credit_dack = htobe32(F_RX_MODULATE_RX);
 
 	ulpsc = (struct ulptx_idata *)(req + 1);
 	if (LEN__RX_DATA_ACK_ULP % 16) {
 		ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_NOOP));
 		ulpsc->len = htobe32(0);
 		return (ulpsc + 1);
 	}
 	return (ulpsc);
 }
 
 static struct wrqe *
 mk_update_tcb_for_ddp(struct adapter *sc, struct toepcb *toep, int db_idx,
     struct pageset *ps, int offset, uint64_t ddp_flags, uint64_t ddp_flags_mask)
 {
 	struct wrqe *wr;
 	struct work_request_hdr *wrh;
 	struct ulp_txpkt *ulpmc;
 	int len;
 
 	KASSERT(db_idx == 0 || db_idx == 1,
 	    ("%s: bad DDP buffer index %d", __func__, db_idx));
 
 	/*
 	 * We'll send a compound work request that has 3 SET_TCB_FIELDs and an
 	 * RX_DATA_ACK (with RX_MODULATE to speed up delivery).
 	 *
 	 * The work request header is 16B and always ends at a 16B boundary.
 	 * The ULPTX master commands that follow must all end at 16B boundaries
 	 * too so we round up the size to 16.
 	 */
 	len = sizeof(*wrh) + 3 * roundup2(LEN__SET_TCB_FIELD_ULP, 16) +
 	    roundup2(LEN__RX_DATA_ACK_ULP, 16);
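 
 	/*
 	 * Worked example, assuming the struct layouts in common/t4_msg.h:
 	 * struct ulp_txpkt and struct ulptx_idata are 8 bytes each,
 	 * struct cpl_set_tcb_field_core is 24 bytes and
 	 * struct cpl_rx_data_ack_core is 8 bytes.  Then
 	 * LEN__SET_TCB_FIELD_ULP = 40 and LEN__RX_DATA_ACK_ULP = 24; neither
 	 * is a multiple of 16, so the mk_*_ulp() helpers append an 8-byte
 	 * ULP_TX_SC_NOOP and
 	 *
 	 *	len = 16 + 3 * roundup2(40, 16) + roundup2(24, 16)
 	 *	    = 16 + 3 * 48 + 32 = 192 bytes.
 	 */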
 
 	wr = alloc_wrqe(len, toep->ctrlq);
 	if (wr == NULL)
 		return (NULL);
 	wrh = wrtod(wr);
 	INIT_ULPTX_WRH(wrh, len, 1, 0);	/* atomic */
 	ulpmc = (struct ulp_txpkt *)(wrh + 1);
 
 	/* Write the buffer's tag */
 	ulpmc = mk_set_tcb_field_ulp(ulpmc, toep,
 	    W_TCB_RX_DDP_BUF0_TAG + db_idx,
 	    V_TCB_RX_DDP_BUF0_TAG(M_TCB_RX_DDP_BUF0_TAG),
 	    V_TCB_RX_DDP_BUF0_TAG(ps->tag));
 
 	/* Update the current offset in the DDP buffer and its total length */
 	if (db_idx == 0)
 		ulpmc = mk_set_tcb_field_ulp(ulpmc, toep,
 		    W_TCB_RX_DDP_BUF0_OFFSET,
 		    V_TCB_RX_DDP_BUF0_OFFSET(M_TCB_RX_DDP_BUF0_OFFSET) |
 		    V_TCB_RX_DDP_BUF0_LEN(M_TCB_RX_DDP_BUF0_LEN),
 		    V_TCB_RX_DDP_BUF0_OFFSET(offset) |
 		    V_TCB_RX_DDP_BUF0_LEN(ps->len));
 	else
 		ulpmc = mk_set_tcb_field_ulp(ulpmc, toep,
 		    W_TCB_RX_DDP_BUF1_OFFSET,
 		    V_TCB_RX_DDP_BUF1_OFFSET(M_TCB_RX_DDP_BUF1_OFFSET) |
 		    V_TCB_RX_DDP_BUF1_LEN((u64)M_TCB_RX_DDP_BUF1_LEN << 32),
 		    V_TCB_RX_DDP_BUF1_OFFSET(offset) |
 		    V_TCB_RX_DDP_BUF1_LEN((u64)ps->len << 32));
 
 	/* Update DDP flags */
 	ulpmc = mk_set_tcb_field_ulp(ulpmc, toep, W_TCB_RX_DDP_FLAGS,
 	    ddp_flags_mask, ddp_flags);
 
 	/* Gratuitous RX_DATA_ACK with RX_MODULATE set to speed up delivery. */
 	ulpmc = mk_rx_data_ack_ulp(ulpmc, toep);
 
 	return (wr);
 }
 
 static int
 handle_ddp_data(struct toepcb *toep, __be32 ddp_report, __be32 rcv_nxt, int len)
 {
 	uint32_t report = be32toh(ddp_report);
 	unsigned int db_idx;
 	struct inpcb *inp = toep->inp;
 	struct ddp_buffer *db;
 	struct tcpcb *tp;
 	struct socket *so;
 	struct sockbuf *sb;
 	struct kaiocb *job;
 	long copied;
 
 	db_idx = report & F_DDP_BUF_IDX ? 1 : 0;
 
 	if (__predict_false(!(report & F_DDP_INV)))
 		CXGBE_UNIMPLEMENTED("DDP buffer still valid");
 
 	INP_WLOCK(inp);
 	so = inp_inpcbtosocket(inp);
 	sb = &so->so_rcv;
 	DDP_LOCK(toep);
 
 	KASSERT(toep->ddp_active_id == db_idx,
 	    ("completed DDP buffer (%d) != active_id (%d) for tid %d", db_idx,
 	    toep->ddp_active_id, toep->tid));
 	db = &toep->db[db_idx];
 	job = db->job;
 
 	if (__predict_false(inp->inp_flags & (INP_DROPPED | INP_TIMEWAIT))) {
 		/*
 		 * This can happen due to an administrative tcpdrop(8).
 		 * Just fail the request with ECONNRESET.
 		 */
 		CTR5(KTR_CXGBE, "%s: tid %u, seq 0x%x, len %d, inp_flags 0x%x",
 		    __func__, toep->tid, be32toh(rcv_nxt), len, inp->inp_flags);
 		if (aio_clear_cancel_function(job))
 			ddp_complete_one(job, ECONNRESET);
 		goto completed;
 	}
 
 	tp = intotcpcb(inp);
 
 	/*
 	 * For RX_DDP_COMPLETE, len will be zero and rcv_nxt is the
 	 * sequence number of the next byte to receive.  The length of
 	 * the data received for this message must be computed by
 	 * comparing the new and old values of rcv_nxt.
 	 *
 	 * For RX_DATA_DDP, len might be non-zero, but it is only the
 	 * length of the most recent DMA.  It does not include the
 	 * total length of the data received since the previous update
 	 * for this DDP buffer.  rcv_nxt is the sequence number of the
 	 * first received byte from the most recent DMA.
 	 */
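 
 	/*
 	 * For example, with tp->rcv_nxt == 1000: an RX_DDP_COMPLETE (len 0)
 	 * reporting rcv_nxt 5000 gives len = 0 + (5000 - 1000) = 4000, and
 	 * an RX_DATA_DDP with len 1460 and seq 3540 gives
 	 * len = 1460 + (3540 - 1000) = 4000.  Either way tp->rcv_nxt ends
 	 * up at 5000.
 	 */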
 	len += be32toh(rcv_nxt) - tp->rcv_nxt;
 	tp->rcv_nxt += len;
 	tp->t_rcvtime = ticks;
 #ifndef USE_DDP_RX_FLOW_CONTROL
 	KASSERT(tp->rcv_wnd >= len, ("%s: negative window size", __func__));
 	tp->rcv_wnd -= len;
 #endif
 #ifdef VERBOSE_TRACES
 	CTR4(KTR_CXGBE, "%s: DDP[%d] placed %d bytes (%#x)", __func__, db_idx,
 	    len, report);
 #endif
 
 	/* receive buffer autosize */
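 	/*
 	 * Example with hypothetical values and autosizing enabled: if
 	 * sb_hiwat and sbspace(sb) are both 65536, V_tcp_autorcvbuf_inc is
 	 * 16384 and V_tcp_autorcvbuf_max is at least 81920, a placement of
 	 * more than 65536 / 8 * 7 = 57344 bytes grows the buffer to 81920
 	 * bytes (if sbreserve_locked() succeeds) and returns the extra
 	 * 16384 bytes of window as rx_credits.
 	 */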
 	CURVNET_SET(so->so_vnet);
 	SOCKBUF_LOCK(sb);
 	if (sb->sb_flags & SB_AUTOSIZE &&
 	    V_tcp_do_autorcvbuf &&
 	    sb->sb_hiwat < V_tcp_autorcvbuf_max &&
 	    len > (sbspace(sb) / 8 * 7)) {
 		unsigned int hiwat = sb->sb_hiwat;
 		unsigned int newsize = min(hiwat + V_tcp_autorcvbuf_inc,
 		    V_tcp_autorcvbuf_max);
 
 		if (!sbreserve_locked(sb, newsize, so, NULL))
 			sb->sb_flags &= ~SB_AUTOSIZE;
 		else
 			toep->rx_credits += newsize - hiwat;
 	}
 	SOCKBUF_UNLOCK(sb);
 	CURVNET_RESTORE();
 
 #ifndef USE_DDP_RX_FLOW_CONTROL
 	toep->rx_credits += len;
 #endif
 
 	job->msgrcv = 1;
 	if (db->cancel_pending) {
 		/*
 		 * Update the job's length but defer completion to the
 		 * TCB_RPL callback.
 		 */
 		job->aio_received += len;
 		goto out;
 	} else if (!aio_clear_cancel_function(job)) {
 		/*
 		 * Update the copied length for when
 		 * t4_aio_cancel_active() completes this request.
 		 */
 		job->aio_received += len;
 	} else {
 		copied = job->aio_received;
 #ifdef VERBOSE_TRACES
 		CTR4(KTR_CXGBE, "%s: completing %p (copied %ld, placed %d)",
 		    __func__, job, copied, len);
 #endif
 		aio_complete(job, copied + len, 0);
 		t4_rcvd(&toep->td->tod, tp);
 	}
 
 completed:
 	complete_ddp_buffer(toep, db, db_idx);
 	if (toep->ddp_waiting_count > 0)
 		ddp_queue_toep(toep);
 out:
 	DDP_UNLOCK(toep);
 	INP_WUNLOCK(inp);
 
 	return (0);
 }
 
 void
 handle_ddp_indicate(struct toepcb *toep)
 {
 
 	DDP_ASSERT_LOCKED(toep);
 	MPASS(toep->ddp_active_count == 0);
 	MPASS((toep->ddp_flags & (DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE)) == 0);
 	if (toep->ddp_waiting_count == 0) {
 		/*
 		 * The pending requests that triggered the request for an
 		 * indicate were cancelled.  Those cancels should have
 		 * already disabled DDP.  Just ignore this as the data is
 		 * going into the socket buffer anyway.
 		 */
 		return;
 	}
 	CTR3(KTR_CXGBE, "%s: tid %d indicated (%d waiting)", __func__,
 	    toep->tid, toep->ddp_waiting_count);
 	ddp_queue_toep(toep);
 }
 
 enum {
 	DDP_BUF0_INVALIDATED = 0x2,
 	DDP_BUF1_INVALIDATED
 };
 
 void
 handle_ddp_tcb_rpl(struct toepcb *toep, const struct cpl_set_tcb_rpl *cpl)
 {
 	unsigned int db_idx;
 	struct inpcb *inp = toep->inp;
 	struct ddp_buffer *db;
 	struct kaiocb *job;
 	long copied;
 
 	if (cpl->status != CPL_ERR_NONE)
 		panic("XXX: tcb_rpl failed: %d", cpl->status);
 
 	switch (cpl->cookie) {
 	case V_WORD(W_TCB_RX_DDP_FLAGS) | V_COOKIE(DDP_BUF0_INVALIDATED):
 	case V_WORD(W_TCB_RX_DDP_FLAGS) | V_COOKIE(DDP_BUF1_INVALIDATED):
 		/*
 		 * XXX: This duplicates a lot of code with handle_ddp_data().
 		 */
 		db_idx = G_COOKIE(cpl->cookie) - DDP_BUF0_INVALIDATED;
 		INP_WLOCK(inp);
 		DDP_LOCK(toep);
 		db = &toep->db[db_idx];
 
 		/*
 		 * Once a cancel is pending, handle_ddp_data() leaves the job
 		 * in place until this callback runs.
 		 */
 		MPASS(db != NULL);
 		MPASS(db->job != NULL);
 		MPASS(db->cancel_pending);
 
 		/*
 		 * XXX: It's not clear what happens if there is data
 		 * placed when the buffer is invalidated.  I suspect we
 		 * need to read the TCB to see how much data was placed.
 		 *
 		 * For now this just pretends that nothing was placed.
 		 *
 		 * XXX: Note that if we did check the TCB we would also need
 		 * to update the tp, etc.
 		 */
 		job = db->job;
 		copied = job->aio_received;
 		if (copied == 0) {
 			CTR2(KTR_CXGBE, "%s: cancelling %p", __func__, job);
 			aio_cancel(job);
 		} else {
 			CTR3(KTR_CXGBE, "%s: completing %p (copied %ld)",
 			    __func__, job, copied);
 			aio_complete(job, copied, 0);
 			t4_rcvd(&toep->td->tod, intotcpcb(inp));
 		}
 
 		complete_ddp_buffer(toep, db, db_idx);
 		if (toep->ddp_waiting_count > 0)
 			ddp_queue_toep(toep);
 		DDP_UNLOCK(toep);
 		INP_WUNLOCK(inp);
 		break;
 	default:
 		panic("XXX: unknown tcb_rpl offset %#x, cookie %#x",
 		    G_WORD(cpl->cookie), G_COOKIE(cpl->cookie));
 	}
 }
 
 void
 handle_ddp_close(struct toepcb *toep, struct tcpcb *tp, __be32 rcv_nxt)
 {
 	struct ddp_buffer *db;
 	struct kaiocb *job;
 	long copied;
 	unsigned int db_flag, db_idx;
 	int len, placed;
 
 	INP_WLOCK_ASSERT(toep->inp);
 	DDP_ASSERT_LOCKED(toep);
 	len = be32toh(rcv_nxt) - tp->rcv_nxt;
 
 	tp->rcv_nxt += len;
 #ifndef USE_DDP_RX_FLOW_CONTROL
 	toep->rx_credits += len;
 #endif
 
 	while (toep->ddp_active_count > 0) {
 		MPASS(toep->ddp_active_id != -1);
 		db_idx = toep->ddp_active_id;
 		db_flag = db_idx == 1 ? DDP_BUF1_ACTIVE : DDP_BUF0_ACTIVE;
 		MPASS((toep->ddp_flags & db_flag) != 0);
 		db = &toep->db[db_idx];
 		job = db->job;
 		copied = job->aio_received;
 		placed = len;
 		if (placed > job->uaiocb.aio_nbytes - copied)
 			placed = job->uaiocb.aio_nbytes - copied;
 		if (placed > 0)
 			job->msgrcv = 1;
 		if (!aio_clear_cancel_function(job)) {
 			/*
 			 * Update the copied length for when
 			 * t4_aio_cancel_active() completes this
 			 * request.
 			 */
 			job->aio_received += placed;
 		} else {
 			CTR4(KTR_CXGBE, "%s: tid %d completed buf %d len %d",
 			    __func__, toep->tid, db_idx, placed);
 			aio_complete(job, copied + placed, 0);
 		}
 		len -= placed;
 		complete_ddp_buffer(toep, db, db_idx);
 	}
 
 	MPASS(len == 0);
 	ddp_complete_all(toep, 0);
 }
 
 #define DDP_ERR (F_DDP_PPOD_MISMATCH | F_DDP_LLIMIT_ERR | F_DDP_ULIMIT_ERR |\
 	 F_DDP_PPOD_PARITY_ERR | F_DDP_PADDING_ERR | F_DDP_OFFSET_ERR |\
 	 F_DDP_INVALID_TAG | F_DDP_COLOR_ERR | F_DDP_TID_MISMATCH |\
 	 F_DDP_INVALID_PPOD | F_DDP_HDRCRC_ERR | F_DDP_DATACRC_ERR)
 
 extern cpl_handler_t t4_cpl_handler[];
 
 static int
 do_rx_data_ddp(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m)
 {
 	struct adapter *sc = iq->adapter;
 	const struct cpl_rx_data_ddp *cpl = (const void *)(rss + 1);
 	unsigned int tid = GET_TID(cpl);
 	uint32_t vld;
 	struct toepcb *toep = lookup_tid(sc, tid);
 
 	KASSERT(m == NULL, ("%s: wasn't expecting payload", __func__));
 	KASSERT(toep->tid == tid, ("%s: toep tid/atid mismatch", __func__));
 	KASSERT(!(toep->flags & TPF_SYNQE),
 	    ("%s: toep %p claims to be a synq entry", __func__, toep));
 
 	vld = be32toh(cpl->ddpvld);
 	if (__predict_false(vld & DDP_ERR)) {
 		panic("%s: DDP error 0x%x (tid %d, toep %p)",
 		    __func__, vld, tid, toep);
 	}
 
 	if (toep->ulp_mode == ULP_MODE_ISCSI) {
 		t4_cpl_handler[CPL_RX_ISCSI_DDP](iq, rss, m);
 		return (0);
 	}
 
 	handle_ddp_data(toep, cpl->u.ddp_report, cpl->seq, be16toh(cpl->len));
 
 	return (0);
 }
 
 static int
 do_rx_ddp_complete(struct sge_iq *iq, const struct rss_header *rss,
     struct mbuf *m)
 {
 	struct adapter *sc = iq->adapter;
 	const struct cpl_rx_ddp_complete *cpl = (const void *)(rss + 1);
 	unsigned int tid = GET_TID(cpl);
 	struct toepcb *toep = lookup_tid(sc, tid);
 
 	KASSERT(m == NULL, ("%s: wasn't expecting payload", __func__));
 	KASSERT(toep->tid == tid, ("%s: toep tid/atid mismatch", __func__));
 	KASSERT(!(toep->flags & TPF_SYNQE),
 	    ("%s: toep %p claims to be a synq entry", __func__, toep));
 
 	handle_ddp_data(toep, cpl->ddp_report, cpl->rcv_nxt, 0);
 
 	return (0);
 }
 
 static void
 enable_ddp(struct adapter *sc, struct toepcb *toep)
 {
 
 	KASSERT((toep->ddp_flags & (DDP_ON | DDP_OK | DDP_SC_REQ)) == DDP_OK,
 	    ("%s: toep %p has bad ddp_flags 0x%x",
 	    __func__, toep, toep->ddp_flags));
 
 	CTR3(KTR_CXGBE, "%s: tid %u (time %u)",
 	    __func__, toep->tid, time_uptime);
 
 	DDP_ASSERT_LOCKED(toep);
 	toep->ddp_flags |= DDP_SC_REQ;
 	t4_set_tcb_field(sc, toep->ctrlq, toep->tid, W_TCB_RX_DDP_FLAGS,
 	    V_TF_DDP_OFF(1) | V_TF_DDP_INDICATE_OUT(1) |
 	    V_TF_DDP_BUF0_INDICATE(1) | V_TF_DDP_BUF1_INDICATE(1) |
 	    V_TF_DDP_BUF0_VALID(1) | V_TF_DDP_BUF1_VALID(1),
 	    V_TF_DDP_BUF0_INDICATE(1) | V_TF_DDP_BUF1_INDICATE(1), 0, 0,
 	    toep->ofld_rxq->iq.abs_id);
 	t4_set_tcb_field(sc, toep->ctrlq, toep->tid, W_TCB_T_FLAGS,
 	    V_TF_RCV_COALESCE_ENABLE(1), 0, 0, 0, toep->ofld_rxq->iq.abs_id);
 }
 
 static int
 calculate_hcf(int n1, int n2)
 {
 	int a, b, t;
 
 	if (n1 <= n2) {
 		a = n1;
 		b = n2;
 	} else {
 		a = n2;
 		b = n1;
 	}
 
 	while (a != 0) {
 		t = a;
 		a = b % a;
 		b = t;
 	}
 
 	return (b);
 }
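 
 /*
  * calculate_hcf() is Euclid's algorithm.  Tracing two calls:
  *
  *   calculate_hcf(0, 8192): a == 0, so the loop is skipped and 8192
  *   is returned; this is why alloc_page_pods() can start with hcf = 0.
  *
  *   calculate_hcf(8192, 12288): (a, b) goes (8192, 12288),
  *   (4096, 8192), (0, 4096), so 4096 is returned.
  */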
 
 static int
 alloc_page_pods(struct tom_data *td, struct pageset *ps)
 {
 	int i, hcf, seglen, idx, ppod, nppods;
 	u_int ppod_addr;
 
 	KASSERT(ps->nppods == 0, ("%s: page pods already allocated", __func__));
 
 	/*
 	 * The DDP page size is unrelated to the VM page size.  We combine
 	 * contiguous physical pages into larger segments to get the best DDP
 	 * page size possible.  This is the largest of the four sizes in
 	 * A_ULP_RX_TDDP_PSZ that evenly divides the HCF of the segment sizes in
 	 * the page list.
 	 */
 	hcf = 0;
 	for (i = 0; i < ps->npages; i++) {
 		seglen = PAGE_SIZE;
 		while (i < ps->npages - 1 &&
 		    ps->pages[i]->phys_addr + PAGE_SIZE ==
 		    ps->pages[i + 1]->phys_addr) {
 			seglen += PAGE_SIZE;
 			i++;
 		}
 
 		hcf = calculate_hcf(hcf, seglen);
 		if (hcf < td->ddp_pgsz[1]) {
 			idx = 0;
 			goto have_pgsz;	/* give up, short circuit */
 		}
 	}
 
 	if (hcf % td->ddp_pgsz[0] != 0) {
 		/* hmmm.  This could only happen when PAGE_SIZE < 4K */
 		KASSERT(PAGE_SIZE < 4096,
 		    ("%s: PAGE_SIZE %d, hcf %d", __func__, PAGE_SIZE, hcf));
 		CTR3(KTR_CXGBE, "%s: PAGE_SIZE %d, hcf %d",
 		    __func__, PAGE_SIZE, hcf);
 		return (0);
 	}
 
 	for (idx = nitems(td->ddp_pgsz) - 1; idx > 0; idx--) {
 		if (hcf % td->ddp_pgsz[idx] == 0)
 			break;
 	}
 have_pgsz:
 	MPASS(idx <= M_PPOD_PGSZ);
 
 	nppods = pages_to_nppods(ps->npages, td->ddp_pgsz[idx]);
 	if (alloc_ppods(td, nppods, &ppod_addr) != 0) {
 		CTR4(KTR_CXGBE, "%s: no pods, nppods %d, npages %d, pgsz %d",
 		    __func__, nppods, ps->npages, td->ddp_pgsz[idx]);
 		return (0);
 	}
 
 	ppod = (ppod_addr - td->ppod_start) / PPOD_SIZE;
 	ps->tag = V_PPOD_PGSZ(idx) | V_PPOD_TAG(ppod);
 	ps->ppod_addr = ppod_addr;
 	ps->nppods = nppods;
 
 	CTR5(KTR_CXGBE, "New page pods.  "
 	    "ps %p, ddp_pgsz %d, ppod 0x%x, npages %d, nppods %d",
 	    ps, td->ddp_pgsz[idx], ppod, ps->npages, ps->nppods);
 
 	return (1);
 }
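 
 /*
  * Worked example for alloc_page_pods(), using hypothetical sizes
  * ddp_pgsz[] = { 4K, 64K, 2M, 16M } and 4K VM pages: a 32-page buffer
  * made of two physically contiguous 64K runs gives hcf = 64K, which
  * never drops below ddp_pgsz[1], and the largest configured size that
  * divides 64K is ddp_pgsz[1], so idx = 1.  If the runs were instead
  * 64K and 12K long, hcf would fall to 4K < ddp_pgsz[1] and the loop
  * would short-circuit with idx = 0.
  */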
 
 #define NUM_ULP_TX_SC_IMM_PPODS (256 / PPOD_SIZE)
 
 static int
 write_page_pods(struct adapter *sc, struct toepcb *toep, struct pageset *ps)
 {
 	struct wrqe *wr;
 	struct ulp_mem_io *ulpmc;
 	struct ulptx_idata *ulpsc;
 	struct pagepod *ppod;
 	struct tom_data *td = sc->tom_softc;
 	int i, j, k, n, chunk, len, ddp_pgsz, idx;
 	u_int ppod_addr;
 	uint32_t cmd;
 
 	KASSERT(!(ps->flags & PS_PPODS_WRITTEN),
 	    ("%s: page pods already written", __func__));
 
 	cmd = htobe32(V_ULPTX_CMD(ULP_TX_MEM_WRITE));
 	if (is_t4(sc))
 		cmd |= htobe32(F_ULP_MEMIO_ORDER);
 	else
 		cmd |= htobe32(F_T5_ULP_MEMIO_IMM);
 	ddp_pgsz = td->ddp_pgsz[G_PPOD_PGSZ(ps->tag)];
 	ppod_addr = ps->ppod_addr;
 	for (i = 0; i < ps->nppods; ppod_addr += chunk) {
 
 		/* How many page pods are we writing in this cycle */
 		n = min(ps->nppods - i, NUM_ULP_TX_SC_IMM_PPODS);
 		chunk = PPOD_SZ(n);
 		len = roundup2(sizeof(*ulpmc) + sizeof(*ulpsc) + chunk, 16);
 
 		wr = alloc_wrqe(len, toep->ctrlq);
 		if (wr == NULL)
 			return (ENOMEM);	/* ok to just bail out */
 		ulpmc = wrtod(wr);
 
 		INIT_ULPTX_WR(ulpmc, len, 0, 0);
 		ulpmc->cmd = cmd;
 		ulpmc->dlen = htobe32(V_ULP_MEMIO_DATA_LEN(chunk / 32));
 		ulpmc->len16 = htobe32(howmany(len - sizeof(ulpmc->wr), 16));
 		ulpmc->lock_addr = htobe32(V_ULP_MEMIO_ADDR(ppod_addr >> 5));
 
 		ulpsc = (struct ulptx_idata *)(ulpmc + 1);
 		ulpsc->cmd_more = htobe32(V_ULPTX_CMD(ULP_TX_SC_IMM));
 		ulpsc->len = htobe32(chunk);
 
 		ppod = (struct pagepod *)(ulpsc + 1);
 		for (j = 0; j < n; i++, j++, ppod++) {
 			ppod->vld_tid_pgsz_tag_color = htobe64(F_PPOD_VALID |
 			    V_PPOD_TID(toep->tid) | ps->tag);
 			ppod->len_offset = htobe64(V_PPOD_LEN(ps->len) |
 			    V_PPOD_OFST(ps->offset));
 			ppod->rsvd = 0;
 			idx = i * PPOD_PAGES * (ddp_pgsz / PAGE_SIZE);
 			for (k = 0; k < nitems(ppod->addr); k++) {
 				if (idx < ps->npages) {
 					ppod->addr[k] =
 					    htobe64(ps->pages[idx]->phys_addr);
 					idx += ddp_pgsz / PAGE_SIZE;
 				} else
 					ppod->addr[k] = 0;
 #if 0
 				CTR5(KTR_CXGBE,
 				    "%s: tid %d ppod[%d]->addr[%d] = %p",
 				    __func__, toep->tid, i, k,
 				    htobe64(ppod->addr[k]));
 #endif
 			}
 
 		}
 
 		t4_wrq_tx(sc, wr);
 	}
 	ps->flags |= PS_PPODS_WRITTEN;
 
 	return (0);
 }
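 
 /*
  * Worked example for write_page_pods(), assuming a 64-byte
  * struct pagepod (so NUM_ULP_TX_SC_IMM_PPODS = 256 / 64 = 4) and
  * PPOD_PAGES == 4: a pageset with nppods == 10 is written with three
  * ULP_TX_MEM_WRITE work requests carrying 4, 4 and 2 page pods
  * (chunk = 256, 256 and 128 bytes, dlen = chunk / 32).  When
  * ddp_pgsz == PAGE_SIZE, page pod i holds the physical addresses of
  * pages 4i through 4i + 4; the fifth address repeats the first page
  * of the next pod, and slots past the end of the buffer are zero.
  */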
 
 static void
 wire_pageset(struct pageset *ps)
 {
 	vm_page_t p;
 	int i;
 
 	KASSERT(!(ps->flags & PS_WIRED), ("pageset already wired"));
 
 	for (i = 0; i < ps->npages; i++) {
 		p = ps->pages[i];
 		vm_page_lock(p);
 		vm_page_wire(p);
 		vm_page_unhold(p);
 		vm_page_unlock(p);
 	}
 	ps->flags |= PS_WIRED;
 }
 
 /*
  * Prepare a pageset for DDP.  This wires the pageset and sets up page
  * pods.
  */
 static int
 prep_pageset(struct adapter *sc, struct toepcb *toep, struct pageset *ps)
 {
 	struct tom_data *td = sc->tom_softc;
 
 	if (!(ps->flags & PS_WIRED))
 		wire_pageset(ps);
 	if (ps->nppods == 0 && !alloc_page_pods(td, ps)) {
 		return (0);
 	}
 	if (!(ps->flags & PS_PPODS_WRITTEN) &&
 	    write_page_pods(sc, toep, ps) != 0) {
 		return (0);
 	}
 
 	return (1);
 }
 
 void
 t4_init_ddp(struct adapter *sc, struct tom_data *td)
 {
 	int i;
 	uint32_t r;
 
 	r = t4_read_reg(sc, A_ULP_RX_TDDP_PSZ);
 	td->ddp_pgsz[0] = 4096 << G_HPZ0(r);
 	td->ddp_pgsz[1] = 4096 << G_HPZ1(r);
 	td->ddp_pgsz[2] = 4096 << G_HPZ2(r);
 	td->ddp_pgsz[3] = 4096 << G_HPZ3(r);
 
 	/*
 	 * The SGL -> page pod algorithm requires the sizes to be in increasing
 	 * order.
 	 */
 	for (i = 1; i < nitems(td->ddp_pgsz); i++) {
 		if (td->ddp_pgsz[i] <= td->ddp_pgsz[i - 1])
 			return;
 	}
 
 	td->ppod_start = sc->vres.ddp.start;
 	td->ppod_arena = vmem_create("DDP page pods", sc->vres.ddp.start,
-	    sc->vres.ddp.size, 1, 32, M_FIRSTFIT | M_NOWAIT);
+	    sc->vres.ddp.size, PPOD_SIZE, 512, M_FIRSTFIT | M_NOWAIT);
 }
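 
 /*
  * Notes on t4_init_ddp(): each G_HPZn() field is a shift applied to
  * 4096, so a field value of 4 yields a 64K DDP page size.  The page
  * pod arena is created with vmem_create(name, base, size, quantum,
  * qcache_max, flags); passing PPOD_SIZE as the quantum keeps every
  * arena address and allocation a whole number of page pods, and a
  * qcache_max of 512 lets vmem cache allocations of up to
  * 512 / PPOD_SIZE page pods (8, assuming a 64-byte struct pagepod).
  */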
 
 void
 t4_uninit_ddp(struct adapter *sc __unused, struct tom_data *td)
 {
 
 	if (td->ppod_arena != NULL) {
 		vmem_destroy(td->ppod_arena);
 		td->ppod_arena = NULL;
 	}
 }
 
 static int
 pscmp(struct pageset *ps, struct vmspace *vm, vm_offset_t start, int npages,
     int pgoff, int len)
 {
 
 	if (ps->npages != npages || ps->offset != pgoff || ps->len != len)
 		return (1);
 
 	return (ps->vm != vm || ps->vm_timestamp != vm->vm_map.timestamp);
 }
 
 static int
 hold_aio(struct toepcb *toep, struct kaiocb *job, struct pageset **pps)
 {
 	struct vmspace *vm;
 	vm_map_t map;
 	vm_offset_t start, end, pgoff;
 	struct pageset *ps;
 	int n;
 
 	DDP_ASSERT_LOCKED(toep);
 
 	/*
 	 * The AIO subsystem will cancel and drain all requests before
 	 * permitting a process to exit or exec, so p_vmspace should
 	 * be stable here.
 	 */
 	vm = job->userproc->p_vmspace;
 	map = &vm->vm_map;
 	start = (uintptr_t)job->uaiocb.aio_buf;
 	pgoff = start & PAGE_MASK;
 	end = round_page(start + job->uaiocb.aio_nbytes);
 	start = trunc_page(start);
 
 	if (end - start > MAX_DDP_BUFFER_SIZE) {
 		/*
 		 * Truncate the request to a short read.
 		 * Alternatively, we could DDP in chunks to the larger
 		 * buffer, but that would be quite a bit more work.
 		 *
 		 * When truncating, round the request down to avoid
 		 * crossing a cache line on the final transaction.
 		 */
 		end = rounddown2(start + MAX_DDP_BUFFER_SIZE, CACHE_LINE_SIZE);
 #ifdef VERBOSE_TRACES
 		CTR4(KTR_CXGBE, "%s: tid %d, truncating size from %lu to %lu",
 		    __func__, toep->tid, (unsigned long)job->uaiocb.aio_nbytes,
 		    (unsigned long)(end - (start + pgoff)));
 #endif
 		job->uaiocb.aio_nbytes = end - (start + pgoff);
 		end = round_page(end);
 	}
 
 	n = atop(end - start);
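 
 	/*
 	 * Worked example, assuming 4K pages and a request that is not
 	 * truncated: aio_buf 0x1000100 and aio_nbytes 8192 give
 	 * pgoff = 0x100, start = 0x1000000 and
 	 * end = round_page(0x1002100) = 0x1003000, so n = 3 pages even
 	 * though the buffer itself is only 8192 bytes long.
 	 */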
 
 	/*
 	 * Try to reuse a cached pageset.
 	 */
 	TAILQ_FOREACH(ps, &toep->ddp_cached_pagesets, link) {
 		if (pscmp(ps, vm, start, n, pgoff,
 		    job->uaiocb.aio_nbytes) == 0) {
 			TAILQ_REMOVE(&toep->ddp_cached_pagesets, ps, link);
 			toep->ddp_cached_count--;
 			*pps = ps;
 			return (0);
 		}
 	}
 
 	/*
 	 * If the active and cached pagesets already account for every
 	 * DDP buffer slot, free the oldest cached pageset before creating
 	 * a new one so that the number of wired pagesets stays capped.
 	 */
 	KASSERT(toep->ddp_active_count + toep->ddp_cached_count <=
 	    nitems(toep->db), ("%s: too many wired pagesets", __func__));
 	if (toep->ddp_active_count + toep->ddp_cached_count ==
 	    nitems(toep->db)) {
 		KASSERT(toep->ddp_cached_count > 0,
 		    ("no cached pageset to free"));
 		ps = TAILQ_LAST(&toep->ddp_cached_pagesets, pagesetq);
 		TAILQ_REMOVE(&toep->ddp_cached_pagesets, ps, link);
 		toep->ddp_cached_count--;
 		free_pageset(toep->td, ps);
 	}
 	DDP_UNLOCK(toep);
 
 	/* Create a new pageset. */
 	ps = malloc(sizeof(*ps) + n * sizeof(vm_page_t), M_CXGBE, M_WAITOK |
 	    M_ZERO);
 	ps->pages = (vm_page_t *)(ps + 1);
 	ps->vm_timestamp = map->timestamp;
 	ps->npages = vm_fault_quick_hold_pages(map, start, end - start,
 	    VM_PROT_WRITE, ps->pages, n);
 
 	DDP_LOCK(toep);
 	if (ps->npages < 0) {
 		free(ps, M_CXGBE);
 		return (EFAULT);
 	}
 
 	KASSERT(ps->npages == n, ("hold_aio: page count mismatch: %d vs %d",
 	    ps->npages, n));
 
 	ps->offset = pgoff;
 	ps->len = job->uaiocb.aio_nbytes;
 	atomic_add_int(&vm->vm_refcnt, 1);
 	ps->vm = vm;
 
 	CTR5(KTR_CXGBE, "%s: tid %d, new pageset %p for job %p, npages %d",
 	    __func__, toep->tid, ps, job, ps->npages);
 	*pps = ps;
 	return (0);
 }
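The address arithmetic in hold_aio() is easy to misread, so here is a minimal user-space sketch of it. It assumes a 4 KiB page, a 64-byte cache line and a hypothetical 64 KiB DDP limit (the driver's real PAGE_SIZE, CACHE_LINE_SIZE and MAX_DDP_BUFFER_SIZE may differ); `ddp_compute_span()` and `struct ddp_span` are illustrative names, not driver API. It page-aligns the user buffer, remembers the offset into the first page, and, if the span is too large, truncates the request to a short read before counting pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative constants; the driver's real values may differ. */
#define SK_PAGE_SIZE	4096u
#define SK_PAGE_MASK	(SK_PAGE_SIZE - 1)
#define SK_CACHE_LINE	64u
#define SK_MAX_DDP_BUF	(64u * 1024u)	/* hypothetical limit */

struct ddp_span {
	uintptr_t start;	/* page-aligned start of the held range */
	uintptr_t end;		/* page-aligned end of the held range */
	size_t pgoff;		/* offset of the buffer in the first page */
	size_t nbytes;		/* possibly truncated request length */
	size_t npages;		/* number of pages to hold */
};

static uintptr_t trunc_pg(uintptr_t x) { return (x & ~(uintptr_t)SK_PAGE_MASK); }
static uintptr_t round_pg(uintptr_t x) { return (trunc_pg(x + SK_PAGE_MASK)); }

struct ddp_span
ddp_compute_span(uintptr_t buf, size_t nbytes)
{
	struct ddp_span s;

	assert(nbytes > 0);	/* zero-length reads assumed rejected earlier */
	s.pgoff = buf & SK_PAGE_MASK;
	s.end = round_pg(buf + nbytes);
	s.start = trunc_pg(buf);
	s.nbytes = nbytes;
	if (s.end - s.start > SK_MAX_DDP_BUF) {
		/* rounddown2(): clear the low bits to a cache-line boundary. */
		s.end = (s.start + SK_MAX_DDP_BUF) &
		    ~(uintptr_t)(SK_CACHE_LINE - 1);
		/* The shortened length must always be recorded. */
		s.nbytes = s.end - (s.start + s.pgoff);
		s.end = round_pg(s.end);
	}
	s.npages = (s.end - s.start) / SK_PAGE_SIZE;
	return (s);
}
```

For example, a 200000-byte read starting 0x10 bytes into a page is cut to 65520 bytes spanning 16 pages under these assumed constants, while a 100-byte read stays intact in a single page.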
 
 static void
 ddp_complete_all(struct toepcb *toep, int error)
 {
 	struct kaiocb *job;
 
 	DDP_ASSERT_LOCKED(toep);
 	while (!TAILQ_EMPTY(&toep->ddp_aiojobq)) {
 		job = TAILQ_FIRST(&toep->ddp_aiojobq);
 		TAILQ_REMOVE(&toep->ddp_aiojobq, job, list);
 		toep->ddp_waiting_count--;
 		if (aio_clear_cancel_function(job))
 			ddp_complete_one(job, error);
 	}
 }
 
 static void
 aio_ddp_cancel_one(struct kaiocb *job)
 {
 	long copied;
 
 	/*
 	 * If this job had copied data out of the socket buffer before
 	 * it was cancelled, report it as a short read rather than an
 	 * error.
 	 */
 	copied = job->aio_received;
 	if (copied != 0)
 		aio_complete(job, copied, 0);
 	else
 		aio_cancel(job);
 }
 
 /*
  * Called when the main loop wants to requeue a job to retry it later.
  * Deals with the race of the job being cancelled while it was being
  * examined.
  */
 static void
 aio_ddp_requeue_one(struct toepcb *toep, struct kaiocb *job)
 {
 
 	DDP_ASSERT_LOCKED(toep);
 	if (!(toep->ddp_flags & DDP_DEAD) &&
 	    aio_set_cancel_function(job, t4_aio_cancel_queued)) {
 		TAILQ_INSERT_HEAD(&toep->ddp_aiojobq, job, list);
 		toep->ddp_waiting_count++;
 	} else
 		aio_ddp_cancel_one(job);
 }
 
 static void
 aio_ddp_requeue(struct toepcb *toep)
 {
 	struct adapter *sc = td_adapter(toep->td);
 	struct socket *so;
 	struct sockbuf *sb;
 	struct inpcb *inp;
 	struct kaiocb *job;
 	struct ddp_buffer *db;
 	size_t copied, offset, resid;
 	struct pageset *ps;
 	struct mbuf *m;
 	uint64_t ddp_flags, ddp_flags_mask;
 	struct wrqe *wr;
 	int buf_flag, db_idx, error;
 
 	DDP_ASSERT_LOCKED(toep);
 
 restart:
 	if (toep->ddp_flags & DDP_DEAD) {
 		MPASS(toep->ddp_waiting_count == 0);
 		MPASS(toep->ddp_active_count == 0);
 		return;
 	}
 
 	if (toep->ddp_waiting_count == 0 ||
 	    toep->ddp_active_count == nitems(toep->db)) {
 		return;
 	}
 
 	job = TAILQ_FIRST(&toep->ddp_aiojobq);
 	so = job->fd_file->f_data;
 	sb = &so->so_rcv;
 	SOCKBUF_LOCK(sb);
 
 	/* We will never get anything unless we are or were connected. */
 	if (!(so->so_state & (SS_ISCONNECTED|SS_ISDISCONNECTED))) {
 		SOCKBUF_UNLOCK(sb);
 		ddp_complete_all(toep, ENOTCONN);
 		return;
 	}
 
 	KASSERT(toep->ddp_active_count == 0 || sbavail(sb) == 0,
 	    ("%s: pending sockbuf data and DDP is active", __func__));
 
 	/* Abort if socket has reported problems. */
 	/* XXX: Wait for any queued DDP's to finish and/or flush them? */
 	if (so->so_error && sbavail(sb) == 0) {
 		toep->ddp_waiting_count--;
 		TAILQ_REMOVE(&toep->ddp_aiojobq, job, list);
 		if (!aio_clear_cancel_function(job)) {
 			SOCKBUF_UNLOCK(sb);
 			goto restart;
 		}
 
 		/*
 		 * If this job has previously copied some data, report
 		 * a short read and leave the error to be reported by
 		 * a future request.
 		 */
 		copied = job->aio_received;
 		if (copied != 0) {
 			SOCKBUF_UNLOCK(sb);
 			aio_complete(job, copied, 0);
 			goto restart;
 		}
 		error = so->so_error;
 		so->so_error = 0;
 		SOCKBUF_UNLOCK(sb);
 		aio_complete(job, -1, error);
 		goto restart;
 	}
 
 	/*
 	 * Door is closed.  If there is pending data in the socket buffer,
 	 * deliver it.  If there are pending DDP requests, wait for those
 	 * to complete.  Once they have completed, return EOF reads.
 	 */
 	if (sb->sb_state & SBS_CANTRCVMORE && sbavail(sb) == 0) {
 		SOCKBUF_UNLOCK(sb);
 		if (toep->ddp_active_count != 0)
 			return;
 		ddp_complete_all(toep, 0);
 		return;
 	}
 
 	/*
 	 * If DDP is not enabled and there is no pending socket buffer
 	 * data, try to enable DDP.
 	 */
 	if (sbavail(sb) == 0 && (toep->ddp_flags & DDP_ON) == 0) {
 		SOCKBUF_UNLOCK(sb);
 
 		/*
 		 * Wait for the card to ACK that DDP is enabled before
 		 * queueing any buffers.  Currently this waits for an
 		 * indicate to arrive.  This could use a TCB_SET_FIELD_RPL
 		 * message to learn that DDP was enabled instead of waiting
 		 * for the indicate, which would avoid copying the indicate
 		 * if no data is pending.
 		 *
 		 * XXX: Might want to limit the indicate size to the size
 		 * of the first queued request.
 		 */
 		if ((toep->ddp_flags & DDP_SC_REQ) == 0)
 			enable_ddp(sc, toep);
 		return;
 	}
 	SOCKBUF_UNLOCK(sb);
 
 	/*
 	 * If another thread is queueing a buffer for DDP, let it
 	 * drain any work and return.
 	 */
 	if (toep->ddp_queueing != NULL)
 		return;
 
 	/* Take the next job to prep it for DDP. */
 	toep->ddp_waiting_count--;
 	TAILQ_REMOVE(&toep->ddp_aiojobq, job, list);
 	if (!aio_clear_cancel_function(job))
 		goto restart;
 	toep->ddp_queueing = job;
 
 	/* NB: This drops DDP_LOCK while it holds the backing VM pages. */
 	error = hold_aio(toep, job, &ps);
 	if (error != 0) {
 		ddp_complete_one(job, error);
 		toep->ddp_queueing = NULL;
 		goto restart;
 	}
 
 	SOCKBUF_LOCK(sb);
 	if (so->so_error && sbavail(sb) == 0) {
 		copied = job->aio_received;
 		if (copied != 0) {
 			SOCKBUF_UNLOCK(sb);
 			recycle_pageset(toep, ps);
 			aio_complete(job, copied, 0);
 			toep->ddp_queueing = NULL;
 			goto restart;
 		}
 
 		error = so->so_error;
 		so->so_error = 0;
 		SOCKBUF_UNLOCK(sb);
 		recycle_pageset(toep, ps);
 		aio_complete(job, -1, error);
 		toep->ddp_queueing = NULL;
 		goto restart;
 	}
 
 	if (sb->sb_state & SBS_CANTRCVMORE && sbavail(sb) == 0) {
 		SOCKBUF_UNLOCK(sb);
 		recycle_pageset(toep, ps);
 		if (toep->ddp_active_count != 0) {
 			/*
 			 * The door is closed, but there are still pending
 			 * DDP buffers.  Requeue.  These jobs will all be
 			 * completed once those buffers drain.
 			 */
 			aio_ddp_requeue_one(toep, job);
 			toep->ddp_queueing = NULL;
 			return;
 		}
 		ddp_complete_one(job, 0);
 		ddp_complete_all(toep, 0);
 		toep->ddp_queueing = NULL;
 		return;
 	}
 
 sbcopy:
 	/*
 	 * If the toep is dead, there shouldn't be any data in the socket
 	 * buffer, so the above case should have handled this.
 	 */
 	MPASS(!(toep->ddp_flags & DDP_DEAD));
 
 	/*
 	 * If there is pending data in the socket buffer (either
 	 * from before the requests were queued or a DDP indicate),
 	 * copy those mbufs out directly.
 	 */
 	copied = 0;
 	offset = ps->offset + job->aio_received;
 	MPASS(job->aio_received <= job->uaiocb.aio_nbytes);
 	resid = job->uaiocb.aio_nbytes - job->aio_received;
 	m = sb->sb_mb;
 	KASSERT(m == NULL || toep->ddp_active_count == 0,
 	    ("%s: sockbuf data with active DDP", __func__));
 	while (m != NULL && resid > 0) {
 		struct iovec iov[1];
 		struct uio uio;
 		int error;
 
 		iov[0].iov_base = mtod(m, void *);
 		iov[0].iov_len = m->m_len;
 		if (iov[0].iov_len > resid)
 			iov[0].iov_len = resid;
 		uio.uio_iov = iov;
 		uio.uio_iovcnt = 1;
 		uio.uio_offset = 0;
 		uio.uio_resid = iov[0].iov_len;
 		uio.uio_segflg = UIO_SYSSPACE;
 		uio.uio_rw = UIO_WRITE;
 		error = uiomove_fromphys(ps->pages, offset + copied,
 		    uio.uio_resid, &uio);
 		MPASS(error == 0 && uio.uio_resid == 0);
 		copied += uio.uio_offset;
 		resid -= uio.uio_offset;
 		m = m->m_next;
 	}
 	if (copied != 0) {
 		sbdrop_locked(sb, copied);
 		job->aio_received += copied;
 		job->msgrcv = 1;
 		copied = job->aio_received;
 		inp = sotoinpcb(so);
 		if (!INP_TRY_WLOCK(inp)) {
 			/*
 			 * The reference on the socket file descriptor in
 			 * the AIO job should keep 'sb' and 'inp' stable.
 			 * Our caller has a reference on the 'toep' that
 			 * keeps it stable.
 			 */
 			SOCKBUF_UNLOCK(sb);
 			DDP_UNLOCK(toep);
 			INP_WLOCK(inp);
 			DDP_LOCK(toep);
 			SOCKBUF_LOCK(sb);
 
 			/*
 			 * If the socket has been closed, we should detect
 			 * that and complete this request if needed on
 			 * the next trip around the loop.
 			 */
 		}
 		t4_rcvd_locked(&toep->td->tod, intotcpcb(inp));
 		INP_WUNLOCK(inp);
 		if (resid == 0 || toep->ddp_flags & DDP_DEAD) {
 			/*
 			 * We either filled the entire buffer with socket
 			 * data or the socket is being shut down, so
 			 * complete the request.
 			 */
 			SOCKBUF_UNLOCK(sb);
 			recycle_pageset(toep, ps);
 			aio_complete(job, copied, 0);
 			toep->ddp_queueing = NULL;
 			goto restart;
 		}
 
 		/*
 		 * If DDP is not enabled, requeue this request and restart.
 		 * This will either enable DDP or wait for more data to
 		 * arrive on the socket buffer.
 		 */
 		if ((toep->ddp_flags & (DDP_ON | DDP_SC_REQ)) != DDP_ON) {
 			SOCKBUF_UNLOCK(sb);
 			recycle_pageset(toep, ps);
 			aio_ddp_requeue_one(toep, job);
 			toep->ddp_queueing = NULL;
 			goto restart;
 		}
 
 		/*
 		 * An indicate might have arrived and been added to
 		 * the socket buffer while it was unlocked after the
 		 * copy to lock the INP.  If so, restart the copy.
 		 */
 		if (sbavail(sb) != 0)
 			goto sbcopy;
 	}
 	SOCKBUF_UNLOCK(sb);
 
 	if (prep_pageset(sc, toep, ps) == 0) {
 		recycle_pageset(toep, ps);
 		aio_ddp_requeue_one(toep, job);
 		toep->ddp_queueing = NULL;
 
 		/*
 		 * XXX: Need to retry this later.  Mostly need a trigger
 		 * when page pods are freed up.
 		 */
 		printf("%s: prep_pageset failed\n", __func__);
 		return;
 	}
 
 	/* Determine which DDP buffer to use. */
 	if (toep->db[0].job == NULL) {
 		db_idx = 0;
 	} else {
 		MPASS(toep->db[1].job == NULL);
 		db_idx = 1;
 	}
 
 	ddp_flags = 0;
 	ddp_flags_mask = 0;
 	if (db_idx == 0) {
 		ddp_flags |= V_TF_DDP_BUF0_VALID(1);
 		if (so->so_state & SS_NBIO)
 			ddp_flags |= V_TF_DDP_BUF0_FLUSH(1);
 		ddp_flags_mask |= V_TF_DDP_PSH_NO_INVALIDATE0(1) |
 		    V_TF_DDP_PUSH_DISABLE_0(1) | V_TF_DDP_PSHF_ENABLE_0(1) |
 		    V_TF_DDP_BUF0_FLUSH(1) | V_TF_DDP_BUF0_VALID(1);
 		buf_flag = DDP_BUF0_ACTIVE;
 	} else {
 		ddp_flags |= V_TF_DDP_BUF1_VALID(1);
 		if (so->so_state & SS_NBIO)
 			ddp_flags |= V_TF_DDP_BUF1_FLUSH(1);
 		ddp_flags_mask |= V_TF_DDP_PSH_NO_INVALIDATE1(1) |
 		    V_TF_DDP_PUSH_DISABLE_1(1) | V_TF_DDP_PSHF_ENABLE_1(1) |
 		    V_TF_DDP_BUF1_FLUSH(1) | V_TF_DDP_BUF1_VALID(1);
 		buf_flag = DDP_BUF1_ACTIVE;
 	}
 	MPASS((toep->ddp_flags & buf_flag) == 0);
 	if ((toep->ddp_flags & (DDP_BUF0_ACTIVE | DDP_BUF1_ACTIVE)) == 0) {
 		MPASS(db_idx == 0);
 		MPASS(toep->ddp_active_id == -1);
 		MPASS(toep->ddp_active_count == 0);
 		ddp_flags_mask |= V_TF_DDP_ACTIVE_BUF(1);
 	}
 
 	/*
 	 * The TID for this connection should still be valid.  If DDP_DEAD
 	 * is set, SBS_CANTRCVMORE should be set, so we shouldn't be
 	 * this far anyway.  Even if the socket is closing on the other
 	 * end, the AIO job holds a reference on this end of the socket
 	 * which will keep it open and keep the TCP PCB attached until
 	 * after the job is completed.
 	 */
 	wr = mk_update_tcb_for_ddp(sc, toep, db_idx, ps, job->aio_received,
 	    ddp_flags, ddp_flags_mask);
 	if (wr == NULL) {
 		recycle_pageset(toep, ps);
 		aio_ddp_requeue_one(toep, job);
 		toep->ddp_queueing = NULL;
 
 		/*
 		 * XXX: Need a way to kick a retry here.
 		 *
 		 * XXX: We know the fixed size needed and could
 		 * preallocate this using a blocking request at the
 		 * start of the task to avoid having to handle this
 		 * edge case.
 		 */
 		printf("%s: mk_update_tcb_for_ddp failed\n", __func__);
 		return;
 	}
 
 	if (!aio_set_cancel_function(job, t4_aio_cancel_active)) {
 		free_wrqe(wr);
 		recycle_pageset(toep, ps);
 		aio_ddp_cancel_one(job);
 		toep->ddp_queueing = NULL;
 		goto restart;
 	}
 
 #ifdef VERBOSE_TRACES
 	CTR5(KTR_CXGBE, "%s: scheduling %p for DDP[%d] (flags %#lx/%#lx)",
 	    __func__, job, db_idx, ddp_flags, ddp_flags_mask);
 #endif
 	/* Give the chip the go-ahead. */
 	t4_wrq_tx(sc, wr);
 	db = &toep->db[db_idx];
 	db->cancel_pending = 0;
 	db->job = job;
 	db->ps = ps;
 	toep->ddp_queueing = NULL;
 	toep->ddp_flags |= buf_flag;
 	toep->ddp_active_count++;
 	if (toep->ddp_active_count == 1) {
 		MPASS(toep->ddp_active_id == -1);
 		toep->ddp_active_id = db_idx;
 		CTR2(KTR_CXGBE, "%s: ddp_active_id = %d", __func__,
 		    toep->ddp_active_id);
 	}
 	goto restart;
 }
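The tail of aio_ddp_requeue() picks one of the two DDP buffers and builds a (flags, mask) pair for the TCB update: the mask names the bits to change, the flags give their new values. The sketch below models that double-buffer selection under stated assumptions: the bit positions are hypothetical (the real V_TF_DDP_* macros encode TCB field offsets not reproduced here), the push-related bits the driver also clears are omitted, and "no buffer active" is approximated by both slots being empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, not the real TCB layout. */
#define F_BUF_VALID(i)	(UINT64_C(1) << (0 + (i)))
#define F_BUF_FLUSH(i)	(UINT64_C(1) << (2 + (i)))
#define F_ACTIVE_BUF	(UINT64_C(1) << 4)

struct ddp_slot {
	const void *job;	/* NULL when the buffer is free */
};

/*
 * Use slot 0 if it is free, otherwise slot 1 (the caller guarantees
 * one is free).  Non-blocking sockets also request a flush so that
 * partially filled buffers are delivered promptly.
 */
int
ddp_pick_slot(const struct ddp_slot db[2], bool nonblocking,
    uint64_t *flags, uint64_t *mask)
{
	int idx;

	if (db[0].job == NULL)
		idx = 0;
	else {
		assert(db[1].job == NULL);
		idx = 1;
	}
	*flags = F_BUF_VALID(idx);
	if (nonblocking)
		*flags |= F_BUF_FLUSH(idx);
	*mask = F_BUF_VALID(idx) | F_BUF_FLUSH(idx);
	/* First buffer queued: also reset the active-buffer selector to 0. */
	if (db[0].job == NULL && db[1].job == NULL)
		*mask |= F_ACTIVE_BUF;
	return (idx);
}
```

Putting a bit in the mask but not in the flags clears it, which is how the driver resets the active-buffer selector when the first buffer is posted.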
 
 void
 ddp_queue_toep(struct toepcb *toep)
 {
 
 	DDP_ASSERT_LOCKED(toep);
 	if (toep->ddp_flags & DDP_TASK_ACTIVE)
 		return;
 	toep->ddp_flags |= DDP_TASK_ACTIVE;
 	hold_toepcb(toep);
 	soaio_enqueue(&toep->ddp_requeue_task);
 }
 
 static void
 aio_ddp_requeue_task(void *context, int pending)
 {
 	struct toepcb *toep = context;
 
 	DDP_LOCK(toep);
 	aio_ddp_requeue(toep);
 	toep->ddp_flags &= ~DDP_TASK_ACTIVE;
 	DDP_UNLOCK(toep);
 
 	free_toepcb(toep);
 }
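ddp_queue_toep() and aio_ddp_requeue_task() together form a "schedule at most once" pattern: a flag suppresses duplicate enqueues, and a reference taken at enqueue time keeps the connection alive until the task runs and drops it. A minimal single-threaded sketch, assuming the caller holds the lock as DDP_ASSERT_LOCKED() requires; `struct conn` and its fields are hypothetical stand-ins for the toepcb.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a toepcb. */
struct conn {
	bool task_active;	/* like DDP_TASK_ACTIVE */
	int refs;		/* like hold_toepcb()/free_toepcb() */
	int enqueued;		/* how many times the task was scheduled */
};

/* Like ddp_queue_toep(): enqueue once and hold a reference for the task. */
void
conn_queue(struct conn *c)
{
	if (c->task_active)
		return;
	c->task_active = true;
	c->refs++;
	c->enqueued++;
}

/* Like aio_ddp_requeue_task(): do the work, clear the flag, drop the ref. */
void
conn_task(struct conn *c)
{
	assert(c->task_active);
	/* ... process queued requests here ... */
	c->task_active = false;
	c->refs--;
}
```

Two back-to-back conn_queue() calls schedule the task only once; after conn_task() runs, the next conn_queue() schedules it again.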
 
 static void
 t4_aio_cancel_active(struct kaiocb *job)
 {
 	struct socket *so = job->fd_file->f_data;
 	struct tcpcb *tp = so_sototcpcb(so);
 	struct toepcb *toep = tp->t_toe;
 	struct adapter *sc = td_adapter(toep->td);
 	uint64_t valid_flag;
 	int i;
 
 	DDP_LOCK(toep);
 	if (aio_cancel_cleared(job)) {
 		DDP_UNLOCK(toep);
 		aio_ddp_cancel_one(job);
 		return;
 	}
 
 	for (i = 0; i < nitems(toep->db); i++) {
 		if (toep->db[i].job == job) {
 			/* Should only ever get one cancel request for a job. */
 			MPASS(toep->db[i].cancel_pending == 0);
 
 			/*
 			 * Invalidate this buffer.  It will be
 			 * cancelled or partially completed once the
 			 * card ACKs the invalidate.
 			 */
 			valid_flag = i == 0 ? V_TF_DDP_BUF0_VALID(1) :
 			    V_TF_DDP_BUF1_VALID(1);
 			t4_set_tcb_field(sc, toep->ctrlq, toep->tid,
 			    W_TCB_RX_DDP_FLAGS, valid_flag, 0, 1,
 			    i + DDP_BUF0_INVALIDATED,
 			    toep->ofld_rxq->iq.abs_id);
 			toep->db[i].cancel_pending = 1;
 			CTR2(KTR_CXGBE, "%s: request %p marked pending",
 			    __func__, job);
 			break;
 		}
 	}
 	DDP_UNLOCK(toep);
 }
 
 static void
 t4_aio_cancel_queued(struct kaiocb *job)
 {
 	struct socket *so = job->fd_file->f_data;
 	struct tcpcb *tp = so_sototcpcb(so);
 	struct toepcb *toep = tp->t_toe;
 
 	DDP_LOCK(toep);
 	if (!aio_cancel_cleared(job)) {
 		TAILQ_REMOVE(&toep->ddp_aiojobq, job, list);
 		toep->ddp_waiting_count--;
 		if (toep->ddp_waiting_count == 0)
 			ddp_queue_toep(toep);
 	}
 	CTR2(KTR_CXGBE, "%s: request %p cancelled", __func__, job);
 	DDP_UNLOCK(toep);
 
 	aio_ddp_cancel_one(job);
 }
 
 int
 t4_aio_queue_ddp(struct socket *so, struct kaiocb *job)
 {
 	struct tcpcb *tp = so_sototcpcb(so);
 	struct toepcb *toep = tp->t_toe;
 
 
 	/* Ignore writes. */
 	if (job->uaiocb.aio_lio_opcode != LIO_READ)
 		return (EOPNOTSUPP);
 
 	DDP_LOCK(toep);
 
 	/*
 	 * XXX: Think about possibly returning errors for ENOTCONN,
 	 * etc.  Perhaps the caller would only queue the request
 	 * if it failed with EOPNOTSUPP?
 	 */
 
 #ifdef VERBOSE_TRACES
 	CTR2(KTR_CXGBE, "%s: queueing %p", __func__, job);
 #endif
 	if (!aio_set_cancel_function(job, t4_aio_cancel_queued))
 		panic("new job was cancelled");
 	TAILQ_INSERT_TAIL(&toep->ddp_aiojobq, job, list);
 	toep->ddp_waiting_count++;
 	toep->ddp_flags |= DDP_OK;
 
 	/*
 	 * Try to handle this request synchronously.  If this has
 	 * to block because the task is running, it will just bail
 	 * and let the task handle it instead.
 	 */
 	aio_ddp_requeue(toep);
 	DDP_UNLOCK(toep);
 	return (0);
 }
 
 int
 t4_ddp_mod_load(void)
 {
 
 	t4_register_cpl_handler(CPL_RX_DATA_DDP, do_rx_data_ddp);
 	t4_register_cpl_handler(CPL_RX_DDP_COMPLETE, do_rx_ddp_complete);
 	TAILQ_INIT(&ddp_orphan_pagesets);
 	mtx_init(&ddp_orphan_pagesets_lock, "ddp orphans", NULL, MTX_DEF);
 	TASK_INIT(&ddp_orphan_task, 0, ddp_free_orphan_pagesets, NULL);
 	return (0);
 }
 
 void
 t4_ddp_mod_unload(void)
 {
 
 	taskqueue_drain(taskqueue_thread, &ddp_orphan_task);
 	MPASS(TAILQ_EMPTY(&ddp_orphan_pagesets));
 	mtx_destroy(&ddp_orphan_pagesets_lock);
 	t4_register_cpl_handler(CPL_RX_DATA_DDP, NULL);
 	t4_register_cpl_handler(CPL_RX_DDP_COMPLETE, NULL);
 }
 #endif
Index: user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch.c	(revision 303775)
@@ -1,905 +1,989 @@
 /*-
  * Copyright (c) 2011-2012 Stefan Bethke.
  * Copyright (c) 2012 Adrian Chadd.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include <sys/param.h>
 #include <sys/bus.h>
 #include <sys/errno.h>
 #include <sys/kernel.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/socket.h>
 #include <sys/sockio.h>
 #include <sys/sysctl.h>
 #include <sys/systm.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/if_arp.h>
 #include <net/ethernet.h>
 #include <net/if_dl.h>
 #include <net/if_media.h>
 #include <net/if_types.h>
 
 #include <machine/bus.h>
 #include <dev/iicbus/iic.h>
 #include <dev/iicbus/iiconf.h>
 #include <dev/iicbus/iicbus.h>
 #include <dev/mii/mii.h>
 #include <dev/mii/miivar.h>
 #include <dev/mdio/mdio.h>
 
 #include <dev/etherswitch/etherswitch.h>
 
 #include <dev/etherswitch/arswitch/arswitchreg.h>
 #include <dev/etherswitch/arswitch/arswitchvar.h>
 #include <dev/etherswitch/arswitch/arswitch_reg.h>
 #include <dev/etherswitch/arswitch/arswitch_phy.h>
 #include <dev/etherswitch/arswitch/arswitch_vlans.h>
 
 #include <dev/etherswitch/arswitch/arswitch_7240.h>
 #include <dev/etherswitch/arswitch/arswitch_8216.h>
 #include <dev/etherswitch/arswitch/arswitch_8226.h>
 #include <dev/etherswitch/arswitch/arswitch_8316.h>
 #include <dev/etherswitch/arswitch/arswitch_8327.h>
 #include <dev/etherswitch/arswitch/arswitch_9340.h>
 
 #include "mdio_if.h"
 #include "miibus_if.h"
 #include "etherswitch_if.h"
 
 #if	defined(DEBUG)
 static SYSCTL_NODE(_debug, OID_AUTO, arswitch, CTLFLAG_RD, 0, "arswitch");
 #endif
 
+/* Map ETHERSWITCH_PORT_LED_* to Atheros pattern codes */
+static int led_pattern_table[] = {
+	[ETHERSWITCH_PORT_LED_DEFAULT] = 0x3,
+	[ETHERSWITCH_PORT_LED_ON] = 0x2,
+	[ETHERSWITCH_PORT_LED_OFF] = 0x0,
+	[ETHERSWITCH_PORT_LED_BLINK] = 0x1
+};
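The led_pattern_table above uses C99 designated initializers so each hardware pattern code stays attached to its symbolic LED style regardless of enum order. A small sketch of the same idea with a bounds-checked lookup; the enum here is a hypothetical mirror of the ETHERSWITCH_PORT_LED_* names, and `led_style_to_pattern()` is an illustrative helper, not the driver's arswitch_setled().

```c
#include <assert.h>

/* Hypothetical enum mirroring the ETHERSWITCH_PORT_LED_* names. */
enum port_led {
	PORT_LED_DEFAULT,
	PORT_LED_ON,
	PORT_LED_OFF,
	PORT_LED_BLINK,
	PORT_LED_MAX
};

/* Designated initializers tie each pattern code to its enum name. */
static const int led_pattern[PORT_LED_MAX] = {
	[PORT_LED_DEFAULT] = 0x3,
	[PORT_LED_ON] = 0x2,
	[PORT_LED_OFF] = 0x0,
	[PORT_LED_BLINK] = 0x1,
};

/* Return the pattern code for a style, or -1 if it is out of range. */
int
led_style_to_pattern(int style)
{
	if (style < 0 || style >= PORT_LED_MAX)
		return (-1);
	return (led_pattern[style]);
}
```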
+
 static inline int arswitch_portforphy(int phy);
 static void arswitch_tick(void *arg);
 static int arswitch_ifmedia_upd(struct ifnet *);
 static void arswitch_ifmedia_sts(struct ifnet *, struct ifmediareq *);
 static int ar8xxx_port_vlan_setup(struct arswitch_softc *sc,
     etherswitch_port_t *p);
 static int ar8xxx_port_vlan_get(struct arswitch_softc *sc,
     etherswitch_port_t *p);
+static int arswitch_setled(struct arswitch_softc *sc, int phy, int led,
+    int style);
 
 static int
 arswitch_probe(device_t dev)
 {
 	struct arswitch_softc *sc;
 	uint32_t id;
 	char *chipname, desc[256];
 
 	sc = device_get_softc(dev);
 	bzero(sc, sizeof(*sc));
 	sc->page = -1;
 
 	/* AR7240 probe */
 	if (ar7240_probe(dev) == 0) {
 		chipname = "AR7240";
 		sc->sc_switchtype = AR8X16_SWITCH_AR7240;
 		sc->is_internal_switch = 1;
 		id = 0;
 		goto done;
 	}
 
 	/* AR9340 probe */
 	if (ar9340_probe(dev) == 0) {
 		chipname = "AR9340";
 		sc->sc_switchtype = AR8X16_SWITCH_AR9340;
 		sc->is_internal_switch = 1;
 		id = 0;
 		goto done;
 	}
 
 	/* AR8xxx probe */
 	id = arswitch_readreg(dev, AR8X16_REG_MASK_CTRL);
 	sc->chip_rev = (id & AR8X16_MASK_CTRL_REV_MASK);
 	sc->chip_ver = (id & AR8X16_MASK_CTRL_VER_MASK) >> AR8X16_MASK_CTRL_VER_SHIFT;
 	switch (id & (AR8X16_MASK_CTRL_VER_MASK | AR8X16_MASK_CTRL_REV_MASK)) {
 	case 0x0101:
 		chipname = "AR8216";
 		sc->sc_switchtype = AR8X16_SWITCH_AR8216;
 		break;
 	case 0x0201:
 		chipname = "AR8226";
 		sc->sc_switchtype = AR8X16_SWITCH_AR8226;
 		break;
 	/* 0x0301 - AR8236 */
 	case 0x1000:
 	case 0x1001:
 		chipname = "AR8316";
 		sc->sc_switchtype = AR8X16_SWITCH_AR8316;
 		break;
 	case 0x1202:
 	case 0x1204:
 		chipname = "AR8327";
 		sc->sc_switchtype = AR8X16_SWITCH_AR8327;
 		sc->mii_lo_first = 1;
 		break;
 	default:
 		chipname = NULL;
 	}
 
 done:
 
 	DPRINTF(dev, "chipname=%s, id=%08x\n", chipname, id);
 	if (chipname != NULL) {
 		snprintf(desc, sizeof(desc),
 		    "Atheros %s Ethernet Switch (ver %d rev %d)",
 		    chipname,
 		    sc->chip_ver,
 		    sc->chip_rev);
 		device_set_desc_copy(dev, desc);
 		return (BUS_PROBE_DEFAULT);
 	}
 	return (ENXIO);
 }
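arswitch_probe() splits the MASK_CTRL register into a version and a revision and then matches the combined value against known chips. The sketch below assumes the version occupies bits 15:8 and the revision bits 7:0, which is consistent with the case labels (0x0101, 0x1202, ...) but is not copied from arswitchreg.h. Note that the version must be extracted with a right shift; a comparison operator would only ever produce 0 or 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: version in bits 15:8, revision in bits 7:0. */
#define MASK_CTRL_REV_MASK	0x00ffu
#define MASK_CTRL_VER_MASK	0xff00u
#define MASK_CTRL_VER_SHIFT	8

void
decode_chip_id(uint32_t id, int *ver, int *rev)
{
	*rev = id & MASK_CTRL_REV_MASK;
	/* Shift right to move the version field down to bit 0. */
	*ver = (id & MASK_CTRL_VER_MASK) >> MASK_CTRL_VER_SHIFT;
}

/* Same table as the switch statement in arswitch_probe(). */
const char *
chip_name(uint32_t id)
{
	switch (id & (MASK_CTRL_VER_MASK | MASK_CTRL_REV_MASK)) {
	case 0x0101:
		return ("AR8216");
	case 0x0201:
		return ("AR8226");
	case 0x1000:
	case 0x1001:
		return ("AR8316");
	case 0x1202:
	case 0x1204:
		return ("AR8327");
	default:
		return (NULL);
	}
}
```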
 
 static int
 arswitch_attach_phys(struct arswitch_softc *sc)
 {
 	int phy, err = 0;
 	char name[IFNAMSIZ];
 
 	/* PHYs need an interface, so we generate a dummy one */
 	snprintf(name, IFNAMSIZ, "%sport", device_get_nameunit(sc->sc_dev));
 	for (phy = 0; phy < sc->numphys; phy++) {
 		sc->ifp[phy] = if_alloc(IFT_ETHER);
 		sc->ifp[phy]->if_softc = sc;
 		sc->ifp[phy]->if_flags |= IFF_UP | IFF_BROADCAST |
 		    IFF_DRV_RUNNING | IFF_SIMPLEX;
 		sc->ifname[phy] = malloc(strlen(name)+1, M_DEVBUF, M_WAITOK);
 		bcopy(name, sc->ifname[phy], strlen(name)+1);
 		if_initname(sc->ifp[phy], sc->ifname[phy],
 		    arswitch_portforphy(phy));
 		err = mii_attach(sc->sc_dev, &sc->miibus[phy], sc->ifp[phy],
 		    arswitch_ifmedia_upd, arswitch_ifmedia_sts,
 		    BMSR_DEFCAPMASK, phy, MII_OFFSET_ANY, 0);
 #if 0
 		DPRINTF(sc->sc_dev, "%s attached to pseudo interface %s\n",
 		    device_get_nameunit(sc->miibus[phy]),
 		    sc->ifp[phy]->if_xname);
 #endif
 		if (err != 0) {
 			device_printf(sc->sc_dev,
 			    "attaching PHY %d failed\n",
 			    phy);
+			return (err);
 		}
+
+		if (AR8X16_IS_SWITCH(sc, AR8327)) {
+			int led;
+			char ledname[IFNAMSIZ+4];
+
+			for (led = 0; led < 3; led++) {
+				snprintf(ledname, sizeof(ledname), "%s%dled%d",
+				    name, arswitch_portforphy(phy), led+1);
+				sc->dev_led[phy][led].sc = sc;
+				sc->dev_led[phy][led].phy = phy;
+				sc->dev_led[phy][led].lednum = led;
+			}
+		}
 	}
-	return (err);
+	return (0);
 }
 
 static int
 arswitch_reset(device_t dev)
 {
 
 	arswitch_writereg(dev, AR8X16_REG_MASK_CTRL,
 	    AR8X16_MASK_CTRL_SOFT_RESET);
 	DELAY(1000);
 	if (arswitch_readreg(dev, AR8X16_REG_MASK_CTRL) &
 	    AR8X16_MASK_CTRL_SOFT_RESET) {
 		device_printf(dev, "unable to reset switch\n");
 		return (-1);
 	}
 	return (0);
 }
 
 static int
 arswitch_set_vlan_mode(struct arswitch_softc *sc, uint32_t mode)
 {
 
 	/* Check for invalid modes. */
 	if ((mode & sc->info.es_vlan_caps) != mode)
 		return (EINVAL);
 
 	switch (mode) {
 	case ETHERSWITCH_VLAN_DOT1Q:
 		sc->vlan_mode = ETHERSWITCH_VLAN_DOT1Q;
 		break;
 	case ETHERSWITCH_VLAN_PORT:
 		sc->vlan_mode = ETHERSWITCH_VLAN_PORT;
 		break;
 	default:
 		sc->vlan_mode = 0;
 	}
 
 	/* Reset VLANs. */
 	sc->hal.arswitch_vlan_init_hw(sc);
 
 	return (0);
 }
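arswitch_set_vlan_mode() rejects a mode with the test `(mode & caps) != mode`, which fails exactly when the mode sets some bit that is not a supported capability; mode 0 (no VLAN filtering) always passes. A minimal sketch; the capability bit values are hypothetical stand-ins for ETHERSWITCH_VLAN_*.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical capability bits standing in for ETHERSWITCH_VLAN_*. */
#define VLAN_PORT	0x1u
#define VLAN_DOT1Q	0x2u

/* Accept a mode only if all of its bits are supported capabilities. */
int
check_vlan_mode(uint32_t mode, uint32_t caps)
{
	if ((mode & caps) != mode)
		return (EINVAL);
	return (0);
}
```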
 
 static void
 ar8xxx_port_init(struct arswitch_softc *sc, int port)
 {
 
 	/* Port0 - CPU */
 	if (port == AR8X16_PORT_CPU) {
 		arswitch_writereg(sc->sc_dev, AR8X16_REG_PORT_STS(0),
 		    (AR8X16_IS_SWITCH(sc, AR8216) ?
 		    AR8X16_PORT_STS_SPEED_100 : AR8X16_PORT_STS_SPEED_1000) |
 		    (AR8X16_IS_SWITCH(sc, AR8216) ? 0 : AR8X16_PORT_STS_RXFLOW) |
 		    (AR8X16_IS_SWITCH(sc, AR8216) ? 0 : AR8X16_PORT_STS_TXFLOW) |
 		    AR8X16_PORT_STS_RXMAC |
 		    AR8X16_PORT_STS_TXMAC |
 		    AR8X16_PORT_STS_DUPLEX);
 		arswitch_writereg(sc->sc_dev, AR8X16_REG_PORT_CTRL(0),
 		    arswitch_readreg(sc->sc_dev, AR8X16_REG_PORT_CTRL(0)) &
 		    ~AR8X16_PORT_CTRL_HEADER);
 	} else {
 		/* Set ports to auto negotiation. */
 		arswitch_writereg(sc->sc_dev, AR8X16_REG_PORT_STS(port),
 		    AR8X16_PORT_STS_LINK_AUTO);
 		arswitch_writereg(sc->sc_dev, AR8X16_REG_PORT_CTRL(port),
 		    arswitch_readreg(sc->sc_dev, AR8X16_REG_PORT_CTRL(port)) &
 		    ~AR8X16_PORT_CTRL_HEADER);
 	}
 }
 
 static int
 ar8xxx_atu_flush(struct arswitch_softc *sc)
 {
 	int ret;
 
 	ret = arswitch_waitreg(sc->sc_dev,
 	    AR8216_REG_ATU,
 	    AR8216_ATU_ACTIVE,
 	    0,
 	    1000);
 
 	if (ret)
 		device_printf(sc->sc_dev, "%s: waitreg failed\n", __func__);
 	else
 		arswitch_writereg(sc->sc_dev,
 		    AR8216_REG_ATU,
 		    AR8216_ATU_OP_FLUSH);
 
 	return (ret);
 }
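ar8xxx_atu_flush() waits for the ATU "active" bit to clear and only then issues the flush operation. The sketch below illustrates that poll-with-timeout pattern against a simulated register. It assumes arswitch_waitreg() polls until `(reg & mask) == value` within a bounded number of tries and returns nonzero on timeout; the bit values and the `fake_atu` register model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values standing in for AR8216_ATU_ACTIVE / _OP_FLUSH. */
#define ATU_ACTIVE	0x8u
#define ATU_OP_FLUSH	0x1u

/* Simulated ATU register that reports "busy" for a fixed number of reads. */
struct fake_atu {
	uint32_t reg;
	int busy_polls;
};

static uint32_t
atu_read(struct fake_atu *a)
{
	if (a->busy_polls > 0) {
		a->busy_polls--;
		return (a->reg | ATU_ACTIVE);
	}
	return (a->reg & ~ATU_ACTIVE);
}

/* Issue the flush only once the engine is idle; -1 on timeout. */
int
atu_flush(struct fake_atu *a, int timeout)
{
	int i;

	for (i = 0; i < timeout; i++) {
		if ((atu_read(a) & ATU_ACTIVE) == 0) {
			a->reg = ATU_OP_FLUSH;
			return (0);
		}
		/* The driver would DELAY() between polls. */
	}
	return (-1);
}
```

With a register that is busy for three reads and a budget of ten polls, the flush is issued; with a register that stays busy, the function gives up and leaves the register untouched.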
 
 static int
 arswitch_attach(device_t dev)
 {
 	struct arswitch_softc *sc;
 	int err = 0;
 	int port;
 
 	sc = device_get_softc(dev);
 
 	/* sc->sc_switchtype is already decided in arswitch_probe() */
 	sc->sc_dev = dev;
 	mtx_init(&sc->sc_mtx, "arswitch", NULL, MTX_DEF);
 	sc->page = -1;
 	strlcpy(sc->info.es_name, device_get_desc(dev),
 	    sizeof(sc->info.es_name));
 
 	/* Default HAL methods */
 	sc->hal.arswitch_port_init = ar8xxx_port_init;
 	sc->hal.arswitch_port_vlan_setup = ar8xxx_port_vlan_setup;
 	sc->hal.arswitch_port_vlan_get = ar8xxx_port_vlan_get;
 	sc->hal.arswitch_vlan_init_hw = ar8xxx_reset_vlans;
 
 	sc->hal.arswitch_vlan_getvgroup = ar8xxx_getvgroup;
 	sc->hal.arswitch_vlan_setvgroup = ar8xxx_setvgroup;
 
 	sc->hal.arswitch_vlan_get_pvid = ar8xxx_get_pvid;
 	sc->hal.arswitch_vlan_set_pvid = ar8xxx_set_pvid;
 
 	sc->hal.arswitch_get_dot1q_vlan = ar8xxx_get_dot1q_vlan;
 	sc->hal.arswitch_set_dot1q_vlan = ar8xxx_set_dot1q_vlan;
 	sc->hal.arswitch_flush_dot1q_vlan = ar8xxx_flush_dot1q_vlan;
 	sc->hal.arswitch_purge_dot1q_vlan = ar8xxx_purge_dot1q_vlan;
 	sc->hal.arswitch_get_port_vlan = ar8xxx_get_port_vlan;
 	sc->hal.arswitch_set_port_vlan = ar8xxx_set_port_vlan;
 
 	sc->hal.arswitch_atu_flush = ar8xxx_atu_flush;
 
 	sc->hal.arswitch_phy_read = arswitch_readphy_internal;
 	sc->hal.arswitch_phy_write = arswitch_writephy_internal;
 
 
 	/*
 	 * Attach switch related functions
 	 */
 	if (AR8X16_IS_SWITCH(sc, AR7240))
 		ar7240_attach(sc);
 	else if (AR8X16_IS_SWITCH(sc, AR9340))
 		ar9340_attach(sc);
 	else if (AR8X16_IS_SWITCH(sc, AR8216))
 		ar8216_attach(sc);
 	else if (AR8X16_IS_SWITCH(sc, AR8226))
 		ar8226_attach(sc);
 	else if (AR8X16_IS_SWITCH(sc, AR8316))
 		ar8316_attach(sc);
 	else if (AR8X16_IS_SWITCH(sc, AR8327))
 		ar8327_attach(sc);
 	else {
 		DPRINTF(dev, "%s: unknown switch (%d)?\n", __func__, sc->sc_switchtype);
 		return (ENXIO);
 	}
 
 	/* Common defaults. */
 	sc->info.es_nports = 5; /* XXX technically 6, but 6th not used */
 
 	/* XXX Defaults for externally connected AR8316 */
 	sc->numphys = 4;
 	sc->phy4cpu = 1;
 	sc->is_rgmii = 1;
 	sc->is_gmii = 0;
 	sc->is_mii = 0;
 
 	(void) resource_int_value(device_get_name(dev), device_get_unit(dev),
 	    "numphys", &sc->numphys);
 	(void) resource_int_value(device_get_name(dev), device_get_unit(dev),
 	    "phy4cpu", &sc->phy4cpu);
 	(void) resource_int_value(device_get_name(dev), device_get_unit(dev),
 	    "is_rgmii", &sc->is_rgmii);
 	(void) resource_int_value(device_get_name(dev), device_get_unit(dev),
 	    "is_gmii", &sc->is_gmii);
 	(void) resource_int_value(device_get_name(dev), device_get_unit(dev),
 	    "is_mii", &sc->is_mii);
 
 	if (sc->numphys > AR8X16_NUM_PHYS)
 		sc->numphys = AR8X16_NUM_PHYS;
 
 	/* Reset the switch. */
 	if (arswitch_reset(dev)) {
 		DPRINTF(dev, "%s: arswitch_reset: failed\n", __func__);
 		return (ENXIO);
 	}
 
 	err = sc->hal.arswitch_hw_setup(sc);
 	DPRINTF(dev, "%s: hw_setup: err=%d\n", __func__, err);
 	if (err != 0)
 		return (err);
 
 	err = sc->hal.arswitch_hw_global_setup(sc);
 	DPRINTF(dev, "%s: hw_global_setup: err=%d\n", __func__, err);
 	if (err != 0)
 		return (err);
 
 	/* Initialize the switch ports. */
 	for (port = 0; port <= sc->numphys; port++) {
 		sc->hal.arswitch_port_init(sc, port);
 	}
 
 	/*
 	 * Attach the PHYs and complete the bus enumeration.
 	 */
 	err = arswitch_attach_phys(sc);
 	DPRINTF(dev, "%s: attach_phys: err=%d\n", __func__, err);
 	if (err != 0)
 		return (err);
 
 	/* Default to ingress filters off. */
 	err = arswitch_set_vlan_mode(sc, 0);
 	DPRINTF(dev, "%s: set_vlan_mode: err=%d\n", __func__, err);
 	if (err != 0)
 		return (err);
 
 	bus_generic_probe(dev);
 	bus_enumerate_hinted_children(dev);
 	err = bus_generic_attach(dev);
 	DPRINTF(dev, "%s: bus_generic_attach: err=%d\n", __func__, err);
 	if (err != 0)
 		return (err);
 	
 	callout_init_mtx(&sc->callout_tick, &sc->sc_mtx, 0);
 
 	ARSWITCH_LOCK(sc);
 	arswitch_tick(sc);
 	ARSWITCH_UNLOCK(sc);
 	
 	return (err);
 }
 
 static int
 arswitch_detach(device_t dev)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 	int i;
 
 	callout_drain(&sc->callout_tick);
 
 	for (i=0; i < sc->numphys; i++) {
 		if (sc->miibus[i] != NULL)
 			device_delete_child(dev, sc->miibus[i]);
 		if (sc->ifp[i] != NULL)
 			if_free(sc->ifp[i]);
 		free(sc->ifname[i], M_DEVBUF);
 	}
 
 	bus_generic_detach(dev);
 	mtx_destroy(&sc->sc_mtx);
 
 	return (0);
 }
 
 /*
  * Convert PHY number to port number. PHY0 is connected to port 1, PHY1 to
  * port 2, etc.
  */
 static inline int
 arswitch_portforphy(int phy)
 {
 	return (phy+1);
 }
 
 static inline struct mii_data *
 arswitch_miiforport(struct arswitch_softc *sc, int port)
 {
 	int phy = port-1;
 
 	if (phy < 0 || phy >= sc->numphys)
 		return (NULL);
 	return (device_get_softc(sc->miibus[phy]));
 }
 
 static inline struct ifnet *
 arswitch_ifpforport(struct arswitch_softc *sc, int port)
 {
 	int phy = port-1;
 
 	if (phy < 0 || phy >= sc->numphys)
 		return (NULL);
 	return (sc->ifp[phy]);
 }
 
 /*
  * Convert port status to ifmedia.
  */
 static void
 arswitch_update_ifmedia(int portstatus, u_int *media_status, u_int *media_active)
 {
 	*media_active = IFM_ETHER;
 	*media_status = IFM_AVALID;
 
 	if ((portstatus & AR8X16_PORT_STS_LINK_UP) != 0)
 		*media_status |= IFM_ACTIVE;
 	else {
 		*media_active |= IFM_NONE;
 		return;
 	}
 	switch (portstatus & AR8X16_PORT_STS_SPEED_MASK) {
 	case AR8X16_PORT_STS_SPEED_10:
 		*media_active |= IFM_10_T;
 		break;
 	case AR8X16_PORT_STS_SPEED_100:
 		*media_active |= IFM_100_TX;
 		break;
 	case AR8X16_PORT_STS_SPEED_1000:
 		*media_active |= IFM_1000_T;
 		break;
 	}
 	if ((portstatus & AR8X16_PORT_STS_DUPLEX) != 0)
 		*media_active |= IFM_FDX;
 	else
 		*media_active |= IFM_HDX;
 	if ((portstatus & AR8X16_PORT_STS_TXFLOW) != 0)
 		*media_active |= IFM_ETH_TXPAUSE;
 	if ((portstatus & AR8X16_PORT_STS_RXFLOW) != 0)
 		*media_active |= IFM_ETH_RXPAUSE;
 }
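arswitch_update_ifmedia() translates a port status word into link, speed, duplex and pause settings. The sketch below decodes the same fields into a plain struct instead of IFM_* words. The bit values are inferred from the `%b` format string in arswitch_miipollstat() plus the two-bit speed field, not copied from arswitchreg.h, and it assumes that a set DUPLEX bit means full duplex (the way ar8xxx_port_init() programs the CPU port). `struct media` and `decode_port_status()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed port status bits (see lead-in). */
#define STS_SPEED_MASK	0x003
#define STS_SPEED_10	0x000
#define STS_SPEED_100	0x001
#define STS_SPEED_1000	0x002
#define STS_TXFLOW	0x010
#define STS_RXFLOW	0x020
#define STS_DUPLEX	0x040
#define STS_LINK_UP	0x100

/* Simplified media description in place of IFM_* words. */
struct media {
	bool link;
	int mbps;		/* 0 when the link is down */
	bool full_duplex;
	bool txpause;
	bool rxpause;
};

struct media
decode_port_status(int sts)
{
	struct media m = { false, 0, false, false, false };

	if ((sts & STS_LINK_UP) == 0)
		return (m);	/* link down: nothing else is meaningful */
	m.link = true;
	switch (sts & STS_SPEED_MASK) {
	case STS_SPEED_10:
		m.mbps = 10;
		break;
	case STS_SPEED_100:
		m.mbps = 100;
		break;
	case STS_SPEED_1000:
		m.mbps = 1000;
		break;
	}
	m.full_duplex = (sts & STS_DUPLEX) != 0;	/* assumed polarity */
	m.txpause = (sts & STS_TXFLOW) != 0;
	m.rxpause = (sts & STS_RXFLOW) != 0;
	return (m);
}
```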
 
 /*
  * Poll the status for all PHYs.  We use the switch port status registers
  * because they are a lot quicker to read than talking to all the PHYs.
  * Care must be taken that the resulting ifmedia_active is identical to
  * what the PHY will compute, or gratuitous link status changes will occur
  * whenever the PHY's update function is called.
  */
 static void
 arswitch_miipollstat(struct arswitch_softc *sc)
 {
 	int i;
 	struct mii_data *mii;
 	struct mii_softc *miisc;
 	int portstatus;
 	int port_flap = 0;
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_OWNED);
 
 	for (i = 0; i < sc->numphys; i++) {
 		if (sc->miibus[i] == NULL)
 			continue;
 		mii = device_get_softc(sc->miibus[i]);
 		/* XXX This would be nice to have abstracted out to be per-chip */
 		/* AR8327/AR8337 has a different register base */
 		if (AR8X16_IS_SWITCH(sc, AR8327))
 			portstatus = arswitch_readreg(sc->sc_dev,
 			    AR8327_REG_PORT_STATUS(arswitch_portforphy(i)));
 		else
 			portstatus = arswitch_readreg(sc->sc_dev,
 			    AR8X16_REG_PORT_STS(arswitch_portforphy(i)));
 #if 0
 		DPRINTF(sc->sc_dev, "p[%d]=%b\n",
 		    i,
 		    portstatus,
 		    "\20\3TXMAC\4RXMAC\5TXFLOW\6RXFLOW\7"
 		    "DUPLEX\11LINK_UP\12LINK_AUTO\13LINK_PAUSE");
 #endif
 		/*
 		 * If the current status is down, but we have a link
 		 * status showing up, we need to do an ATU flush.
 		 */
 		if ((mii->mii_media_status & IFM_ACTIVE) == 0 &&
 		    (portstatus & AR8X16_PORT_STS_LINK_UP) != 0) {
 			device_printf(sc->sc_dev, "%s: port %d: port -> UP\n",
 			    __func__,
 			    i);
 			port_flap = 1;
 		}
 		/*
-		 * and maybe if a port goes up->down?
+		 * Likewise flush the ATU if a port goes from up to down.
 		 */
 		if ((mii->mii_media_status & IFM_ACTIVE) != 0 &&
 		    (portstatus & AR8X16_PORT_STS_LINK_UP) == 0) {
 			device_printf(sc->sc_dev, "%s: port %d: port -> DOWN\n",
 			    __func__,
 			    i);
 			port_flap = 1;
 		}
 		arswitch_update_ifmedia(portstatus, &mii->mii_media_status,
 		    &mii->mii_media_active);
 		LIST_FOREACH(miisc, &mii->mii_phys, mii_list) {
 			if (IFM_INST(mii->mii_media.ifm_cur->ifm_media) !=
 			    miisc->mii_inst)
 				continue;
 			mii_phy_update(miisc, MII_POLLSTAT);
 		}
 	}
 
-	/* If a port went from down->up, flush the ATU */
+	/* If any port changed link state, flush the ATU. */
 	if (port_flap)
 		sc->hal.arswitch_atu_flush(sc);
 }
 
 static void
 arswitch_tick(void *arg)
 {
 	struct arswitch_softc *sc = arg;
 
 	arswitch_miipollstat(sc);
 	callout_reset(&sc->callout_tick, hz, arswitch_tick, sc);
 }
 
 static void
 arswitch_lock(device_t dev)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_NOTOWNED);
 	ARSWITCH_LOCK(sc);
 }
 
 static void
 arswitch_unlock(device_t dev)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_OWNED);
 	ARSWITCH_UNLOCK(sc);
 }
 
 static etherswitch_info_t *
 arswitch_getinfo(device_t dev)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 	
 	return (&sc->info);
 }
 
 static int
 ar8xxx_port_vlan_get(struct arswitch_softc *sc, etherswitch_port_t *p)
 {
 	uint32_t reg;
 
 	ARSWITCH_LOCK(sc);
 
 	/* Retrieve the PVID. */
 	sc->hal.arswitch_vlan_get_pvid(sc, p->es_port, &p->es_pvid);
 
 	/* Port flags. */
 	reg = arswitch_readreg(sc->sc_dev, AR8X16_REG_PORT_CTRL(p->es_port));
 	if (reg & AR8X16_PORT_CTRL_DOUBLE_TAG)
 		p->es_flags |= ETHERSWITCH_PORT_DOUBLE_TAG;
 	reg >>= AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_SHIFT;
 	if ((reg & 0x3) == AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_ADD)
 		p->es_flags |= ETHERSWITCH_PORT_ADDTAG;
 	if ((reg & 0x3) == AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_STRIP)
 		p->es_flags |= ETHERSWITCH_PORT_STRIPTAG;
 	ARSWITCH_UNLOCK(sc);
 
 	return (0);
 }
 
 static int
 arswitch_is_cpuport(struct arswitch_softc *sc, int port)
 {
 
 	return ((port == AR8X16_PORT_CPU) ||
 	    ((AR8X16_IS_SWITCH(sc, AR8327) &&
 	      port == AR8327_PORT_GMAC6)));
 }
 
 static int
 arswitch_getport(device_t dev, etherswitch_port_t *p)
 {
 	struct arswitch_softc *sc;
 	struct mii_data *mii;
 	struct ifmediareq *ifmr;
 	int err;
 
 	sc = device_get_softc(dev);
 	/* XXX +1 is for AR8327; should make this configurable! */
 	if (p->es_port < 0 || p->es_port > sc->info.es_nports)
 		return (ENXIO);
 
 	err = sc->hal.arswitch_port_vlan_get(sc, p);
 	if (err != 0)
 		return (err);
 
 	mii = arswitch_miiforport(sc, p->es_port);
 	if (arswitch_is_cpuport(sc, p->es_port)) {
 		/* fill in fixed values for CPU port */
 		/* XXX is this valid in all cases? */
 		p->es_flags |= ETHERSWITCH_PORT_CPU;
 		ifmr = &p->es_ifmr;
 		ifmr->ifm_count = 0;
 		ifmr->ifm_current = ifmr->ifm_active =
 		    IFM_ETHER | IFM_1000_T | IFM_FDX;
 		ifmr->ifm_mask = 0;
 		ifmr->ifm_status = IFM_ACTIVE | IFM_AVALID;
 	} else if (mii != NULL) {
 		err = ifmedia_ioctl(mii->mii_ifp, &p->es_ifr,
 		    &mii->mii_media, SIOCGIFMEDIA);
 		if (err)
 			return (err);
 	} else {
 		return (ENXIO);
 	}
+	
+	if (!arswitch_is_cpuport(sc, p->es_port) &&
+	    AR8X16_IS_SWITCH(sc, AR8327)) {
+		int led;
+		p->es_nleds = 3;
+
+		for (led = 0; led < p->es_nleds; led++)
+		{
+			int style;
+			uint32_t val;
+			
+			/* Find the right style enum for our pattern */
+			val = arswitch_readreg(dev,
+			    ar8327_led_mapping[p->es_port-1][led].reg);
+			val = (val>>ar8327_led_mapping[p->es_port-1][led].shift)&0x03;
+
+			for (style = 0; style < ETHERSWITCH_PORT_LED_MAX; style++) {
+				if (led_pattern_table[style] == val)
+					break;
+			}
+			
+			/* can't happen */
+			if (style == ETHERSWITCH_PORT_LED_MAX)
+				style = ETHERSWITCH_PORT_LED_DEFAULT;
+			
+			p->es_led[led] = style;
+		}
+	} else
+	{
+		p->es_nleds = 0;
+	}
+	
 	return (0);
 }
 
 static int
 ar8xxx_port_vlan_setup(struct arswitch_softc *sc, etherswitch_port_t *p)
 {
 	uint32_t reg;
 	int err;
 
 	ARSWITCH_LOCK(sc);
 
 	/* Set the PVID. */
 	if (p->es_pvid != 0)
 		sc->hal.arswitch_vlan_set_pvid(sc, p->es_port, p->es_pvid);
 
 	/* Mutually exclusive. */
 	if (p->es_flags & ETHERSWITCH_PORT_ADDTAG &&
 	    p->es_flags & ETHERSWITCH_PORT_STRIPTAG) {
 		ARSWITCH_UNLOCK(sc);
 		return (EINVAL);
 	}
 
 	reg = 0;
 	if (p->es_flags & ETHERSWITCH_PORT_DOUBLE_TAG)
 		reg |= AR8X16_PORT_CTRL_DOUBLE_TAG;
 	if (p->es_flags & ETHERSWITCH_PORT_ADDTAG)
 		reg |= AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_ADD <<
 		    AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_SHIFT;
 	if (p->es_flags & ETHERSWITCH_PORT_STRIPTAG)
 		reg |= AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_STRIP <<
 		    AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_SHIFT;
 
 	err = arswitch_modifyreg(sc->sc_dev,
 	    AR8X16_REG_PORT_CTRL(p->es_port),
 	    0x3 << AR8X16_PORT_CTRL_EGRESS_VLAN_MODE_SHIFT |
 	    AR8X16_PORT_CTRL_DOUBLE_TAG, reg);
 
 	ARSWITCH_UNLOCK(sc);
 	return (err);
 }
 
 static int
 arswitch_setport(device_t dev, etherswitch_port_t *p)
 {
-	int err;
+	int err, i;
 	struct arswitch_softc *sc;
 	struct ifmedia *ifm;
 	struct mii_data *mii;
 	struct ifnet *ifp;
 
 	sc = device_get_softc(dev);
 	if (p->es_port < 0 || p->es_port > sc->info.es_nports)
 		return (ENXIO);
 
 	/* Port flags. */
 	if (sc->vlan_mode == ETHERSWITCH_VLAN_DOT1Q) {
 		err = sc->hal.arswitch_port_vlan_setup(sc, p);
 		if (err)
 			return (err);
 	}
 
-	/* Do not allow media changes on CPU port. */
+	/* Do not allow media or LED changes on the CPU port. */
 	if (arswitch_is_cpuport(sc, p->es_port))
 		return (0);
+	
+	if (AR8X16_IS_SWITCH(sc, AR8327))
+	{
+		for (i = 0; i < 3; i++)
+		{
+			err = arswitch_setled(sc, p->es_port - 1, i,
+			    p->es_led[i]);
+			if (err)
+				return (err);
+		}
+	}
 
 	mii = arswitch_miiforport(sc, p->es_port);
 	if (mii == NULL)
 		return (ENXIO);
 
 	ifp = arswitch_ifpforport(sc, p->es_port);
 
 	ifm = &mii->mii_media;
 	return (ifmedia_ioctl(ifp, &p->es_ifr, ifm, SIOCSIFMEDIA));
+}
+
+static int
+arswitch_setled(struct arswitch_softc *sc, int phy, int led, int style)
+{
+	int shift;
+
+	if (phy < 0 || phy >= sc->numphys ||
+	    led < 0 || led >= ETHERSWITCH_PORT_MAX_LEDS)
+		return (EINVAL);
+	if (style < 0 || style >= ETHERSWITCH_PORT_LED_MAX)
+		return (EINVAL);
+
+	shift = ar8327_led_mapping[phy][led].shift;
+	return (arswitch_modifyreg(sc->sc_dev,
+	    ar8327_led_mapping[phy][led].reg,
+	    0x03 << shift, led_pattern_table[style] << shift));
 }
 
 static void
 arswitch_statchg(device_t dev)
 {
 
 	DPRINTF(dev, "%s\n", __func__);
 }
 
 static int
 arswitch_ifmedia_upd(struct ifnet *ifp)
 {
 	struct arswitch_softc *sc = ifp->if_softc;
 	struct mii_data *mii = arswitch_miiforport(sc, ifp->if_dunit);
 
 	if (mii == NULL)
 		return (ENXIO);
 	mii_mediachg(mii);
 	return (0);
 }
 
 static void
 arswitch_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 	struct arswitch_softc *sc = ifp->if_softc;
 	struct mii_data *mii = arswitch_miiforport(sc, ifp->if_dunit);
 
 	DPRINTF(sc->sc_dev, "%s\n", __func__);
 
 	if (mii == NULL)
 		return;
 	mii_pollstat(mii);
 	ifmr->ifm_active = mii->mii_media_active;
 	ifmr->ifm_status = mii->mii_media_status;
 }
 
 static int
 arswitch_getconf(device_t dev, etherswitch_conf_t *conf)
 {
 	struct arswitch_softc *sc;
 
 	sc = device_get_softc(dev);
 
 	/* Return the VLAN mode. */
 	conf->cmd = ETHERSWITCH_CONF_VLAN_MODE;
 	conf->vlan_mode = sc->vlan_mode;
 
 	return (0);
 }
 
 static int
 arswitch_setconf(device_t dev, etherswitch_conf_t *conf)
 {
 	struct arswitch_softc *sc;
 	int err;
 
 	sc = device_get_softc(dev);
 
 	/* Set the VLAN mode. */
 	if (conf->cmd & ETHERSWITCH_CONF_VLAN_MODE) {
 		err = arswitch_set_vlan_mode(sc, conf->vlan_mode);
 		if (err != 0)
 			return (err);
 	}
 
 	return (0);
 }
 
 static int
 arswitch_getvgroup(device_t dev, etherswitch_vlangroup_t *e)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	return (sc->hal.arswitch_vlan_getvgroup(sc, e));
 }
 
 static int
 arswitch_setvgroup(device_t dev, etherswitch_vlangroup_t *e)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	return (sc->hal.arswitch_vlan_setvgroup(sc, e));
 }
 
 static int
 arswitch_readphy(device_t dev, int phy, int reg)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	return (sc->hal.arswitch_phy_read(dev, phy, reg));
 }
 
 static int
 arswitch_writephy(device_t dev, int phy, int reg, int val)
 {
 	struct arswitch_softc *sc = device_get_softc(dev);
 
 	return (sc->hal.arswitch_phy_write(dev, phy, reg, val));
 }
 
 static device_method_t arswitch_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe,		arswitch_probe),
 	DEVMETHOD(device_attach,	arswitch_attach),
 	DEVMETHOD(device_detach,	arswitch_detach),
 	
 	/* bus interface */
 	DEVMETHOD(bus_add_child,	device_add_child_ordered),
 	
 	/* MII interface */
 	DEVMETHOD(miibus_readreg,	arswitch_readphy),
 	DEVMETHOD(miibus_writereg,	arswitch_writephy),
 	DEVMETHOD(miibus_statchg,	arswitch_statchg),
 
 	/* MDIO interface */
 	DEVMETHOD(mdio_readreg,		arswitch_readphy),
 	DEVMETHOD(mdio_writereg,	arswitch_writephy),
 
 	/* etherswitch interface */
 	DEVMETHOD(etherswitch_lock,	arswitch_lock),
 	DEVMETHOD(etherswitch_unlock,	arswitch_unlock),
 	DEVMETHOD(etherswitch_getinfo,	arswitch_getinfo),
 	DEVMETHOD(etherswitch_readreg,	arswitch_readreg),
 	DEVMETHOD(etherswitch_writereg,	arswitch_writereg),
 	DEVMETHOD(etherswitch_readphyreg,	arswitch_readphy),
 	DEVMETHOD(etherswitch_writephyreg,	arswitch_writephy),
 	DEVMETHOD(etherswitch_getport,	arswitch_getport),
 	DEVMETHOD(etherswitch_setport,	arswitch_setport),
 	DEVMETHOD(etherswitch_getvgroup,	arswitch_getvgroup),
 	DEVMETHOD(etherswitch_setvgroup,	arswitch_setvgroup),
 	DEVMETHOD(etherswitch_getconf,	arswitch_getconf),
 	DEVMETHOD(etherswitch_setconf,	arswitch_setconf),
 
 	DEVMETHOD_END
 };
 
 DEFINE_CLASS_0(arswitch, arswitch_driver, arswitch_methods,
     sizeof(struct arswitch_softc));
 static devclass_t arswitch_devclass;
 
 DRIVER_MODULE(arswitch, mdio, arswitch_driver, arswitch_devclass, 0, 0);
 DRIVER_MODULE(miibus, arswitch, miibus_driver, miibus_devclass, 0, 0);
 DRIVER_MODULE(mdio, arswitch, mdio_driver, mdio_devclass, 0, 0);
 DRIVER_MODULE(etherswitch, arswitch, etherswitch_driver, etherswitch_devclass, 0, 0);
 MODULE_VERSION(arswitch, 1);
 MODULE_DEPEND(arswitch, miibus, 1, 1, 1); /* XXX which versions? */
 MODULE_DEPEND(arswitch, etherswitch, 1, 1, 1); /* XXX which versions? */
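The LED support added above stores each LED's mode as a 2-bit field inside one of the AR8327 LED control registers. ar8327_led_mapping supplies the register and bit offset for each (PHY, LED) pair. arswitch_setled() updates that field through arswitch_modifyreg(), and arswitch_getport() reverses the mapping by searching led_pattern_table for the value it reads back.

The standalone sketch below models that encode/decode round trip on a plain uint32_t instead of real hardware. It is illustrative only, and several parts of it are assumptions:
- The demo_* names and the style enum are hypothetical.
- The pattern values are assumed. The real ones live in led_pattern_table, which this diff does not show.
- No register I/O or locking is performed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an AR8327-style LED field: each LED owns a 2-bit
 * field at bit offset "shift" inside a 32-bit LED control register.
 * The style enum and pattern values are illustrative assumptions.
 */
enum demo_led_style { LED_DEFAULT, LED_ON, LED_OFF, LED_BLINK, LED_STYLE_MAX };

static const uint32_t demo_pattern_table[LED_STYLE_MAX] = {
	[LED_DEFAULT] = 0x3,
	[LED_ON] = 0x2,
	[LED_OFF] = 0x0,
	[LED_BLINK] = 0x1,
};

/* Clear the 2-bit field, then OR in the pattern for the requested style. */
static int
demo_setled(uint32_t *reg, int shift, int style)
{
	if (style < 0 || style >= LED_STYLE_MAX || shift < 0 || shift > 30)
		return (-1);
	*reg = (*reg & ~(UINT32_C(0x3) << shift)) |
	    (demo_pattern_table[style] << shift);
	return (0);
}

/* Extract the 2-bit field and map the pattern back to a style. */
static int
demo_getled(uint32_t reg, int shift)
{
	uint32_t val = (reg >> shift) & 0x3;
	int style;

	for (style = 0; style < LED_STYLE_MAX; style++)
		if (demo_pattern_table[style] == val)
			break;
	/* Every 2-bit value has a table entry, so a match is always found. */
	assert(style < LED_STYLE_MAX);
	return (style);
}
```

The field width lives in one place (the 0x3 mask). That is what lets a single table of {register, shift} pairs cover both layouts: the per-register entries for PHY0 and PHY4 in LED_CTRL0..2, and the packed entries for PHY1..PHY3 in LED_CTRL3.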
Index: user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.c	(revision 303775)
@@ -1,1160 +1,1190 @@
 /*-
  * Copyright (c) 2011-2012 Stefan Bethke.
  * Copyright (c) 2014 Adrian Chadd.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #include <sys/param.h>
 #include <sys/bus.h>
 #include <sys/errno.h>
 #include <sys/kernel.h>
 #include <sys/module.h>
 #include <sys/socket.h>
 #include <sys/sockio.h>
 #include <sys/sysctl.h>
 #include <sys/systm.h>
 
 #include <net/if.h>
 #include <net/if_arp.h>
 #include <net/ethernet.h>
 #include <net/if_dl.h>
 #include <net/if_media.h>
 #include <net/if_types.h>
 
 #include <machine/bus.h>
 #include <dev/iicbus/iic.h>
 #include <dev/iicbus/iiconf.h>
 #include <dev/iicbus/iicbus.h>
 #include <dev/mii/mii.h>
 #include <dev/mii/miivar.h>
 #include <dev/mdio/mdio.h>
 
 #include <dev/etherswitch/etherswitch.h>
 
 #include <dev/etherswitch/arswitch/arswitchreg.h>
 #include <dev/etherswitch/arswitch/arswitchvar.h>
 #include <dev/etherswitch/arswitch/arswitch_reg.h>
 #include <dev/etherswitch/arswitch/arswitch_phy.h>
 #include <dev/etherswitch/arswitch/arswitch_vlans.h>
 
 #include <dev/etherswitch/arswitch/arswitch_8327.h>
 
 #include "mdio_if.h"
 #include "miibus_if.h"
 #include "etherswitch_if.h"
 
 /*
  * AR8327 TODO:
  *
  * There should be a default hardware setup hint set for the default
  * switch config.  Otherwise the default is "all ports in one vlangroup",
  * which means both CPU ports can see each other and that will quickly
  * lead to traffic storms/loops.
  */
 
+/* Map port+led to register+shift */
+struct ar8327_led_mapping ar8327_led_mapping[AR8327_NUM_PHYS][ETHERSWITCH_PORT_MAX_LEDS] =
+{
+	{	/* PHY0 */
+		{AR8327_REG_LED_CTRL0, 14 },
+		{AR8327_REG_LED_CTRL1, 14 },
+		{AR8327_REG_LED_CTRL2, 14 }
+	},
+	{	/* PHY1 */
+		{AR8327_REG_LED_CTRL3, 8  },
+		{AR8327_REG_LED_CTRL3, 10 },
+		{AR8327_REG_LED_CTRL3, 12 }
+	},
+	{	/* PHY2 */
+		{AR8327_REG_LED_CTRL3, 14 },
+		{AR8327_REG_LED_CTRL3, 16 },
+		{AR8327_REG_LED_CTRL3, 18 }
+	},
+	{	/* PHY3 */
+		{AR8327_REG_LED_CTRL3, 20 },
+		{AR8327_REG_LED_CTRL3, 22 },
+		{AR8327_REG_LED_CTRL3, 24 }
+	},
+	{	/* PHY4 */
+		{AR8327_REG_LED_CTRL0, 30 },
+		{AR8327_REG_LED_CTRL1, 30 },
+		{AR8327_REG_LED_CTRL2, 30 }
+	}
+};
+
 static int
 ar8327_vlan_op(struct arswitch_softc *sc, uint32_t op, uint32_t vid,
     uint32_t data)
 {
 	int err;
 
 	/*
 	 * Wait for the "done" bit to finish.
 	 */
 	if (arswitch_waitreg(sc->sc_dev, AR8327_REG_VTU_FUNC1,
 	    AR8327_VTU_FUNC1_BUSY, 0, 5))
 		return (EBUSY);
 
 	/*
 	 * If it's a "load" operation, then ensure 'data' is loaded
 	 * in first.
 	 */
 	if ((op & AR8327_VTU_FUNC1_OP) == AR8327_VTU_FUNC1_OP_LOAD) {
 		err = arswitch_writereg(sc->sc_dev, AR8327_REG_VTU_FUNC0, data);
 		if (err)
 			return (err);
 	}
 
 	/*
 	 * Set the VID.
 	 */
 	op |= ((vid & 0xfff) << AR8327_VTU_FUNC1_VID_S);
 
 	/*
 	 * Set busy bit to start loading in the command.
 	 */
 	op |= AR8327_VTU_FUNC1_BUSY;
 	arswitch_writereg(sc->sc_dev, AR8327_REG_VTU_FUNC1, op);
 
 	/*
 	 * Finally - wait for it to load.
 	 */
 	if (arswitch_waitreg(sc->sc_dev, AR8327_REG_VTU_FUNC1,
 	    AR8327_VTU_FUNC1_BUSY, 0, 5))
 		return (EBUSY);
 
 	return (0);
 }
 
 static void
 ar8327_phy_fixup(struct arswitch_softc *sc, int phy)
 {
 	if (bootverbose)
 		device_printf(sc->sc_dev,
 		    "%s: called; phy=%d; chiprev=%d\n", __func__,
 		    phy,
 		    sc->chip_rev);
 	switch (sc->chip_rev) {
 	case 1:
 		/* For 100M waveform */
 		arswitch_writedbg(sc->sc_dev, phy, 0, 0x02ea);
 		/* Turn on Gigabit clock */
 		arswitch_writedbg(sc->sc_dev, phy, 0x3d, 0x68a0);
 		break;
 
 	case 2:
 		arswitch_writemmd(sc->sc_dev, phy, 0x7, 0x3c);
 		arswitch_writemmd(sc->sc_dev, phy, 0x4007, 0x0);
 		/* fallthrough */
 	case 4:
 		arswitch_writemmd(sc->sc_dev, phy, 0x3, 0x800d);
 		arswitch_writemmd(sc->sc_dev, phy, 0x4003, 0x803f);
 
 		arswitch_writedbg(sc->sc_dev, phy, 0x3d, 0x6860);
 		arswitch_writedbg(sc->sc_dev, phy, 0x5, 0x2c46);
 		arswitch_writedbg(sc->sc_dev, phy, 0x3c, 0x6000);
 		break;
 	}
 }
 
 static uint32_t
 ar8327_get_pad_cfg(struct ar8327_pad_cfg *cfg)
 {
 	uint32_t t;
 
 	if (!cfg)
 		return (0);
 
 	t = 0;
 	switch (cfg->mode) {
 	case AR8327_PAD_NC:
 		break;
 
 	case AR8327_PAD_MAC2MAC_MII:
 		t = AR8327_PAD_MAC_MII_EN;
 		if (cfg->rxclk_sel)
 			t |= AR8327_PAD_MAC_MII_RXCLK_SEL;
 		if (cfg->txclk_sel)
 			t |= AR8327_PAD_MAC_MII_TXCLK_SEL;
 		break;
 
 	case AR8327_PAD_MAC2MAC_GMII:
 		t = AR8327_PAD_MAC_GMII_EN;
 		if (cfg->rxclk_sel)
 			t |= AR8327_PAD_MAC_GMII_RXCLK_SEL;
 		if (cfg->txclk_sel)
 			t |= AR8327_PAD_MAC_GMII_TXCLK_SEL;
 		break;
 
 	case AR8327_PAD_MAC_SGMII:
 		t = AR8327_PAD_SGMII_EN;
 
 		/*
 		 * WAR for the Qualcomm Atheros AP136 board.
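-		 * It seems that RGMII TX/RX delay settings needs to be
-		 * applied for SGMII mode as well, The ethernet is not
+		 * It seems that RGMII TX/RX delay settings need to be
+		 * applied for SGMII mode as well; the Ethernet is not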
 		 * reliable without this.
 		 */
 		t |= cfg->txclk_delay_sel << AR8327_PAD_RGMII_TXCLK_DELAY_SEL_S;
 		t |= cfg->rxclk_delay_sel << AR8327_PAD_RGMII_RXCLK_DELAY_SEL_S;
 		if (cfg->rxclk_delay_en)
 			t |= AR8327_PAD_RGMII_RXCLK_DELAY_EN;
 		if (cfg->txclk_delay_en)
 			t |= AR8327_PAD_RGMII_TXCLK_DELAY_EN;
 
 		if (cfg->sgmii_delay_en)
 			t |= AR8327_PAD_SGMII_DELAY_EN;
 
 		break;
 
 	case AR8327_PAD_MAC2PHY_MII:
 		t = AR8327_PAD_PHY_MII_EN;
 		if (cfg->rxclk_sel)
 			t |= AR8327_PAD_PHY_MII_RXCLK_SEL;
 		if (cfg->txclk_sel)
 			t |= AR8327_PAD_PHY_MII_TXCLK_SEL;
 		break;
 
 	case AR8327_PAD_MAC2PHY_GMII:
 		t = AR8327_PAD_PHY_GMII_EN;
 		if (cfg->pipe_rxclk_sel)
 			t |= AR8327_PAD_PHY_GMII_PIPE_RXCLK_SEL;
 		if (cfg->rxclk_sel)
 			t |= AR8327_PAD_PHY_GMII_RXCLK_SEL;
 		if (cfg->txclk_sel)
 			t |= AR8327_PAD_PHY_GMII_TXCLK_SEL;
 		break;
 
 	case AR8327_PAD_MAC_RGMII:
 		t = AR8327_PAD_RGMII_EN;
 		t |= cfg->txclk_delay_sel << AR8327_PAD_RGMII_TXCLK_DELAY_SEL_S;
 		t |= cfg->rxclk_delay_sel << AR8327_PAD_RGMII_RXCLK_DELAY_SEL_S;
 		if (cfg->rxclk_delay_en)
 			t |= AR8327_PAD_RGMII_RXCLK_DELAY_EN;
 		if (cfg->txclk_delay_en)
 			t |= AR8327_PAD_RGMII_TXCLK_DELAY_EN;
 		break;
 
 	case AR8327_PAD_PHY_GMII:
 		t = AR8327_PAD_PHYX_GMII_EN;
 		break;
 
 	case AR8327_PAD_PHY_RGMII:
 		t = AR8327_PAD_PHYX_RGMII_EN;
 		break;
 
 	case AR8327_PAD_PHY_MII:
 		t = AR8327_PAD_PHYX_MII_EN;
 		break;
 	}
 
 	return (t);
 }
 
 /*
  * Map the hard-coded port config from the switch setup to
  * the chipset port config (status, duplex, flow, etc.)
  */
 static uint32_t
 ar8327_get_port_init_status(struct ar8327_port_cfg *cfg)
 {
 	uint32_t t;
 
 	if (!cfg->force_link)
 		return (AR8X16_PORT_STS_LINK_AUTO);
 
 	t = AR8X16_PORT_STS_TXMAC | AR8X16_PORT_STS_RXMAC;
 	t |= cfg->duplex ? AR8X16_PORT_STS_DUPLEX : 0;
 	t |= cfg->rxpause ? AR8X16_PORT_STS_RXFLOW : 0;
 	t |= cfg->txpause ? AR8X16_PORT_STS_TXFLOW : 0;
 
 	switch (cfg->speed) {
 	case AR8327_PORT_SPEED_10:
 		t |= AR8X16_PORT_STS_SPEED_10;
 		break;
 	case AR8327_PORT_SPEED_100:
 		t |= AR8X16_PORT_STS_SPEED_100;
 		break;
 	case AR8327_PORT_SPEED_1000:
 		t |= AR8X16_PORT_STS_SPEED_1000;
 		break;
 	}
 
 	return (t);
 }
 
 /*
  * Fetch the port data for the given port.
  *
  * This goes and does dirty things with the hints space
  * to determine what the configuration parameters should be.
  *
  * Returns 1 if the structure was successfully parsed and
  * the contents are valid; 0 otherwise.
  */
 static int
 ar8327_fetch_pdata_port(struct arswitch_softc *sc,
     struct ar8327_port_cfg *pcfg,
     int port)
 {
 	int val;
 	char sbuf[128];
 
 	/* Check if force_link exists */
 	val = 0;
 	snprintf(sbuf, 128, "port.%d.force_link", port);
 	(void) resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val);
 	if (val != 1)
 		return (0);
 	pcfg->force_link = 1;
 
 	/* force_link is set; let's parse the rest of the fields */
 	snprintf(sbuf, 128, "port.%d.speed", port);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0) {
 		switch (val) {
 		case 10:
 			pcfg->speed = AR8327_PORT_SPEED_10;
 			break;
 		case 100:
 			pcfg->speed = AR8327_PORT_SPEED_100;
 			break;
 		case 1000:
 			pcfg->speed = AR8327_PORT_SPEED_1000;
 			break;
 		default:
 			device_printf(sc->sc_dev,
-			    "%s: invalid port %d duplex value (%d)\n",
+			    "%s: invalid port %d speed value (%d)\n",
 			    __func__,
 			    port,
 			    val);
 			return (0);
 		}
 	}
 
 	snprintf(sbuf, 128, "port.%d.duplex", port);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pcfg->duplex = val;
 
 	snprintf(sbuf, 128, "port.%d.txpause", port);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pcfg->txpause = val;
 
 	snprintf(sbuf, 128, "port.%d.rxpause", port);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pcfg->rxpause = val;
 
 #if 1
 	device_printf(sc->sc_dev,
 	    "%s: port %d: speed=%d, duplex=%d, txpause=%d, rxpause=%d\n",
 	    __func__,
 	    port,
 	    pcfg->speed,
 	    pcfg->duplex,
 	    pcfg->txpause,
 	    pcfg->rxpause);
 #endif
 
 	return (1);
 }
 
 /*
  * Parse the pad configuration from the boot hints.
  *
  * The (mostly optional) fields are:
  *
  * uint32_t mode;
  * uint32_t rxclk_sel;
  * uint32_t txclk_sel;
  * uint32_t txclk_delay_sel;
  * uint32_t rxclk_delay_sel;
  * uint32_t txclk_delay_en;
  * uint32_t rxclk_delay_en;
  * uint32_t sgmii_delay_en;
  * uint32_t pipe_rxclk_sel;
  *
  * If mode isn't in the hints, 0 is returned.
  * Else the structure is fleshed out and 1 is returned.
  */
 static int
 ar8327_fetch_pdata_pad(struct arswitch_softc *sc,
     struct ar8327_pad_cfg *pc,
     int pad)
 {
 	int val;
 	char sbuf[128];
 
 	/* Check if mode exists */
 	val = 0;
 	snprintf(sbuf, 128, "pad.%d.mode", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) != 0)
 		return (0);
 
 	/* assume that 'mode' exists and was found */
 	pc->mode = val;
 
 	snprintf(sbuf, 128, "pad.%d.rxclk_sel", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->rxclk_sel = val;
 
 	snprintf(sbuf, 128, "pad.%d.txclk_sel", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->txclk_sel = val;
 
 	snprintf(sbuf, 128, "pad.%d.txclk_delay_sel", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->txclk_delay_sel = val;
 
 	snprintf(sbuf, 128, "pad.%d.rxclk_delay_sel", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->rxclk_delay_sel = val;
 
 	snprintf(sbuf, 128, "pad.%d.txclk_delay_en", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->txclk_delay_en = val;
 
 	snprintf(sbuf, 128, "pad.%d.rxclk_delay_en", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->rxclk_delay_en = val;
 
 	snprintf(sbuf, 128, "pad.%d.sgmii_delay_en", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->sgmii_delay_en = val;
 
 	snprintf(sbuf, 128, "pad.%d.pipe_rxclk_sel", pad);
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    sbuf, &val) == 0)
 		pc->pipe_rxclk_sel = val;
 
 	if (bootverbose) {
 		device_printf(sc->sc_dev,
 		    "%s: pad %d: mode=%d, rxclk_sel=%d, txclk_sel=%d, "
 		    "txclk_delay_sel=%d, rxclk_delay_sel=%d, txclk_delay_en=%d, "
-		    "rxclk_enable_en=%d, sgmii_delay_en=%d, pipe_rxclk_sel=%d\n",
+		    "rxclk_delay_en=%d, sgmii_delay_en=%d, pipe_rxclk_sel=%d\n",
 		    __func__,
 		    pad,
 		    pc->mode,
 		    pc->rxclk_sel,
 		    pc->txclk_sel,
 		    pc->txclk_delay_sel,
 		    pc->rxclk_delay_sel,
 		    pc->txclk_delay_en,
 		    pc->rxclk_delay_en,
 		    pc->sgmii_delay_en,
 		    pc->pipe_rxclk_sel);
 	}
 
 	return (1);
 }
 
 /*
  * Fetch the SGMII configuration block from the boot hints.
  */
 static int
 ar8327_fetch_pdata_sgmii(struct arswitch_softc *sc,
     struct ar8327_sgmii_cfg *scfg)
 {
 	int val;
 
 	/* sgmii_ctrl */
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "sgmii.ctrl", &val) != 0)
 		return (0);
 	scfg->sgmii_ctrl = val;
 
 	/* serdes_aen */
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "sgmii.serdes_aen", &val) != 0)
 		return (0);
 	scfg->serdes_aen = val;
 
 	return (1);
 }
 
 /*
  * Fetch the LED configuration from the boot hints.
  */
 static int
 ar8327_fetch_pdata_led(struct arswitch_softc *sc,
     struct ar8327_led_cfg *lcfg)
 {
 	int val;
 
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "led.ctrl0", &val) != 0)
 		return (0);
 	lcfg->led_ctrl0 = val;
 
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "led.ctrl1", &val) != 0)
 		return (0);
 	lcfg->led_ctrl1 = val;
 
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "led.ctrl2", &val) != 0)
 		return (0);
 	lcfg->led_ctrl2 = val;
 
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "led.ctrl3", &val) != 0)
 		return (0);
 	lcfg->led_ctrl3 = val;
 
 	val = 0;
 	if (resource_int_value(device_get_name(sc->sc_dev),
 	    device_get_unit(sc->sc_dev),
 	    "led.open_drain", &val) != 0)
 		return (0);
 	lcfg->open_drain = val;
 
 	return (1);
 }
 
 /*
  * Initialise the ar8327 specific hardware features from
  * the hints provided in the boot environment.
  */
 static int
 ar8327_init_pdata(struct arswitch_softc *sc)
 {
 	struct ar8327_pad_cfg pc;
 	struct ar8327_port_cfg port_cfg;
 	struct ar8327_sgmii_cfg scfg;
 	struct ar8327_led_cfg lcfg;
 	uint32_t t, new_pos, pos;
 
 	/* Port 0 */
 	bzero(&port_cfg, sizeof(port_cfg));
 	sc->ar8327.port0_status = 0;
 	if (ar8327_fetch_pdata_port(sc, &port_cfg, 0))
 		sc->ar8327.port0_status = ar8327_get_port_init_status(&port_cfg);
 
 	/* Port 6 */
 	bzero(&port_cfg, sizeof(port_cfg));
 	sc->ar8327.port6_status = 0;
 	if (ar8327_fetch_pdata_port(sc, &port_cfg, 6))
 		sc->ar8327.port6_status = ar8327_get_port_init_status(&port_cfg);
 
 	/* Pad 0 */
 	bzero(&pc, sizeof(pc));
 	t = 0;
 	if (ar8327_fetch_pdata_pad(sc, &pc, 0))
 		t = ar8327_get_pad_cfg(&pc);
 #if 0
 		if (AR8X16_IS_SWITCH(sc, AR8337))
 			t |= AR8337_PAD_MAC06_EXCHANGE_EN;
 #endif
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PAD0_MODE, t);
 
 	/* Pad 5 */
 	bzero(&pc, sizeof(pc));
 	t = 0;
 	if (ar8327_fetch_pdata_pad(sc, &pc, 5))
 		t = ar8327_get_pad_cfg(&pc);
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PAD5_MODE, t);
 
 	/* Pad 6 */
 	bzero(&pc, sizeof(pc));
 	t = 0;
 	if (ar8327_fetch_pdata_pad(sc, &pc, 6))
 		t = ar8327_get_pad_cfg(&pc);
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PAD6_MODE, t);
 
 	pos = arswitch_readreg(sc->sc_dev, AR8327_REG_POWER_ON_STRIP);
 	new_pos = pos;
 
 	/* XXX LED config */
 	bzero(&lcfg, sizeof(lcfg));
 	if (ar8327_fetch_pdata_led(sc, &lcfg)) {
 		if (lcfg.open_drain)
 			new_pos |= AR8327_POWER_ON_STRIP_LED_OPEN_EN;
 		else
 			new_pos &= ~AR8327_POWER_ON_STRIP_LED_OPEN_EN;
 
 		arswitch_writereg(sc->sc_dev, AR8327_REG_LED_CTRL0,
 		    lcfg.led_ctrl0);
 		arswitch_writereg(sc->sc_dev, AR8327_REG_LED_CTRL1,
 		    lcfg.led_ctrl1);
 		arswitch_writereg(sc->sc_dev, AR8327_REG_LED_CTRL2,
 		    lcfg.led_ctrl2);
 		arswitch_writereg(sc->sc_dev, AR8327_REG_LED_CTRL3,
 		    lcfg.led_ctrl3);
 
 		if (new_pos != pos)
 			new_pos |= AR8327_POWER_ON_STRIP_POWER_ON_SEL;
 	}
 
 	/* SGMII config */
 	bzero(&scfg, sizeof(scfg));
 	if (ar8327_fetch_pdata_sgmii(sc, &scfg)) {
 		device_printf(sc->sc_dev, "%s: SGMII cfg?\n", __func__);
 		t = scfg.sgmii_ctrl;
 		if (sc->chip_rev == 1)
 			t |= AR8327_SGMII_CTRL_EN_PLL |
 			    AR8327_SGMII_CTRL_EN_RX |
 			    AR8327_SGMII_CTRL_EN_TX;
 		else
 			t &= ~(AR8327_SGMII_CTRL_EN_PLL |
 			    AR8327_SGMII_CTRL_EN_RX |
 			    AR8327_SGMII_CTRL_EN_TX);
 
 		arswitch_writereg(sc->sc_dev, AR8327_REG_SGMII_CTRL, t);
 
 		if (scfg.serdes_aen)
 			new_pos &= ~AR8327_POWER_ON_STRIP_SERDES_AEN;
 		else
 			new_pos |= AR8327_POWER_ON_STRIP_SERDES_AEN;
 	}
 
 	arswitch_writereg(sc->sc_dev, AR8327_REG_POWER_ON_STRIP, new_pos);
 
 	return (0);
 }
 
 static int
 ar8327_hw_setup(struct arswitch_softc *sc)
 {
 	int i;
 	int err;
 
 	/* pdata fetch and setup */
 	err = ar8327_init_pdata(sc);
 	if (err != 0)
 		return (err);
 
 	/* XXX init leds */
 
 	for (i = 0; i < AR8327_NUM_PHYS; i++) {
 		/* phy fixup */
 		ar8327_phy_fixup(sc, i);
 
 		/* start PHY autonegotiation? */
 		/* XXX is this done as part of the normal PHY setup? */
 
 	}
 
 	/* Let things settle */
 	DELAY(1000);
 
 	return (0);
 }
 
 /*
  * Initialise other global values, for the AR8327.
  */
 static int
 ar8327_hw_global_setup(struct arswitch_softc *sc)
 {
 	uint32_t t;
 
 	/* enable CPU port and disable mirror port */
 	t = AR8327_FWD_CTRL0_CPU_PORT_EN |
 	    AR8327_FWD_CTRL0_MIRROR_PORT;
 	arswitch_writereg(sc->sc_dev, AR8327_REG_FWD_CTRL0, t);
 
-	/* forward multicast and broadcast frames to CPU */
+	/* flood unknown unicast, multicast and broadcast frames to all ports */
 	t = (AR8327_PORTS_ALL << AR8327_FWD_CTRL1_UC_FLOOD_S) |
 	    (AR8327_PORTS_ALL << AR8327_FWD_CTRL1_MC_FLOOD_S) |
 	    (AR8327_PORTS_ALL << AR8327_FWD_CTRL1_BC_FLOOD_S);
 	arswitch_writereg(sc->sc_dev, AR8327_REG_FWD_CTRL1, t);
 
 	/* enable jumbo frames */
 	/* XXX need to macro-shift the value! */
 	arswitch_modifyreg(sc->sc_dev, AR8327_REG_MAX_FRAME_SIZE,
 	    AR8327_MAX_FRAME_SIZE_MTU, 9018 + 8 + 2);
 
 	/* Enable MIB counters */
 	arswitch_modifyreg(sc->sc_dev, AR8327_REG_MODULE_EN,
 	    AR8327_MODULE_EN_MIB, AR8327_MODULE_EN_MIB);
 
 	/* Disable EEE on all ports due to stability issues */
 	t = arswitch_readreg(sc->sc_dev, AR8327_REG_EEE_CTRL);
 	t |= AR8327_EEE_CTRL_DISABLE_PHY(0) |
 	    AR8327_EEE_CTRL_DISABLE_PHY(1) |
 	    AR8327_EEE_CTRL_DISABLE_PHY(2) |
 	    AR8327_EEE_CTRL_DISABLE_PHY(3) |
 	    AR8327_EEE_CTRL_DISABLE_PHY(4);
 	arswitch_writereg(sc->sc_dev, AR8327_REG_EEE_CTRL, t);
 
 	/* Set the right number of ports */
 	/* GMAC0 (CPU), GMAC1..5 (PHYs), GMAC6 (CPU) */
 	sc->info.es_nports = 7;
 
 	return (0);
 }
 
 /*
  * Port setup.  Called at attach time.
  */
 static void
 ar8327_port_init(struct arswitch_softc *sc, int port)
 {
 	uint32_t t;
 	int ports;
 
 	/* For now, port can see all other ports */
 	ports = 0x7f;
 
 	if (port == AR8X16_PORT_CPU)
 		t = sc->ar8327.port0_status;
 	else if (port == 6)
 		t = sc->ar8327.port6_status;
-        else
+	else
 		t = AR8X16_PORT_STS_LINK_AUTO;
 
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_STATUS(port), t);
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_HEADER(port), 0);
 
 	/*
 	 * Default to 1 port group.
 	 */
 	t = 1 << AR8327_PORT_VLAN0_DEF_SVID_S;
 	t |= 1 << AR8327_PORT_VLAN0_DEF_CVID_S;
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_VLAN0(port), t);
 
 	t = AR8327_PORT_VLAN1_OUT_MODE_UNTOUCH << AR8327_PORT_VLAN1_OUT_MODE_S;
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_VLAN1(port), t);
 
 	/*
 	 * This doesn't configure any ports which this port can "see".
 	 * bits 0-6 control which ports a frame coming into this port
 	 * can be sent out to.
 	 *
 	 * So by doing this, we're making it impossible to send frames out
 	 * to that port.
 	 */
 	t = AR8327_PORT_LOOKUP_LEARN;
 	t |= AR8X16_PORT_CTRL_STATE_FORWARD << AR8327_PORT_LOOKUP_STATE_S;
 
 	/* So this allows traffic to any port except ourselves */
 	t |= (ports & ~(1 << port));
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_LOOKUP(port), t);
 }
 
 static int
 ar8327_port_vlan_setup(struct arswitch_softc *sc, etherswitch_port_t *p)
 {
 
 	/* Check: ADDTAG/STRIPTAG - exclusive */
 
 	ARSWITCH_LOCK(sc);
 
 	/* Set the PVID. */
 	if (p->es_pvid != 0)
 		sc->hal.arswitch_vlan_set_pvid(sc, p->es_port, p->es_pvid);
 
 	/*
 	 * DOUBLE_TAG
 	 * VLAN_MODE_ADD
 	 * VLAN_MODE_STRIP
 	 */
 	ARSWITCH_UNLOCK(sc);
 	return (0);
 }
 
 /*
  * Get the port VLAN configuration.
  */
 static int
 ar8327_port_vlan_get(struct arswitch_softc *sc, etherswitch_port_t *p)
 {
 
 	ARSWITCH_LOCK(sc);
 
 	/* Retrieve the PVID */
 	sc->hal.arswitch_vlan_get_pvid(sc, p->es_port, &p->es_pvid);
 
 	/* Retrieve the current port configuration from the VTU */
 	/*
 	 * DOUBLE_TAG
 	 * VLAN_MODE_ADD
 	 * VLAN_MODE_STRIP
 	 */
 
 	ARSWITCH_UNLOCK(sc);
 	return (0);
 }
 
 static void
 ar8327_port_disable_mirror(struct arswitch_softc *sc, int port)
 {
 
 	arswitch_modifyreg(sc->sc_dev,
 	    AR8327_REG_PORT_LOOKUP(port),
 	    AR8327_PORT_LOOKUP_ING_MIRROR_EN,
 	    0);
 	arswitch_modifyreg(sc->sc_dev,
 	    AR8327_REG_PORT_HOL_CTRL1(port),
 	    AR8327_PORT_HOL_CTRL1_EG_MIRROR_EN,
 	    0);
 }
 
 static void
 ar8327_reset_vlans(struct arswitch_softc *sc)
 {
 	int i;
 	uint32_t t;
 	int ports;
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_NOTOWNED);
 	ARSWITCH_LOCK(sc);
 
 	/* Clear the existing VLAN configuration */
 	memset(sc->vid, 0, sizeof(sc->vid));
 
 	/*
 	 * Disable mirroring.
 	 */
 	arswitch_modifyreg(sc->sc_dev, AR8327_REG_FWD_CTRL0,
 	    AR8327_FWD_CTRL0_MIRROR_PORT,
 	    (0xF << AR8327_FWD_CTRL0_MIRROR_PORT_S));
 
 	/*
 	 * XXX TODO: disable any Q-in-Q port configuration,
 	 * tagging, egress filters, etc.
 	 */
 
 	/*
 	 * For now, let's default to one portgroup, just so traffic
 	 * flows.  All ports can see other ports. There are two CPU GMACs
 	 * (GMAC0, GMAC6), GMAC1..GMAC5 are external PHYs.
 	 *
 	 * (ETHERSWITCH_VLAN_PORT)
 	 */
 	ports = 0x7f;
 
 	/*
 	 * XXX TODO: set things up correctly for vlans!
 	 */
 	for (i = 0; i < AR8327_NUM_PORTS; i++) {
 		int egress, ingress;
 
 		if (sc->vlan_mode == ETHERSWITCH_VLAN_PORT) {
 			sc->vid[i] = i | ETHERSWITCH_VID_VALID;
 			/* set egress == out_keep */
 			ingress = AR8X16_PORT_VLAN_MODE_PORT_ONLY;
 			/* in_port_only, forward */
 			egress = AR8327_PORT_VLAN1_OUT_MODE_UNTOUCH;
 		} else if (sc->vlan_mode == ETHERSWITCH_VLAN_DOT1Q) {
 			ingress = AR8X16_PORT_VLAN_MODE_SECURE;
 			egress = AR8327_PORT_VLAN1_OUT_MODE_UNMOD;
 		} else {
 			/* set egress == out_keep */
 			ingress = AR8X16_PORT_VLAN_MODE_PORT_ONLY;
 			/* in_port_only, forward */
 			egress = AR8327_PORT_VLAN1_OUT_MODE_UNTOUCH;
 		}
 
 		/* set pvid = 1; there's only one vlangroup to start with */
 		t = 1 << AR8327_PORT_VLAN0_DEF_SVID_S;
 		t |= 1 << AR8327_PORT_VLAN0_DEF_CVID_S;
 		arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_VLAN0(i), t);
 
 		t = AR8327_PORT_VLAN1_PORT_VLAN_PROP;
 		t |= egress << AR8327_PORT_VLAN1_OUT_MODE_S;
 		arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_VLAN1(i), t);
 
 		/* Ports can see other ports */
 		/* XXX not entirely true for dot1q? */
 		t = (ports & ~(1 << i));	/* all ports besides us */
 		t |= AR8327_PORT_LOOKUP_LEARN;
 
 		t |= ingress << AR8327_PORT_LOOKUP_IN_MODE_S;
 		t |= AR8X16_PORT_CTRL_STATE_FORWARD << AR8327_PORT_LOOKUP_STATE_S;
 		arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_LOOKUP(i), t);
 	}
 
 	/*
 	 * Disable port mirroring entirely.
 	 */
 	for (i = 0; i < AR8327_NUM_PORTS; i++) {
 		ar8327_port_disable_mirror(sc, i);
 	}
 
 	/*
 	 * If dot1q - set pvid; dot1q, etc.
 	 */
 	if (sc->vlan_mode == ETHERSWITCH_VLAN_DOT1Q) {
 		sc->vid[0] = 1;
 		for (i = 0; i < AR8327_NUM_PORTS; i++) {
 			/* Each port - pvid 1 */
 			sc->hal.arswitch_vlan_set_pvid(sc, i, sc->vid[0]);
 		}
 		/* Initialise vlan1 - all ports, untagged */
 		sc->hal.arswitch_set_dot1q_vlan(sc, ports, ports, sc->vid[0]);
 		sc->vid[0] |= ETHERSWITCH_VID_VALID;
 	}
 
 	ARSWITCH_UNLOCK(sc);
 }
 
 static int
 ar8327_vlan_get_port(struct arswitch_softc *sc, uint32_t *ports, int vid)
 {
 	int port;
 	uint32_t reg;
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_OWNED);
 
 	/* For port based vlans the vlanid is the same as the port index. */
 	port = vid & ETHERSWITCH_VID_MASK;
 	reg = arswitch_readreg(sc->sc_dev, AR8327_REG_PORT_LOOKUP(port));
 	*ports = reg & 0x7f;
 	return (0);
 }
 
 static int
 ar8327_vlan_set_port(struct arswitch_softc *sc, uint32_t ports, int vid)
 {
 	int err, port;
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_OWNED);
 
 	/* For port based vlans the vlanid is the same as the port index. */
 	port = vid & ETHERSWITCH_VID_MASK;
 
 	err = arswitch_modifyreg(sc->sc_dev, AR8327_REG_PORT_LOOKUP(port),
 	    0x7f, /* vlan membership mask */
 	    (ports & 0x7f));
 
 	if (err)
 		return (err);
 	return (0);
 }
 
 static int
 ar8327_vlan_getvgroup(struct arswitch_softc *sc, etherswitch_vlangroup_t *vg)
 {
 
 	return (ar8xxx_getvgroup(sc, vg));
 }
 
 static int
 ar8327_vlan_setvgroup(struct arswitch_softc *sc, etherswitch_vlangroup_t *vg)
 {
 
 	return (ar8xxx_setvgroup(sc, vg));
 }
 
 static int
 ar8327_get_pvid(struct arswitch_softc *sc, int port, int *pvid)
 {
 	uint32_t reg;
 
 	ARSWITCH_LOCK_ASSERT(sc, MA_OWNED);
 
 	/*
 	 * XXX for now, assuming it's CVID; likely very wrong!
 	 */
 	port = port & ETHERSWITCH_VID_MASK;
 	reg = arswitch_readreg(sc->sc_dev, AR8327_REG_PORT_VLAN0(port));
 	reg = reg >> AR8327_PORT_VLAN0_DEF_CVID_S;
 	reg = reg & 0xfff;
 
 	*pvid = reg;
 	return (0);
 }
 
 static int
 ar8327_set_pvid(struct arswitch_softc *sc, int port, int pvid)
 {
 	uint32_t t;
 
 	/* Limit pvid to valid values */
 	pvid &= 0x7f;
 
 	t = pvid << AR8327_PORT_VLAN0_DEF_SVID_S;
 	t |= pvid << AR8327_PORT_VLAN0_DEF_CVID_S;
 	arswitch_writereg(sc->sc_dev, AR8327_REG_PORT_VLAN0(port), t);
 
 	return (0);
 }
 
 static int
 ar8327_atu_flush(struct arswitch_softc *sc)
 {
 
 	int ret;
 
 	ret = arswitch_waitreg(sc->sc_dev,
 	    AR8327_REG_ATU_FUNC,
 	    AR8327_ATU_FUNC_BUSY,
 	    0,
 	    1000);
 
 	if (ret)
 		device_printf(sc->sc_dev, "%s: waitreg failed\n", __func__);
 
 	if (!ret)
 		arswitch_writereg(sc->sc_dev,
 		    AR8327_REG_ATU_FUNC,
 		    AR8327_ATU_FUNC_OP_FLUSH);
 	return (ret);
 }
 
 static int
 ar8327_flush_dot1q_vlan(struct arswitch_softc *sc)
 {
 
 	return (ar8327_vlan_op(sc, AR8327_VTU_FUNC1_OP_FLUSH, 0, 0));
 }
 
 static int
 ar8327_purge_dot1q_vlan(struct arswitch_softc *sc, int vid)
 {
 
 	return (ar8327_vlan_op(sc, AR8327_VTU_FUNC1_OP_PURGE, vid, 0));
 }
 
 static int
 ar8327_get_dot1q_vlan(struct arswitch_softc *sc, uint32_t *ports,
     uint32_t *untagged_ports, int vid)
 {
 	int i, r;
 	uint32_t op, reg, val;
 
 	op = AR8327_VTU_FUNC1_OP_GET_ONE;
 
 	/* Filter out the vid flags; only grab the VLAN ID */
 	vid &= 0xfff;
 
 	/* XXX TODO: the VTU here stores egress mode - keep, tag, untagged, none */
 	r = ar8327_vlan_op(sc, op, vid, 0);
 	if (r != 0) {
 		device_printf(sc->sc_dev, "%s: %d: op failed\n", __func__, vid);
 	}
 
 	reg = arswitch_readreg(sc->sc_dev, AR8327_REG_VTU_FUNC0);
 	DPRINTF(sc->sc_dev, "%s: %d: reg=0x%08x\n", __func__, vid, reg);
 
 	/*
 	 * If any of the bits are set, update the port mask.
 	 * Worry about the port config itself when getport() is called.
 	 */
 	*ports = 0;
 	for (i = 0; i < AR8327_NUM_PORTS; i++) {
 		val = reg >> AR8327_VTU_FUNC0_EG_MODE_S(i);
 		val = val & 0x3;
 		/* XXX KEEP (unmodified?) */
 		if (val == AR8327_VTU_FUNC0_EG_MODE_TAG) {
 			*ports |= (1 << i);
 		} else if (val == AR8327_VTU_FUNC0_EG_MODE_UNTAG) {
 			*ports |= (1 << i);
 			*untagged_ports |= (1 << i);
 		}
 	}
 
 	return (0);
 }
 
 static int
 ar8327_set_dot1q_vlan(struct arswitch_softc *sc, uint32_t ports,
     uint32_t untagged_ports, int vid)
 {
 	int i;
 	uint32_t op, val, mode;
 
 	op = AR8327_VTU_FUNC1_OP_LOAD;
 	vid &= 0xfff;
 
 	DPRINTF(sc->sc_dev,
 	    "%s: vid: %d, ports=0x%08x, untagged_ports=0x%08x\n",
 	    __func__,
 	    vid,
 	    ports,
 	    untagged_ports);
 
 	/*
 	 * Mark it as valid; and that it should use per-VLAN MAC table,
 	 * not VID=0 when doing MAC lookups
 	 */
 	val = AR8327_VTU_FUNC0_VALID | AR8327_VTU_FUNC0_IVL;
 
 	for (i = 0; i < AR8327_NUM_PORTS; i++) {
 		if ((ports & BIT(i)) == 0)
 			mode = AR8327_VTU_FUNC0_EG_MODE_NOT;
 		else if (untagged_ports & BIT(i))
 			mode = AR8327_VTU_FUNC0_EG_MODE_UNTAG;
 		else
 			mode = AR8327_VTU_FUNC0_EG_MODE_TAG;
 
 		val |= mode << AR8327_VTU_FUNC0_EG_MODE_S(i);
 	}
 
 	return (ar8327_vlan_op(sc, op, vid, val));
 }
 
 void
 ar8327_attach(struct arswitch_softc *sc)
 {
 
 	sc->hal.arswitch_hw_setup = ar8327_hw_setup;
 	sc->hal.arswitch_hw_global_setup = ar8327_hw_global_setup;
 
 	sc->hal.arswitch_port_init = ar8327_port_init;
 
 	sc->hal.arswitch_vlan_getvgroup = ar8327_vlan_getvgroup;
 	sc->hal.arswitch_vlan_setvgroup = ar8327_vlan_setvgroup;
 	sc->hal.arswitch_port_vlan_setup = ar8327_port_vlan_setup;
 	sc->hal.arswitch_port_vlan_get = ar8327_port_vlan_get;
 	sc->hal.arswitch_flush_dot1q_vlan = ar8327_flush_dot1q_vlan;
 	sc->hal.arswitch_purge_dot1q_vlan = ar8327_purge_dot1q_vlan;
 	sc->hal.arswitch_set_dot1q_vlan = ar8327_set_dot1q_vlan;
 	sc->hal.arswitch_get_dot1q_vlan = ar8327_get_dot1q_vlan;
 
 	sc->hal.arswitch_vlan_init_hw = ar8327_reset_vlans;
 	sc->hal.arswitch_vlan_get_pvid = ar8327_get_pvid;
 	sc->hal.arswitch_vlan_set_pvid = ar8327_set_pvid;
 
 	sc->hal.arswitch_get_port_vlan = ar8327_vlan_get_port;
 	sc->hal.arswitch_set_port_vlan = ar8327_vlan_set_port;
 
 	sc->hal.arswitch_atu_flush = ar8327_atu_flush;
 
 	/*
 	 * Reading the PHY via the MDIO interface currently doesn't
 	 * work correctly.
 	 *
 	 * So for now, just go direct to the PHY registers themselves.
 	 * This has always worked  on external devices, but not internal
 	 * devices (AR934x, AR724x, AR933x.)
 	 */
 	sc->hal.arswitch_phy_read = arswitch_readphy_external;
 	sc->hal.arswitch_phy_write = arswitch_writephy_external;
 
 	/* Set the switch vlan capabilities. */
 	sc->info.es_vlan_caps = ETHERSWITCH_VLAN_DOT1Q |
 	    ETHERSWITCH_VLAN_PORT | ETHERSWITCH_VLAN_DOUBLE_TAG;
 	sc->info.es_nvlangroups = AR8X16_MAX_VLANS;
 }
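ar8327_set_dot1q_vlan() builds the VTU_FUNC0 word by giving each port a 2-bit egress mode: not a member, untagged, or tagged. ar8327_get_dot1q_vlan() reverses that to recover the member and untagged port masks. The sketch below models that round trip in user space. It rests on assumptions. The shift formula (4 + 2 * port) and the mode values (KEEP=0, UNTAG=1, TAG=2, NOT=3) are hypothetical stand-ins for the real AR8327_VTU_FUNC0_EG_MODE_S() and AR8327_VTU_FUNC0_EG_MODE_* definitions in arswitchreg.h, and the VALID/IVL flag bits are left out. ar8327_get_dot1q_vlan() only zeroes *ports, so this sketch also clears the untagged mask.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the arswitchreg.h definitions; the real
 * field positions and mode values may differ.
 */
#define	NUM_PORTS	7
#define	EG_MODE_S(_i)	(4 + (_i) * 2)	/* assumed: 2 bits per port */
#define	EG_MODE_KEEP	0		/* assumed */
#define	EG_MODE_UNTAG	1		/* assumed */
#define	EG_MODE_TAG	2		/* assumed */
#define	EG_MODE_NOT	3		/* assumed */

/* Every per-port field must fit inside the 32-bit VTU word. */
static_assert(EG_MODE_S(NUM_PORTS - 1) + 2 <= 32, "egress fields overflow");

/* Same decision chain as ar8327_set_dot1q_vlan(), minus VALID/IVL. */
static uint32_t
vtu_pack(uint32_t ports, uint32_t untagged_ports)
{
	uint32_t mode, val = 0;
	int i;

	for (i = 0; i < NUM_PORTS; i++) {
		if ((ports & (1u << i)) == 0)
			mode = EG_MODE_NOT;
		else if (untagged_ports & (1u << i))
			mode = EG_MODE_UNTAG;
		else
			mode = EG_MODE_TAG;
		val |= mode << EG_MODE_S(i);
	}
	return (val);
}

/* Same decoding as ar8327_get_dot1q_vlan(), but both masks start at 0. */
static void
vtu_unpack(uint32_t reg, uint32_t *ports, uint32_t *untagged_ports)
{
	uint32_t val;
	int i;

	*ports = 0;
	*untagged_ports = 0;
	for (i = 0; i < NUM_PORTS; i++) {
		val = (reg >> EG_MODE_S(i)) & 0x3;
		if (val == EG_MODE_TAG) {
			*ports |= 1u << i;
		} else if (val == EG_MODE_UNTAG) {
			*ports |= 1u << i;
			*untagged_ports |= 1u << i;
		}
		/* KEEP and NOT both leave the port out of the masks. */
	}
}
```

Example: packing member mask 0x47 (ports 0, 1, 2 and 6) with untagged mask 0x06 and then unpacking the result returns the same two masks. An untagged bit for a port that is not a member is dropped, because that port is encoded as NOT.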
Index: user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitch_8327.h	(revision 303775)
@@ -1,91 +1,96 @@
 /*-
  * Copyright (c) 2014 Adrian Chadd <adrian@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 #ifndef	__ARSWITCH_8327_H__
 #define	__ARSWITCH_8327_H__
 
 enum ar8327_pad_mode {
 	AR8327_PAD_NC = 0,
 	AR8327_PAD_MAC2MAC_MII,
 	AR8327_PAD_MAC2MAC_GMII,
 	AR8327_PAD_MAC_SGMII,
 	AR8327_PAD_MAC2PHY_MII,
 	AR8327_PAD_MAC2PHY_GMII,
 	AR8327_PAD_MAC_RGMII,
 	AR8327_PAD_PHY_GMII,
 	AR8327_PAD_PHY_RGMII,
 	AR8327_PAD_PHY_MII,
 };
 
 enum ar8327_clk_delay_sel {
 	AR8327_CLK_DELAY_SEL0 = 0,
 	AR8327_CLK_DELAY_SEL1,
 	AR8327_CLK_DELAY_SEL2,
 	AR8327_CLK_DELAY_SEL3,
 };
 
 /* XXX update the field types */
 struct ar8327_pad_cfg {
 	uint32_t mode;
 	uint32_t rxclk_sel;
 	uint32_t txclk_sel;
 	uint32_t txclk_delay_sel;
 	uint32_t rxclk_delay_sel;
 	uint32_t txclk_delay_en;
 	uint32_t rxclk_delay_en;
 	uint32_t sgmii_delay_en;
 	uint32_t pipe_rxclk_sel;
 };
 
 struct ar8327_sgmii_cfg {
 	uint32_t sgmii_ctrl;
 	uint32_t serdes_aen;
 };
 
 struct ar8327_led_cfg {
 	uint32_t led_ctrl0;
 	uint32_t led_ctrl1;
 	uint32_t led_ctrl2;
 	uint32_t led_ctrl3;
 	uint32_t open_drain;
 };
 
 struct ar8327_port_cfg {
 #define	AR8327_PORT_SPEED_10		1
 #define	AR8327_PORT_SPEED_100		2
 #define	AR8327_PORT_SPEED_1000		3
 	uint32_t speed;
 	uint32_t force_link;
 	uint32_t duplex;
 	uint32_t txpause;
 	uint32_t rxpause;
 };
 
+extern struct ar8327_led_mapping {
+	int reg;
+	int shift;
+} ar8327_led_mapping[AR8327_NUM_PHYS][ETHERSWITCH_PORT_MAX_LEDS];
+
 extern	void ar8327_attach(struct arswitch_softc *sc);
 
 #endif	/* __ARSWITCH_8327_H__ */
 
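In arswitch_8327.c, ar8327_set_pvid() masks the PVID with 0x7f and writes it into both the SVID and CVID fields of PORT_VLAN0. ar8327_get_pvid() reads back only the CVID field, using a 12-bit mask. The sketch below reproduces that arithmetic to show one consequence: a PVID above 127 does not survive a set/get round trip. The field shifts (SVID at bit 0, CVID at bit 16) are assumptions and were not copied from arswitchreg.h. The conclusion depends only on the two masks, as long as the fields do not overlap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field positions; check arswitchreg.h for the real values. */
#define	DEF_SVID_S	0
#define	DEF_CVID_S	16

/* A 12-bit SVID field at DEF_SVID_S must not overlap the CVID field. */
static_assert(DEF_CVID_S >= DEF_SVID_S + 12, "SVID/CVID fields overlap");

/* Mirrors ar8327_set_pvid(): same PVID in both SVID and CVID fields. */
static uint32_t
pvid_pack(int pvid)
{
	uint32_t t;

	pvid &= 0x7f;		/* the 7-bit mask ar8327_set_pvid() applies */
	t = (uint32_t)pvid << DEF_SVID_S;
	t |= (uint32_t)pvid << DEF_CVID_S;
	return (t);
}

/* Mirrors ar8327_get_pvid(): read back the 12-bit CVID field only. */
static int
pvid_unpack(uint32_t reg)
{

	return ((int)((reg >> DEF_CVID_S) & 0xfff));
}
```

Example: pvid_pack(100) reads back as 100, but pvid_pack(200) reads back as 72 (200 & 0x7f).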
Index: user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitchvar.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitchvar.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/etherswitch/arswitch/arswitchvar.h	(revision 303775)
@@ -1,150 +1,160 @@
 /*-
  * Copyright (c) 2011-2012 Stefan Bethke.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 #ifndef	__ARSWITCHVAR_H__
 #define	__ARSWITCHVAR_H__
 
 typedef enum {
 	AR8X16_SWITCH_NONE,
 	AR8X16_SWITCH_AR7240,
 	AR8X16_SWITCH_AR8216,
 	AR8X16_SWITCH_AR8226,
 	AR8X16_SWITCH_AR8316,
 	AR8X16_SWITCH_AR9340,
 	AR8X16_SWITCH_AR8327,
 	AR8X16_SWITCH_AR8337,
 } ar8x16_switch_type;
 
 /*
  * XXX TODO: start using this where required
  */
 #define	AR8X16_IS_SWITCH(_sc, _type) \
 	    (!!((_sc)->sc_switchtype == AR8X16_SWITCH_ ## _type))
 
 #define ARSWITCH_NUM_PORTS	MAX(AR8327_NUM_PORTS, AR8X16_NUM_PORTS)
 #define ARSWITCH_NUM_PHYS	MAX(AR8327_NUM_PHYS, AR8X16_NUM_PHYS)
 
+#define ARSWITCH_NUM_LEDS	3
+
+struct arswitch_dev_led {
+	struct arswitch_softc	*sc;
+	struct cdev	*led;
+	int		phy;
+	int		lednum;
+};
+
 struct arswitch_softc {
 	struct mtx	sc_mtx;		/* serialize access to softc */
 	device_t	sc_dev;
 	int		phy4cpu;	/* PHY4 is connected to the CPU */
 	int		numphys;	/* PHYs we manage */
 	int		is_rgmii;	/* PHY mode is RGMII (XXX which PHY?) */
 	int		is_gmii;	/* PHY mode is GMII (XXX which PHY?) */
 	int		is_mii;		/* PHY mode is MII (XXX which PHY?) */
 	int		page;
 	int		is_internal_switch;
 	int		chip_ver;
 	int		chip_rev;
 	int		mii_lo_first;		/* Send low data DWORD before high */
 	ar8x16_switch_type	sc_switchtype;
 	/* should be the max of both pre-AR8327 and AR8327 ports */
 	char		*ifname[ARSWITCH_NUM_PHYS];
 	device_t	miibus[ARSWITCH_NUM_PHYS];
 	struct ifnet	*ifp[ARSWITCH_NUM_PHYS];
+	struct arswitch_dev_led	dev_led[ARSWITCH_NUM_PHYS][ARSWITCH_NUM_LEDS];
 	struct callout	callout_tick;
 	etherswitch_info_t info;
 
 	/* VLANs support */
 	int		vid[AR8X16_MAX_VLANS];
 	uint32_t	vlan_mode;
 
 	struct {
 		/* Global setup */
 		int (* arswitch_hw_setup) (struct arswitch_softc *);
 		int (* arswitch_hw_global_setup) (struct arswitch_softc *);
 
 		/* Port functions */
 		void (* arswitch_port_init) (struct arswitch_softc *, int);
 
 		/* ATU functions */
 		int (* arswitch_atu_flush) (struct arswitch_softc *);
 
 		/* VLAN functions */
 		int (* arswitch_port_vlan_setup) (struct arswitch_softc *,
 		    etherswitch_port_t *);
 		int (* arswitch_port_vlan_get) (struct arswitch_softc *,
 		    etherswitch_port_t *);
 		void (* arswitch_vlan_init_hw) (struct arswitch_softc *);
 		int (* arswitch_vlan_getvgroup) (struct arswitch_softc *,
 		    etherswitch_vlangroup_t *);
 		int (* arswitch_vlan_setvgroup) (struct arswitch_softc *,
 		    etherswitch_vlangroup_t *);
 		int (* arswitch_vlan_get_pvid) (struct arswitch_softc *, int,
 		    int *);
 		int (* arswitch_vlan_set_pvid) (struct arswitch_softc *, int,
 		    int);
 
 		int (* arswitch_flush_dot1q_vlan) (struct arswitch_softc *sc);
 		int (* arswitch_purge_dot1q_vlan) (struct arswitch_softc *sc,
 		    int vid);
 		int (* arswitch_get_dot1q_vlan) (struct arswitch_softc *,
 		    uint32_t *ports, uint32_t *untagged_ports, int vid);
 		int (* arswitch_set_dot1q_vlan) (struct arswitch_softc *sc,
 		    uint32_t ports, uint32_t untagged_ports, int vid);
 		int (* arswitch_get_port_vlan) (struct arswitch_softc *sc,
 		    uint32_t *ports, int vid);
 		int (* arswitch_set_port_vlan) (struct arswitch_softc *sc,
 		    uint32_t ports, int vid);
 
 		/* PHY functions */
 		int (* arswitch_phy_read) (device_t, int, int);
 		int (* arswitch_phy_write) (device_t, int, int, int);
 	} hal;
 
 	struct {
 		uint32_t port0_status;
 		uint32_t port5_status;
 		uint32_t port6_status;
 	} ar8327;
 };
 
 #define	ARSWITCH_LOCK(_sc)			\
 	    mtx_lock(&(_sc)->sc_mtx)
 #define	ARSWITCH_UNLOCK(_sc)			\
 	    mtx_unlock(&(_sc)->sc_mtx)
 #define	ARSWITCH_LOCK_ASSERT(_sc, _what)	\
 	    mtx_assert(&(_sc)->sc_mtx, (_what))
 #define	ARSWITCH_TRYLOCK(_sc)			\
 	    mtx_trylock(&(_sc)->sc_mtx)
 
 #if defined(DEBUG)
 #define DPRINTF(dev, args...) device_printf(dev, args)
 #define DEVERR(dev, err, fmt, args...) do { \
 		if (err != 0) device_printf(dev, fmt, err, args); \
 	} while (0)
 #define DEBUG_INCRVAR(var)	do { \
 		var++; \
 	} while (0)
 #else
 #define DPRINTF(dev, args...)
 #define DEVERR(dev, err, fmt, args...)
 #define DEBUG_INCRVAR(var)
 #endif
 
 #endif	/* __ARSWITCHVAR_H__ */
 
Index: user/alc/PQ_LAUNDRY/sys/dev/etherswitch/etherswitch.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/etherswitch/etherswitch.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/etherswitch/etherswitch.h	(revision 303775)
@@ -1,103 +1,116 @@
 /*
  * $FreeBSD$
  */
 
 #ifndef __SYS_DEV_ETHERSWITCH_ETHERSWITCH_H
 #define __SYS_DEV_ETHERSWITCH_ETHERSWITCH_H
 
 #include <sys/ioccom.h>
 
 #ifdef _KERNEL
 extern devclass_t       etherswitch_devclass;
 extern driver_t         etherswitch_driver;
 #endif /* _KERNEL */
 
 struct etherswitch_reg {
 	uint16_t	reg;
-	uint16_t	val;
+	uint32_t	val;
 };
 typedef struct etherswitch_reg etherswitch_reg_t;
 
 struct etherswitch_phyreg {
 	uint16_t	phy;
 	uint16_t	reg;
 	uint16_t	val;
 };
 typedef struct etherswitch_phyreg etherswitch_phyreg_t;
 
 #define	ETHERSWITCH_NAMEMAX		64
 #define	ETHERSWITCH_VID_MASK		0xfff
 #define	ETHERSWITCH_VID_VALID		(1 << 12)
 #define	ETHERSWITCH_VLAN_ISL		(1 << 0)	/* ISL */
 #define	ETHERSWITCH_VLAN_PORT		(1 << 1)	/* Port based vlan */
 #define	ETHERSWITCH_VLAN_DOT1Q		(1 << 2)	/* 802.1q */
 #define	ETHERSWITCH_VLAN_DOT1Q_4K	(1 << 3)	/* 4k support on 802.1q */
 #define	ETHERSWITCH_VLAN_DOUBLE_TAG	(1 << 4)	/* Q-in-Q */
 #define	ETHERSWITCH_VLAN_CAPS_BITS	\
 "\020\1ISL\2PORT\3DOT1Q\4DOT1Q4K\5QinQ"
 
 struct etherswitch_info {
 	int		es_nports;
 	int		es_nvlangroups;
 	char		es_name[ETHERSWITCH_NAMEMAX];
 	uint32_t	es_vlan_caps;
 };
 typedef struct etherswitch_info etherswitch_info_t;
 
 #define	ETHERSWITCH_CONF_FLAGS		(1 << 0)
 #define	ETHERSWITCH_CONF_MIRROR		(1 << 1)
 #define	ETHERSWITCH_CONF_VLAN_MODE	(1 << 2)
 
 struct etherswitch_conf {
 	uint32_t	cmd;		/* What to configure */
 	uint32_t	vlan_mode;	/* Switch VLAN mode */
 };
 typedef struct etherswitch_conf etherswitch_conf_t;
 
 #define	ETHERSWITCH_PORT_CPU		(1 << 0)
 #define	ETHERSWITCH_PORT_STRIPTAG	(1 << 1)
 #define	ETHERSWITCH_PORT_ADDTAG		(1 << 2)
 #define	ETHERSWITCH_PORT_FIRSTLOCK	(1 << 3)
 #define	ETHERSWITCH_PORT_DROPUNTAGGED	(1 << 4)
 #define	ETHERSWITCH_PORT_DOUBLE_TAG	(1 << 5)
 #define	ETHERSWITCH_PORT_INGRESS	(1 << 6)
 #define	ETHERSWITCH_PORT_FLAGS_BITS	\
 "\020\1CPUPORT\2STRIPTAG\3ADDTAG\4FIRSTLOCK\5DROPUNTAGGED\6QinQ\7INGRESS"
 
+#define	ETHERSWITCH_PORT_MAX_LEDS	3
+
+enum etherswitch_port_led {
+	ETHERSWITCH_PORT_LED_DEFAULT,
+	ETHERSWITCH_PORT_LED_ON,
+	ETHERSWITCH_PORT_LED_OFF,
+	ETHERSWITCH_PORT_LED_BLINK,
+	ETHERSWITCH_PORT_LED_MAX
+};
+typedef enum etherswitch_port_led etherswitch_port_led_t;
+
 struct etherswitch_port {
 	int		es_port;
 	int		es_pvid;
+	int		es_nleds;
 	uint32_t	es_flags;
+	etherswitch_port_led_t es_led[ETHERSWITCH_PORT_MAX_LEDS];
 	union {
 		struct ifreq		es_uifr;
 		struct ifmediareq	es_uifmr;
 	} es_ifu;
 #define es_ifr		es_ifu.es_uifr
 #define es_ifmr		es_ifu.es_uifmr
 };
 typedef struct etherswitch_port etherswitch_port_t;
 
 struct etherswitch_vlangroup {
 	int		es_vlangroup;
 	int		es_vid;
 	int		es_member_ports;
 	int		es_untagged_ports;
 	int		es_fid;
 };
 typedef struct etherswitch_vlangroup etherswitch_vlangroup_t;
 
 #define ETHERSWITCH_PORTMASK(_port)	(1 << (_port))
 
 #define IOETHERSWITCHGETINFO		_IOR('i', 1, etherswitch_info_t)
 #define IOETHERSWITCHGETREG		_IOWR('i', 2, etherswitch_reg_t)
 #define IOETHERSWITCHSETREG		_IOW('i', 3, etherswitch_reg_t)
 #define IOETHERSWITCHGETPORT		_IOWR('i', 4, etherswitch_port_t)
 #define IOETHERSWITCHSETPORT		_IOW('i', 5, etherswitch_port_t)
 #define IOETHERSWITCHGETVLANGROUP	_IOWR('i', 6, etherswitch_vlangroup_t)
 #define IOETHERSWITCHSETVLANGROUP	_IOW('i', 7, etherswitch_vlangroup_t)
 #define IOETHERSWITCHGETPHYREG		_IOWR('i', 8, etherswitch_phyreg_t)
 #define IOETHERSWITCHSETPHYREG		_IOW('i', 9, etherswitch_phyreg_t)
 #define IOETHERSWITCHGETCONF		_IOR('i', 10, etherswitch_conf_t)
 #define IOETHERSWITCHSETCONF		_IOW('i', 11, etherswitch_conf_t)
 
 #endif
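ETHERSWITCH_VLAN_CAPS_BITS and ETHERSWITCH_PORT_FLAGS_BITS are bit-name strings in the format used by the kernel printf(9) %b conversion. The first byte gives the output base (\020, i.e. 16). Each following group is a 1-based bit-number byte followed by that bit's name. User-space printf has no %b, so a tool that prints es_flags or es_vlan_caps has to decode the string itself. Below is a minimal sketch of such a decoder. The helper name format_bits() is hypothetical, and it handles only base-16 strings like the two defined above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copied from etherswitch.h. */
#define	ETHERSWITCH_PORT_FLAGS_BITS	\
"\020\1CPUPORT\2STRIPTAG\3ADDTAG\4FIRSTLOCK\5DROPUNTAGGED\6QinQ\7INGRESS"

/*
 * Hypothetical helper: render "value<NAME,...>" into buf the way a
 * %b conversion would, for base-16 bit-name strings only.  Returns
 * the length of the (possibly truncated) result.
 */
static int
format_bits(char *buf, size_t len, uint32_t value, const char *bits)
{
	const char *name;
	size_t off;
	int bit, sep = '<';

	assert(len > 0 && *bits == 16);	/* leading \020 selects hex */
	off = (size_t)snprintf(buf, len, "%x", (unsigned int)value);
	for (bits++; *bits != '\0' && off < len;) {
		bit = *bits++;			/* 1-based bit number */
		name = bits;
		while (*bits > ' ')		/* name runs to the next bit number */
			bits++;
		if (value & (1u << (bit - 1))) {
			off += (size_t)snprintf(buf + off, len - off, "%c%.*s",
			    sep, (int)(bits - name), name);
			sep = ',';
		}
	}
	if (sep == ',' && off < len)
		(void)snprintf(buf + off, len - off, ">");
	return ((int)strlen(buf));
}
```

With value 0x5 (CPUPORT and ADDTAG set), this produces "5<CPUPORT,ADDTAG>", which has the same shape as printf(9) %b output. With value 0, it produces just "0".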
Index: user/alc/PQ_LAUNDRY/sys/dev/hyperv/vmbus/vmbus_brvar.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/hyperv/vmbus/vmbus_brvar.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/hyperv/vmbus/vmbus_brvar.h	(revision 303775)
@@ -1,100 +1,104 @@
 /*-
  * Copyright (c) 2009-2012,2016 Microsoft Corp.
  * Copyright (c) 2012 NetApp Inc.
  * Copyright (c) 2012 Citrix Inc.
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice unmodified, this list of conditions, and the following
  *    disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
  * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
  * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
  * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
  * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
  * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
  * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef _VMBUS_BRVAR_H_
 #define _VMBUS_BRVAR_H_
 
 #include <sys/param.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/_iovec.h>
 
 struct vmbus_br {
 	struct vmbus_bufring	*vbr;
 	uint32_t		vbr_dsize;	/* total data size */
 };
 
 #define vbr_windex		vbr->br_windex
 #define vbr_rindex		vbr->br_rindex
 #define vbr_imask		vbr->br_imask
 #define vbr_data		vbr->br_data
 
 struct vmbus_rxbr {
 	struct mtx		rxbr_lock;
 	struct vmbus_br		rxbr;
 };
 
 #define rxbr_windex		rxbr.vbr_windex
 #define rxbr_rindex		rxbr.vbr_rindex
 #define rxbr_imask		rxbr.vbr_imask
 #define rxbr_data		rxbr.vbr_data
 #define rxbr_dsize		rxbr.vbr_dsize
 
 struct vmbus_txbr {
 	struct mtx		txbr_lock;
 	struct vmbus_br		txbr;
 };
 
 #define txbr_windex		txbr.vbr_windex
 #define txbr_rindex		txbr.vbr_rindex
 #define txbr_imask		txbr.vbr_imask
 #define txbr_data		txbr.vbr_data
 #define txbr_dsize		txbr.vbr_dsize
 
 struct sysctl_ctx_list;
 struct sysctl_oid;
 
 static __inline int
 vmbus_txbr_maxpktsz(const struct vmbus_txbr *tbr)
 {
-	/* 1/2 data size */
-	return (tbr->txbr_dsize / 2);
+	/*
+	 * - 64 bits for the trailing start index (- sizeof(uint64_t)).
+	 * - The rindex and windex can't be the same (- 1).  See
+	 *   the comment near vmbus_bufring.br_{r,w}index.
+	 */
+	return (tbr->txbr_dsize - sizeof(uint64_t) - 1);
 }
 
 void		vmbus_br_sysctl_create(struct sysctl_ctx_list *ctx,
 		    struct sysctl_oid *br_tree, struct vmbus_br *br,
 		    const char *name);
 
 void		vmbus_rxbr_init(struct vmbus_rxbr *rbr);
 void		vmbus_rxbr_deinit(struct vmbus_rxbr *rbr);
 void		vmbus_rxbr_setup(struct vmbus_rxbr *rbr, void *buf, int blen);
 int		vmbus_rxbr_peek(struct vmbus_rxbr *rbr, void *data, int dlen);
 int		vmbus_rxbr_read(struct vmbus_rxbr *rbr, void *data, int dlen,
 		    uint32_t skip);
 void		vmbus_rxbr_intr_mask(struct vmbus_rxbr *rbr);
 uint32_t	vmbus_rxbr_intr_unmask(struct vmbus_rxbr *rbr);
 
 void		vmbus_txbr_init(struct vmbus_txbr *tbr);
 void		vmbus_txbr_deinit(struct vmbus_txbr *tbr);
 void		vmbus_txbr_setup(struct vmbus_txbr *tbr, void *buf, int blen);
 int		vmbus_txbr_write(struct vmbus_txbr *tbr,
 		    const struct iovec iov[], int iovlen, boolean_t *need_sig);
 
 #endif  /* _VMBUS_BRVAR_H_ */
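The new vmbus_txbr_maxpktsz() subtracts two things from the data size. The first is the 64-bit start index stored after each packet. The second is one byte, so that a maximal write never makes the write index equal to the read index, which would look like an empty ring. The sketch below checks that arithmetic against a simplified free-space test. The helper names, and the rule that a write fits only if it is strictly smaller than the free space, are assumptions modelled on the comment. They are not code copied from vmbus_br.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: free bytes from windex up to rindex (r == w means empty). */
static uint32_t
txbr_avail(uint32_t rindex, uint32_t windex, uint32_t dsize)
{

	return (rindex > windex ? rindex - windex : dsize - (windex - rindex));
}

/* Hypothetical: packet plus trailing start index must leave >= 1 byte free. */
static bool
txbr_fits(uint32_t rindex, uint32_t windex, uint32_t dsize, uint32_t pktlen)
{

	return (pktlen + sizeof(uint64_t) < txbr_avail(rindex, windex, dsize));
}

/* Same arithmetic as vmbus_txbr_maxpktsz() in the hunk above. */
static uint32_t
txbr_maxpktsz(uint32_t dsize)
{

	assert(dsize > sizeof(uint64_t) + 1);
	return ((uint32_t)(dsize - sizeof(uint64_t) - 1));
}
```

For a 4096-byte data area, the limit is 4096 - 8 - 1 = 4087 bytes. A 4087-byte packet fits in an empty ring, but a 4088-byte packet does not, because it plus its 8-byte start index would consume the whole ring.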
Index: user/alc/PQ_LAUNDRY/sys/dev/ioat/ioat.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/ioat/ioat.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/ioat/ioat.c	(revision 303775)
@@ -1,2360 +1,2362 @@
 /*-
  * Copyright (C) 2012 Intel Corporation
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_ddb.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/bus.h>
 #include <sys/conf.h>
 #include <sys/ioccom.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/mutex.h>
 #include <sys/rman.h>
 #include <sys/sbuf.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
 #include <sys/time.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 #include <machine/bus.h>
 #include <machine/resource.h>
 #include <machine/stdarg.h>
 
 #ifdef DDB
 #include <ddb/ddb.h>
 #endif
 
 #include "ioat.h"
 #include "ioat_hw.h"
 #include "ioat_internal.h"
 
 #ifndef	BUS_SPACE_MAXADDR_40BIT
 #define	BUS_SPACE_MAXADDR_40BIT	0xFFFFFFFFFFULL
 #endif
 #define	IOAT_REFLK	(&ioat->submit_lock)
 #define	IOAT_SHRINK_PERIOD	(10 * hz)
 
 static int ioat_probe(device_t device);
 static int ioat_attach(device_t device);
 static int ioat_detach(device_t device);
 static int ioat_setup_intr(struct ioat_softc *ioat);
 static int ioat_teardown_intr(struct ioat_softc *ioat);
 static int ioat3_attach(device_t device);
 static int ioat_start_channel(struct ioat_softc *ioat);
 static int ioat_map_pci_bar(struct ioat_softc *ioat);
 static void ioat_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg,
     int error);
 static void ioat_interrupt_handler(void *arg);
 static boolean_t ioat_model_resets_msix(struct ioat_softc *ioat);
 static int chanerr_to_errno(uint32_t);
 static void ioat_process_events(struct ioat_softc *ioat);
 static inline uint32_t ioat_get_active(struct ioat_softc *ioat);
 static inline uint32_t ioat_get_ring_space(struct ioat_softc *ioat);
 static void ioat_free_ring(struct ioat_softc *, uint32_t size,
     struct ioat_descriptor **);
 static void ioat_free_ring_entry(struct ioat_softc *ioat,
     struct ioat_descriptor *desc);
 static struct ioat_descriptor *ioat_alloc_ring_entry(struct ioat_softc *,
     int mflags);
 static int ioat_reserve_space(struct ioat_softc *, uint32_t, int mflags);
 static struct ioat_descriptor *ioat_get_ring_entry(struct ioat_softc *ioat,
     uint32_t index);
 static struct ioat_descriptor **ioat_prealloc_ring(struct ioat_softc *,
     uint32_t size, boolean_t need_dscr, int mflags);
 static int ring_grow(struct ioat_softc *, uint32_t oldorder,
     struct ioat_descriptor **);
 static int ring_shrink(struct ioat_softc *, uint32_t oldorder,
     struct ioat_descriptor **);
 static void ioat_halted_debug(struct ioat_softc *, uint32_t);
 static void ioat_poll_timer_callback(void *arg);
 static void ioat_shrink_timer_callback(void *arg);
 static void dump_descriptor(void *hw_desc);
 static void ioat_submit_single(struct ioat_softc *ioat);
 static void ioat_comp_update_map(void *arg, bus_dma_segment_t *seg, int nseg,
     int error);
 static int ioat_reset_hw(struct ioat_softc *ioat);
 static void ioat_reset_hw_task(void *, int);
 static void ioat_setup_sysctl(device_t device);
 static int sysctl_handle_reset(SYSCTL_HANDLER_ARGS);
 static inline struct ioat_softc *ioat_get(struct ioat_softc *,
     enum ioat_ref_kind);
 static inline void ioat_put(struct ioat_softc *, enum ioat_ref_kind);
 static inline void _ioat_putn(struct ioat_softc *, uint32_t,
     enum ioat_ref_kind, boolean_t);
 static inline void ioat_putn(struct ioat_softc *, uint32_t,
     enum ioat_ref_kind);
 static inline void ioat_putn_locked(struct ioat_softc *, uint32_t,
     enum ioat_ref_kind);
 static void ioat_drain_locked(struct ioat_softc *);
 
 #define	ioat_log_message(v, ...) do {					\
 	if ((v) <= g_ioat_debug_level) {				\
 		device_printf(ioat->device, __VA_ARGS__);		\
 	}								\
 } while (0)
 
 MALLOC_DEFINE(M_IOAT, "ioat", "ioat driver memory allocations");
 SYSCTL_NODE(_hw, OID_AUTO, ioat, CTLFLAG_RD, 0, "ioat node");
 
 static int g_force_legacy_interrupts;
 SYSCTL_INT(_hw_ioat, OID_AUTO, force_legacy_interrupts, CTLFLAG_RDTUN,
     &g_force_legacy_interrupts, 0, "Set to non-zero to force MSI-X disabled");
 
 int g_ioat_debug_level = 0;
 SYSCTL_INT(_hw_ioat, OID_AUTO, debug_level, CTLFLAG_RWTUN, &g_ioat_debug_level,
     0, "Set log level (0-3) for ioat(4). Higher is more verbose.");
 
 /*
  * OS <-> Driver interface structures
  */
 static device_method_t ioat_pci_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe,     ioat_probe),
 	DEVMETHOD(device_attach,    ioat_attach),
 	DEVMETHOD(device_detach,    ioat_detach),
 	DEVMETHOD_END
 };
 
 static driver_t ioat_pci_driver = {
 	"ioat",
 	ioat_pci_methods,
 	sizeof(struct ioat_softc),
 };
 
 static devclass_t ioat_devclass;
 DRIVER_MODULE(ioat, pci, ioat_pci_driver, ioat_devclass, 0, 0);
 MODULE_VERSION(ioat, 1);
 
 /*
  * Private data structures
  */
 static struct ioat_softc *ioat_channel[IOAT_MAX_CHANNELS];
 static unsigned ioat_channel_index = 0;
 SYSCTL_UINT(_hw_ioat, OID_AUTO, channels, CTLFLAG_RD, &ioat_channel_index, 0,
     "Number of IOAT channels attached");
 
 static struct _pcsid
 {
 	u_int32_t   type;
 	const char  *desc;
 } pci_ids[] = {
 	{ 0x34308086, "TBG IOAT Ch0" },
 	{ 0x34318086, "TBG IOAT Ch1" },
 	{ 0x34328086, "TBG IOAT Ch2" },
 	{ 0x34338086, "TBG IOAT Ch3" },
 	{ 0x34298086, "TBG IOAT Ch4" },
 	{ 0x342a8086, "TBG IOAT Ch5" },
 	{ 0x342b8086, "TBG IOAT Ch6" },
 	{ 0x342c8086, "TBG IOAT Ch7" },
 
 	{ 0x37108086, "JSF IOAT Ch0" },
 	{ 0x37118086, "JSF IOAT Ch1" },
 	{ 0x37128086, "JSF IOAT Ch2" },
 	{ 0x37138086, "JSF IOAT Ch3" },
 	{ 0x37148086, "JSF IOAT Ch4" },
 	{ 0x37158086, "JSF IOAT Ch5" },
 	{ 0x37168086, "JSF IOAT Ch6" },
 	{ 0x37178086, "JSF IOAT Ch7" },
 	{ 0x37188086, "JSF IOAT Ch0 (RAID)" },
 	{ 0x37198086, "JSF IOAT Ch1 (RAID)" },
 
 	{ 0x3c208086, "SNB IOAT Ch0" },
 	{ 0x3c218086, "SNB IOAT Ch1" },
 	{ 0x3c228086, "SNB IOAT Ch2" },
 	{ 0x3c238086, "SNB IOAT Ch3" },
 	{ 0x3c248086, "SNB IOAT Ch4" },
 	{ 0x3c258086, "SNB IOAT Ch5" },
 	{ 0x3c268086, "SNB IOAT Ch6" },
 	{ 0x3c278086, "SNB IOAT Ch7" },
 	{ 0x3c2e8086, "SNB IOAT Ch0 (RAID)" },
 	{ 0x3c2f8086, "SNB IOAT Ch1 (RAID)" },
 
 	{ 0x0e208086, "IVB IOAT Ch0" },
 	{ 0x0e218086, "IVB IOAT Ch1" },
 	{ 0x0e228086, "IVB IOAT Ch2" },
 	{ 0x0e238086, "IVB IOAT Ch3" },
 	{ 0x0e248086, "IVB IOAT Ch4" },
 	{ 0x0e258086, "IVB IOAT Ch5" },
 	{ 0x0e268086, "IVB IOAT Ch6" },
 	{ 0x0e278086, "IVB IOAT Ch7" },
 	{ 0x0e2e8086, "IVB IOAT Ch0 (RAID)" },
 	{ 0x0e2f8086, "IVB IOAT Ch1 (RAID)" },
 
 	{ 0x2f208086, "HSW IOAT Ch0" },
 	{ 0x2f218086, "HSW IOAT Ch1" },
 	{ 0x2f228086, "HSW IOAT Ch2" },
 	{ 0x2f238086, "HSW IOAT Ch3" },
 	{ 0x2f248086, "HSW IOAT Ch4" },
 	{ 0x2f258086, "HSW IOAT Ch5" },
 	{ 0x2f268086, "HSW IOAT Ch6" },
 	{ 0x2f278086, "HSW IOAT Ch7" },
 	{ 0x2f2e8086, "HSW IOAT Ch0 (RAID)" },
 	{ 0x2f2f8086, "HSW IOAT Ch1 (RAID)" },
 
 	{ 0x0c508086, "BWD IOAT Ch0" },
 	{ 0x0c518086, "BWD IOAT Ch1" },
 	{ 0x0c528086, "BWD IOAT Ch2" },
 	{ 0x0c538086, "BWD IOAT Ch3" },
 
 	{ 0x6f508086, "BDXDE IOAT Ch0" },
 	{ 0x6f518086, "BDXDE IOAT Ch1" },
 	{ 0x6f528086, "BDXDE IOAT Ch2" },
 	{ 0x6f538086, "BDXDE IOAT Ch3" },
 
 	{ 0x6f208086, "BDX IOAT Ch0" },
 	{ 0x6f218086, "BDX IOAT Ch1" },
 	{ 0x6f228086, "BDX IOAT Ch2" },
 	{ 0x6f238086, "BDX IOAT Ch3" },
 	{ 0x6f248086, "BDX IOAT Ch4" },
 	{ 0x6f258086, "BDX IOAT Ch5" },
 	{ 0x6f268086, "BDX IOAT Ch6" },
 	{ 0x6f278086, "BDX IOAT Ch7" },
 	{ 0x6f2e8086, "BDX IOAT Ch0 (RAID)" },
 	{ 0x6f2f8086, "BDX IOAT Ch1 (RAID)" },
 
 	{ 0x00000000, NULL           }
 };
 
 /*
  * OS <-> Driver linkage functions
  */
 static int
 ioat_probe(device_t device)
 {
 	struct _pcsid *ep;
 	u_int32_t type;
 
 	type = pci_get_devid(device);
 	for (ep = pci_ids; ep->type; ep++) {
 		if (ep->type == type) {
 			device_set_desc(device, ep->desc);
 			return (0);
 		}
 	}
 	return (ENXIO);
 }
 
 static int
 ioat_attach(device_t device)
 {
 	struct ioat_softc *ioat;
 	int error;
 
 	ioat = DEVICE2SOFTC(device);
 	ioat->device = device;
 
 	error = ioat_map_pci_bar(ioat);
 	if (error != 0)
 		goto err;
 
 	ioat->version = ioat_read_cbver(ioat);
 	if (ioat->version < IOAT_VER_3_0) {
 		error = ENODEV;
 		goto err;
 	}
 
 	error = ioat3_attach(device);
 	if (error != 0)
 		goto err;
 
 	error = pci_enable_busmaster(device);
 	if (error != 0)
 		goto err;
 
 	error = ioat_setup_intr(ioat);
 	if (error != 0)
 		goto err;
 
 	error = ioat_reset_hw(ioat);
 	if (error != 0)
 		goto err;
 
 	ioat_process_events(ioat);
 	ioat_setup_sysctl(device);
 
 	ioat->chan_idx = ioat_channel_index;
 	ioat_channel[ioat_channel_index++] = ioat;
 	ioat_test_attach();
 
 err:
 	if (error != 0)
 		ioat_detach(device);
 	return (error);
 }
 
 static int
 ioat_detach(device_t device)
 {
 	struct ioat_softc *ioat;
 
 	ioat = DEVICE2SOFTC(device);
 
 	ioat_test_detach();
 	taskqueue_drain(taskqueue_thread, &ioat->reset_task);
 
 	mtx_lock(IOAT_REFLK);
 	ioat->quiescing = TRUE;
 	ioat->destroying = TRUE;
 	wakeup(&ioat->quiescing);
 	wakeup(&ioat->resetting);
 
 	ioat_channel[ioat->chan_idx] = NULL;
 
 	ioat_drain_locked(ioat);
 	mtx_unlock(IOAT_REFLK);
 
 	ioat_teardown_intr(ioat);
 	callout_drain(&ioat->poll_timer);
 	callout_drain(&ioat->shrink_timer);
 
 	pci_disable_busmaster(device);
 
 	if (ioat->pci_resource != NULL)
 		bus_release_resource(device, SYS_RES_MEMORY,
 		    ioat->pci_resource_id, ioat->pci_resource);
 
 	if (ioat->ring != NULL)
 		ioat_free_ring(ioat, 1 << ioat->ring_size_order, ioat->ring);
 
 	if (ioat->comp_update != NULL) {
 		bus_dmamap_unload(ioat->comp_update_tag, ioat->comp_update_map);
 		bus_dmamem_free(ioat->comp_update_tag, ioat->comp_update,
 		    ioat->comp_update_map);
 		bus_dma_tag_destroy(ioat->comp_update_tag);
 	}
 
 	bus_dma_tag_destroy(ioat->hw_desc_tag);
 
 	return (0);
 }
 
 static int
 ioat_teardown_intr(struct ioat_softc *ioat)
 {
 
 	if (ioat->tag != NULL)
 		bus_teardown_intr(ioat->device, ioat->res, ioat->tag);
 
 	if (ioat->res != NULL)
 		bus_release_resource(ioat->device, SYS_RES_IRQ,
 		    rman_get_rid(ioat->res), ioat->res);
 
 	pci_release_msi(ioat->device);
 	return (0);
 }
 
 static int
 ioat_start_channel(struct ioat_softc *ioat)
 {
 	struct ioat_dma_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct bus_dmadesc *dmadesc;
 	uint64_t status;
 	uint32_t chanerr;
 	int i;
 
 	ioat_acquire(&ioat->dmaengine);
 
 	/* Submit 'NULL' operation manually to avoid quiescing flag */
 	desc = ioat_get_ring_entry(ioat, ioat->head);
 	dmadesc = &desc->bus_dmadesc;
 	hw_desc = desc->u.dma;
 
 	dmadesc->callback_fn = NULL;
 	dmadesc->callback_arg = NULL;
 
 	hw_desc->u.control_raw = 0;
 	hw_desc->u.control_generic.op = IOAT_OP_COPY;
 	hw_desc->u.control_generic.completion_update = 1;
 	hw_desc->size = 8;
 	hw_desc->src_addr = 0;
 	hw_desc->dest_addr = 0;
 	hw_desc->u.control.null = 1;
 
 	ioat_submit_single(ioat);
 	ioat_release(&ioat->dmaengine);
 
 	for (i = 0; i < 100; i++) {
 		DELAY(1);
 		status = ioat_get_chansts(ioat);
 		if (is_ioat_idle(status))
 			return (0);
 	}
 
 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	ioat_log_message(0, "could not start channel: "
 	    "status = %#jx error = %b\n", (uintmax_t)status, (int)chanerr,
 	    IOAT_CHANERR_STR);
 	return (ENXIO);
 }
 
 /*
  * Initialize Hardware
  */
 static int
 ioat3_attach(device_t device)
 {
 	struct ioat_softc *ioat;
 	struct ioat_descriptor **ring;
 	struct ioat_descriptor *next;
 	struct ioat_dma_hw_descriptor *dma_hw_desc;
 	int i, num_descriptors;
 	int error;
 	uint8_t xfercap;
 
 	error = 0;
 	ioat = DEVICE2SOFTC(device);
 	ioat->capabilities = ioat_read_dmacapability(ioat);
 
 	ioat_log_message(0, "Capabilities: %b\n", (int)ioat->capabilities,
 	    IOAT_DMACAP_STR);
 
 	xfercap = ioat_read_xfercap(ioat);
 	ioat->max_xfer_size = 1 << xfercap;
 
 	ioat->intrdelay_supported = (ioat_read_2(ioat, IOAT_INTRDELAY_OFFSET) &
 	    IOAT_INTRDELAY_SUPPORTED) != 0;
 	if (ioat->intrdelay_supported)
 		ioat->intrdelay_max = IOAT_INTRDELAY_US_MASK;
 
 	/* TODO: need to check DCA here if we ever do XOR/PQ */
 
 	mtx_init(&ioat->submit_lock, "ioat_submit", NULL, MTX_DEF);
 	mtx_init(&ioat->cleanup_lock, "ioat_cleanup", NULL, MTX_DEF);
 	callout_init(&ioat->poll_timer, 1);
 	callout_init(&ioat->shrink_timer, 1);
 	TASK_INIT(&ioat->reset_task, 0, ioat_reset_hw_task, ioat);
 
 	/* Establish lock order for Witness */
 	mtx_lock(&ioat->submit_lock);
 	mtx_lock(&ioat->cleanup_lock);
 	mtx_unlock(&ioat->cleanup_lock);
 	mtx_unlock(&ioat->submit_lock);
 
 	ioat->is_resize_pending = FALSE;
 	ioat->is_submitter_processing = FALSE;
 	ioat->is_completion_pending = FALSE;
 	ioat->is_reset_pending = FALSE;
 	ioat->is_channel_running = FALSE;
 
 	bus_dma_tag_create(bus_get_dma_tag(ioat->device), sizeof(uint64_t), 0x0,
 	    BUS_SPACE_MAXADDR, BUS_SPACE_MAXADDR, NULL, NULL,
 	    sizeof(uint64_t), 1, sizeof(uint64_t), 0, NULL, NULL,
 	    &ioat->comp_update_tag);
 
 	error = bus_dmamem_alloc(ioat->comp_update_tag,
 	    (void **)&ioat->comp_update, BUS_DMA_ZERO, &ioat->comp_update_map);
 	if (ioat->comp_update == NULL)
 		return (ENOMEM);
 
 	error = bus_dmamap_load(ioat->comp_update_tag, ioat->comp_update_map,
 	    ioat->comp_update, sizeof(uint64_t), ioat_comp_update_map, ioat,
 	    0);
 	if (error != 0)
 		return (error);
 
 	ioat->ring_size_order = IOAT_MIN_ORDER;
 
 	num_descriptors = 1 << ioat->ring_size_order;
 
 	bus_dma_tag_create(bus_get_dma_tag(ioat->device), 0x40, 0x0,
 	    BUS_SPACE_MAXADDR_40BIT, BUS_SPACE_MAXADDR, NULL, NULL,
 	    sizeof(struct ioat_dma_hw_descriptor), 1,
 	    sizeof(struct ioat_dma_hw_descriptor), 0, NULL, NULL,
 	    &ioat->hw_desc_tag);
 
 	ioat->ring = malloc(num_descriptors * sizeof(*ring), M_IOAT,
 	    M_ZERO | M_WAITOK);
 
 	ring = ioat->ring;
 	for (i = 0; i < num_descriptors; i++) {
 		ring[i] = ioat_alloc_ring_entry(ioat, M_WAITOK);
 		if (ring[i] == NULL)
 			return (ENOMEM);
 
 		ring[i]->id = i;
 	}
 
 	for (i = 0; i < num_descriptors - 1; i++) {
 		next = ring[i + 1];
 		dma_hw_desc = ring[i]->u.dma;
 
 		dma_hw_desc->next = next->hw_desc_bus_addr;
 	}
 
 	ring[i]->u.dma->next = ring[0]->hw_desc_bus_addr;
 
 	ioat->head = ioat->hw_head = 0;
 	ioat->tail = 0;
 	ioat->last_seen = 0;
 	*ioat->comp_update = 0;
 	return (0);
 }
 
 static int
 ioat_map_pci_bar(struct ioat_softc *ioat)
 {
 
 	ioat->pci_resource_id = PCIR_BAR(0);
 	ioat->pci_resource = bus_alloc_resource_any(ioat->device,
 	    SYS_RES_MEMORY, &ioat->pci_resource_id, RF_ACTIVE);
 
 	if (ioat->pci_resource == NULL) {
 		ioat_log_message(0, "unable to allocate pci resource\n");
 		return (ENODEV);
 	}
 
 	ioat->pci_bus_tag = rman_get_bustag(ioat->pci_resource);
 	ioat->pci_bus_handle = rman_get_bushandle(ioat->pci_resource);
 	return (0);
 }
 
 static void
 ioat_comp_update_map(void *arg, bus_dma_segment_t *seg, int nseg, int error)
 {
 	struct ioat_softc *ioat = arg;
 
 	KASSERT(error == 0, ("%s: error:%d", __func__, error));
 	ioat->comp_update_bus_addr = seg[0].ds_addr;
 }
 
 static void
 ioat_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
 {
 	bus_addr_t *baddr;
 
 	KASSERT(error == 0, ("%s: error:%d", __func__, error));
 	baddr = arg;
 	*baddr = segs->ds_addr;
 }
 
 /*
  * Interrupt setup and handlers
  */
 static int
 ioat_setup_intr(struct ioat_softc *ioat)
 {
 	uint32_t num_vectors;
 	int error;
 	boolean_t use_msix;
 	boolean_t force_legacy_interrupts;
 
 	use_msix = FALSE;
 	force_legacy_interrupts = FALSE;
 
 	if (!g_force_legacy_interrupts && pci_msix_count(ioat->device) >= 1) {
 		num_vectors = 1;
 		pci_alloc_msix(ioat->device, &num_vectors);
 		if (num_vectors == 1)
 			use_msix = TRUE;
 	}
 
 	if (use_msix) {
 		ioat->rid = 1;
 		ioat->res = bus_alloc_resource_any(ioat->device, SYS_RES_IRQ,
 		    &ioat->rid, RF_ACTIVE);
 	} else {
 		ioat->rid = 0;
 		ioat->res = bus_alloc_resource_any(ioat->device, SYS_RES_IRQ,
 		    &ioat->rid, RF_SHAREABLE | RF_ACTIVE);
 	}
 	if (ioat->res == NULL) {
 		ioat_log_message(0, "bus_alloc_resource failed\n");
 		return (ENOMEM);
 	}
 
 	ioat->tag = NULL;
 	error = bus_setup_intr(ioat->device, ioat->res, INTR_MPSAFE |
 	    INTR_TYPE_MISC, NULL, ioat_interrupt_handler, ioat, &ioat->tag);
 	if (error != 0) {
 		ioat_log_message(0, "bus_setup_intr failed\n");
 		return (error);
 	}
 
 	ioat_write_intrctrl(ioat, IOAT_INTRCTRL_MASTER_INT_EN);
 	return (0);
 }
 
 static boolean_t
 ioat_model_resets_msix(struct ioat_softc *ioat)
 {
 	u_int32_t pciid;
 
 	pciid = pci_get_devid(ioat->device);
 	switch (pciid) {
 		/* BWD: */
 	case 0x0c508086:
 	case 0x0c518086:
 	case 0x0c528086:
 	case 0x0c538086:
 		/* BDXDE: */
 	case 0x6f508086:
 	case 0x6f518086:
 	case 0x6f528086:
 	case 0x6f538086:
 		return (TRUE);
 	}
 
 	return (FALSE);
 }
 
 static void
 ioat_interrupt_handler(void *arg)
 {
 	struct ioat_softc *ioat = arg;
 
 	ioat->stats.interrupts++;
 	ioat_process_events(ioat);
 }
 
 static int
 chanerr_to_errno(uint32_t chanerr)
 {
 
 	if (chanerr == 0)
 		return (0);
 	if ((chanerr & (IOAT_CHANERR_XSADDERR | IOAT_CHANERR_XDADDERR)) != 0)
 		return (EFAULT);
 	if ((chanerr & (IOAT_CHANERR_RDERR | IOAT_CHANERR_WDERR)) != 0)
 		return (EIO);
 	/* This one is probably our fault: */
 	if ((chanerr & IOAT_CHANERR_NDADDERR) != 0)
 		return (EIO);
 	return (EIO);
 }
 
 static void
 ioat_process_events(struct ioat_softc *ioat)
 {
 	struct ioat_descriptor *desc;
 	struct bus_dmadesc *dmadesc;
 	uint64_t comp_update, status;
 	uint32_t completed, chanerr;
 	boolean_t pending;
 	int error;
 
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	mtx_lock(&ioat->cleanup_lock);
 
 	/*
 	 * Don't run while the hardware is being reset.  Reset is responsible
 	 * for blocking new work and draining & completing existing work, so
 	 * there is nothing to do until new work is queued after reset anyway.
 	 */
 	if (ioat->resetting_cleanup) {
 		mtx_unlock(&ioat->cleanup_lock);
 		return;
 	}
 
 	completed = 0;
 	comp_update = *ioat->comp_update;
 	status = comp_update & IOAT_CHANSTS_COMPLETED_DESCRIPTOR_MASK;
 
 	if (status == ioat->last_seen) {
 		/*
 		 * If we landed in process_events and nothing has been
 		 * completed, check for a timeout due to channel halt.
 		 */
 		comp_update = ioat_get_chansts(ioat);
 		goto out;
 	}
 
 	while (1) {
 		desc = ioat_get_ring_entry(ioat, ioat->tail);
 		dmadesc = &desc->bus_dmadesc;
-		CTR3(KTR_IOAT, "completing desc %u ok  cb %p(%p)", ioat->tail,
-		    dmadesc->callback_fn, dmadesc->callback_arg);
+		CTR4(KTR_IOAT, "channel=%u completing desc %u ok  cb %p(%p)",
+		    ioat->chan_idx, ioat->tail, dmadesc->callback_fn,
+		    dmadesc->callback_arg);
 
 		if (dmadesc->callback_fn != NULL)
 			dmadesc->callback_fn(dmadesc->callback_arg, 0);
 
 		completed++;
 		ioat->tail++;
 		if (desc->hw_desc_bus_addr == status)
 			break;
 
 		KASSERT(ioat_get_active(ioat) > 0, ("overrunning ring t:%u "
 		    "h:%u st:0x%016lx last_seen:%016lx completed:%u\n",
 		    ioat->tail, ioat->head, comp_update, ioat->last_seen,
 		    completed));
 	}
 
 	ioat->last_seen = desc->hw_desc_bus_addr;
 	ioat->stats.descriptors_processed += completed;
 
 out:
 	ioat_write_chanctrl(ioat, IOAT_CHANCTRL_RUN);
 
 	/* Perform a racy check first; only take the locks if it passes. */
 	pending = (ioat_get_active(ioat) != 0);
 	if (!pending && ioat->is_completion_pending) {
 		mtx_unlock(&ioat->cleanup_lock);
 		mtx_lock(&ioat->submit_lock);
 		mtx_lock(&ioat->cleanup_lock);
 
 		pending = (ioat_get_active(ioat) != 0);
 		if (!pending && ioat->is_completion_pending) {
 			ioat->is_completion_pending = FALSE;
 			callout_reset(&ioat->shrink_timer, IOAT_SHRINK_PERIOD,
 			    ioat_shrink_timer_callback, ioat);
 			callout_stop(&ioat->poll_timer);
 		}
 		mtx_unlock(&ioat->submit_lock);
 	}
 	mtx_unlock(&ioat->cleanup_lock);
 
 	if (pending)
 		callout_reset(&ioat->poll_timer, 1, ioat_poll_timer_callback,
 		    ioat);
 
 	if (completed != 0) {
 		ioat_putn(ioat, completed, IOAT_ACTIVE_DESCR_REF);
 		wakeup(&ioat->tail);
 	}
 
 	if (!is_ioat_halted(comp_update) && !is_ioat_suspended(comp_update))
 		return;
 
 	ioat->stats.channel_halts++;
 
 	/*
 	 * Fatal programming error on this DMA channel.  Flush any outstanding
 	 * work with error status and restart the engine.
 	 */
 	ioat_log_message(0, "Channel halted due to fatal programming error\n");
 	mtx_lock(&ioat->submit_lock);
 	mtx_lock(&ioat->cleanup_lock);
 	ioat->quiescing = TRUE;
 
 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	ioat_halted_debug(ioat, chanerr);
 	ioat->stats.last_halt_chanerr = chanerr;
 
 	while (ioat_get_active(ioat) > 0) {
 		desc = ioat_get_ring_entry(ioat, ioat->tail);
 		dmadesc = &desc->bus_dmadesc;
-		CTR3(KTR_IOAT, "completing desc %u err cb %p(%p)", ioat->tail,
-		    dmadesc->callback_fn, dmadesc->callback_arg);
+		CTR4(KTR_IOAT, "channel=%u completing desc %u err cb %p(%p)",
+		    ioat->chan_idx, ioat->tail, dmadesc->callback_fn,
+		    dmadesc->callback_arg);
 
 		if (dmadesc->callback_fn != NULL)
 			dmadesc->callback_fn(dmadesc->callback_arg,
 			    chanerr_to_errno(chanerr));
 
 		ioat_putn_locked(ioat, 1, IOAT_ACTIVE_DESCR_REF);
 		ioat->tail++;
 		ioat->stats.descriptors_processed++;
 		ioat->stats.descriptors_error++;
 	}
 
 	/* Clear error status */
 	ioat_write_4(ioat, IOAT_CHANERR_OFFSET, chanerr);
 
 	mtx_unlock(&ioat->cleanup_lock);
 	mtx_unlock(&ioat->submit_lock);
 
 	ioat_log_message(0, "Resetting channel to recover from error\n");
 	error = taskqueue_enqueue(taskqueue_thread, &ioat->reset_task);
 	KASSERT(error == 0,
 	    ("%s: taskqueue_enqueue failed: %d", __func__, error));
 }
 
 static void
 ioat_reset_hw_task(void *ctx, int pending __unused)
 {
 	struct ioat_softc *ioat;
 	int error;
 
 	ioat = ctx;
 	ioat_log_message(1, "%s: Resetting channel\n", __func__);
 
 	error = ioat_reset_hw(ioat);
 	KASSERT(error == 0, ("%s: reset failed: %d", __func__, error));
 	(void)error;
 }
 
 /*
  * User API functions
  */
 unsigned
 ioat_get_nchannels(void)
 {
 
 	return (ioat_channel_index);
 }
 
 bus_dmaengine_t
 ioat_get_dmaengine(uint32_t index, int flags)
 {
 	struct ioat_softc *ioat;
 
 	KASSERT((flags & ~(M_NOWAIT | M_WAITOK)) == 0,
 	    ("invalid flags: 0x%08x", flags));
 	KASSERT((flags & (M_NOWAIT | M_WAITOK)) != (M_NOWAIT | M_WAITOK),
 	    ("invalid wait | nowait"));
 
 	if (index >= ioat_channel_index)
 		return (NULL);
 
 	ioat = ioat_channel[index];
 	if (ioat == NULL || ioat->destroying)
 		return (NULL);
 
 	if (ioat->quiescing) {
 		if ((flags & M_NOWAIT) != 0)
 			return (NULL);
 
 		mtx_lock(IOAT_REFLK);
 		while (ioat->quiescing && !ioat->destroying)
 			msleep(&ioat->quiescing, IOAT_REFLK, 0, "getdma", 0);
 		mtx_unlock(IOAT_REFLK);
 
 		if (ioat->destroying)
 			return (NULL);
 	}
 
 	/*
 	 * There's a race here between the quiescing check and HW reset or
 	 * module destroy.
 	 */
 	return (&ioat_get(ioat, IOAT_DMAENGINE_REF)->dmaengine);
 }
 
 void
 ioat_put_dmaengine(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	ioat_put(ioat, IOAT_DMAENGINE_REF);
 }
 
 int
 ioat_get_hwversion(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	return (ioat->version);
 }
 
 size_t
 ioat_get_max_io_size(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	return (ioat->max_xfer_size);
 }
 
 uint32_t
 ioat_get_capabilities(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	return (ioat->capabilities);
 }
 
 int
 ioat_set_interrupt_coalesce(bus_dmaengine_t dmaengine, uint16_t delay)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	if (!ioat->intrdelay_supported)
 		return (ENODEV);
 	if (delay > ioat->intrdelay_max)
 		return (ERANGE);
 
 	ioat_write_2(ioat, IOAT_INTRDELAY_OFFSET, delay);
 	ioat->cached_intrdelay =
 	    ioat_read_2(ioat, IOAT_INTRDELAY_OFFSET) & IOAT_INTRDELAY_US_MASK;
 	return (0);
 }
 
 uint16_t
 ioat_get_max_coalesce_period(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	return (ioat->intrdelay_max);
 }
 
 void
 ioat_acquire(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
 	mtx_lock(&ioat->submit_lock);
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 }
 
 int
 ioat_acquire_reserve(bus_dmaengine_t dmaengine, unsigned n, int mflags)
 {
 	struct ioat_softc *ioat;
 	int error;
 
 	ioat = to_ioat_softc(dmaengine);
 	ioat_acquire(dmaengine);
 
 	error = ioat_reserve_space(ioat, n, mflags);
 	if (error != 0)
 		ioat_release(dmaengine);
 	return (error);
 }
 
 void
 ioat_release(bus_dmaengine_t dmaengine)
 {
 	struct ioat_softc *ioat;
 
 	ioat = to_ioat_softc(dmaengine);
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 	ioat_write_2(ioat, IOAT_DMACOUNT_OFFSET, (uint16_t)ioat->hw_head);
 	mtx_unlock(&ioat->submit_lock);
 }
 
 static struct ioat_descriptor *
 ioat_op_generic(struct ioat_softc *ioat, uint8_t op,
     uint32_t size, uint64_t src, uint64_t dst,
     bus_dmaengine_callback_t callback_fn, void *callback_arg,
     uint32_t flags)
 {
 	struct ioat_generic_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	int mflags;
 
 	mtx_assert(&ioat->submit_lock, MA_OWNED);
 
 	KASSERT((flags & ~_DMA_GENERIC_FLAGS) == 0,
 	    ("Unrecognized flag(s): %#x", flags & ~_DMA_GENERIC_FLAGS));
 	if ((flags & DMA_NO_WAIT) != 0)
 		mflags = M_NOWAIT;
 	else
 		mflags = M_WAITOK;
 
 	if (size > ioat->max_xfer_size) {
 		ioat_log_message(0, "%s: max_xfer_size = %d, requested = %u\n",
 		    __func__, ioat->max_xfer_size, (unsigned)size);
 		return (NULL);
 	}
 
 	if (ioat_reserve_space(ioat, 1, mflags) != 0)
 		return (NULL);
 
 	desc = ioat_get_ring_entry(ioat, ioat->head);
 	hw_desc = desc->u.generic;
 
 	hw_desc->u.control_raw = 0;
 	hw_desc->u.control_generic.op = op;
 	hw_desc->u.control_generic.completion_update = 1;
 
 	if ((flags & DMA_INT_EN) != 0)
 		hw_desc->u.control_generic.int_enable = 1;
 	if ((flags & DMA_FENCE) != 0)
 		hw_desc->u.control_generic.fence = 1;
 
 	hw_desc->size = size;
 	hw_desc->src_addr = src;
 	hw_desc->dest_addr = dst;
 
 	desc->bus_dmadesc.callback_fn = callback_fn;
 	desc->bus_dmadesc.callback_arg = callback_arg;
 	return (desc);
 }
 
 struct bus_dmadesc *
 ioat_null(bus_dmaengine_t dmaengine, bus_dmaengine_callback_t callback_fn,
     void *callback_arg, uint32_t flags)
 {
 	struct ioat_dma_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	desc = ioat_op_generic(ioat, IOAT_OP_COPY, 8, 0, 0, callback_fn,
 	    callback_arg, flags);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.dma;
 	hw_desc->u.control.null = 1;
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 struct bus_dmadesc *
 ioat_copy(bus_dmaengine_t dmaengine, bus_addr_t dst,
     bus_addr_t src, bus_size_t len, bus_dmaengine_callback_t callback_fn,
     void *callback_arg, uint32_t flags)
 {
 	struct ioat_dma_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	if (((src | dst) & (0xffffull << 48)) != 0) {
 		ioat_log_message(0, "%s: High 16 bits of src/dst invalid\n",
 		    __func__);
 		return (NULL);
 	}
 
 	desc = ioat_op_generic(ioat, IOAT_OP_COPY, len, src, dst, callback_fn,
 	    callback_arg, flags);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.dma;
 	if (g_ioat_debug_level >= 3)
 		dump_descriptor(hw_desc);
 
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 struct bus_dmadesc *
 ioat_copy_8k_aligned(bus_dmaengine_t dmaengine, bus_addr_t dst1,
     bus_addr_t dst2, bus_addr_t src1, bus_addr_t src2,
     bus_dmaengine_callback_t callback_fn, void *callback_arg, uint32_t flags)
 {
 	struct ioat_dma_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	if (((src1 | src2 | dst1 | dst2) & (0xffffull << 48)) != 0) {
 		ioat_log_message(0, "%s: High 16 bits of src/dst invalid\n",
 		    __func__);
 		return (NULL);
 	}
 	if (((src1 | src2 | dst1 | dst2) & PAGE_MASK) != 0) {
 		ioat_log_message(0, "%s: Addresses must be page-aligned\n",
 		    __func__);
 		return (NULL);
 	}
 
 	desc = ioat_op_generic(ioat, IOAT_OP_COPY, 2 * PAGE_SIZE, src1, dst1,
 	    callback_fn, callback_arg, flags);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.dma;
 	if (src2 != src1 + PAGE_SIZE) {
 		hw_desc->u.control.src_page_break = 1;
 		hw_desc->next_src_addr = src2;
 	}
 	if (dst2 != dst1 + PAGE_SIZE) {
 		hw_desc->u.control.dest_page_break = 1;
 		hw_desc->next_dest_addr = dst2;
 	}
 
 	if (g_ioat_debug_level >= 3)
 		dump_descriptor(hw_desc);
 
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 struct bus_dmadesc *
 ioat_copy_crc(bus_dmaengine_t dmaengine, bus_addr_t dst, bus_addr_t src,
     bus_size_t len, uint32_t *initialseed, bus_addr_t crcptr,
     bus_dmaengine_callback_t callback_fn, void *callback_arg, uint32_t flags)
 {
 	struct ioat_crc32_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 	uint32_t teststore;
 	uint8_t op;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	if ((ioat->capabilities & IOAT_DMACAP_MOVECRC) == 0) {
 		ioat_log_message(0, "%s: Device lacks MOVECRC capability\n",
 		    __func__);
 		return (NULL);
 	}
 	if (((src | dst) & (0xffffffull << 40)) != 0) {
 		ioat_log_message(0, "%s: High 24 bits of src/dst invalid\n",
 		    __func__);
 		return (NULL);
 	}
 	teststore = (flags & _DMA_CRC_TESTSTORE);
 	if (teststore == _DMA_CRC_TESTSTORE) {
 		ioat_log_message(0, "%s: TEST and STORE invalid\n", __func__);
 		return (NULL);
 	}
 	if (teststore == 0 && (flags & DMA_CRC_INLINE) != 0) {
 		ioat_log_message(0, "%s: INLINE invalid without TEST or STORE\n",
 		    __func__);
 		return (NULL);
 	}
 
 	switch (teststore) {
 	case DMA_CRC_STORE:
 		op = IOAT_OP_MOVECRC_STORE;
 		break;
 	case DMA_CRC_TEST:
 		op = IOAT_OP_MOVECRC_TEST;
 		break;
 	default:
 		KASSERT(teststore == 0, ("bogus"));
 		op = IOAT_OP_MOVECRC;
 		break;
 	}
 
 	if ((flags & DMA_CRC_INLINE) == 0 &&
 	    (crcptr & (0xffffffull << 40)) != 0) {
 		ioat_log_message(0,
 		    "%s: High 24 bits of crcptr invalid\n", __func__);
 		return (NULL);
 	}
 
 	desc = ioat_op_generic(ioat, op, len, src, dst, callback_fn,
 	    callback_arg, flags & ~_DMA_CRC_FLAGS);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.crc32;
 
 	if ((flags & DMA_CRC_INLINE) == 0)
 		hw_desc->crc_address = crcptr;
 	else
 		hw_desc->u.control.crc_location = 1;
 
 	if (initialseed != NULL) {
 		hw_desc->u.control.use_seed = 1;
 		hw_desc->seed = *initialseed;
 	}
 
 	if (g_ioat_debug_level >= 3)
 		dump_descriptor(hw_desc);
 
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 struct bus_dmadesc *
 ioat_crc(bus_dmaengine_t dmaengine, bus_addr_t src, bus_size_t len,
     uint32_t *initialseed, bus_addr_t crcptr,
     bus_dmaengine_callback_t callback_fn, void *callback_arg, uint32_t flags)
 {
 	struct ioat_crc32_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 	uint32_t teststore;
 	uint8_t op;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	if ((ioat->capabilities & IOAT_DMACAP_CRC) == 0) {
 		ioat_log_message(0, "%s: Device lacks CRC capability\n",
 		    __func__);
 		return (NULL);
 	}
 	if ((src & (0xffffffull << 40)) != 0) {
 		ioat_log_message(0, "%s: High 24 bits of src invalid\n",
 		    __func__);
 		return (NULL);
 	}
 	teststore = (flags & _DMA_CRC_TESTSTORE);
 	if (teststore == _DMA_CRC_TESTSTORE) {
 		ioat_log_message(0, "%s: TEST and STORE invalid\n", __func__);
 		return (NULL);
 	}
 	if (teststore == 0 && (flags & DMA_CRC_INLINE) != 0) {
 		ioat_log_message(0, "%s: INLINE invalid without TEST or STORE\n",
 		    __func__);
 		return (NULL);
 	}
 
 	switch (teststore) {
 	case DMA_CRC_STORE:
 		op = IOAT_OP_CRC_STORE;
 		break;
 	case DMA_CRC_TEST:
 		op = IOAT_OP_CRC_TEST;
 		break;
 	default:
 		KASSERT(teststore == 0, ("bogus"));
 		op = IOAT_OP_CRC;
 		break;
 	}
 
 	if ((flags & DMA_CRC_INLINE) == 0 &&
 	    (crcptr & (0xffffffull << 40)) != 0) {
 		ioat_log_message(0,
 		    "%s: High 24 bits of crcptr invalid\n", __func__);
 		return (NULL);
 	}
 
 	desc = ioat_op_generic(ioat, op, len, src, 0, callback_fn,
 	    callback_arg, flags & ~_DMA_CRC_FLAGS);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.crc32;
 
 	if ((flags & DMA_CRC_INLINE) == 0)
 		hw_desc->crc_address = crcptr;
 	else
 		hw_desc->u.control.crc_location = 1;
 
 	if (initialseed != NULL) {
 		hw_desc->u.control.use_seed = 1;
 		hw_desc->seed = *initialseed;
 	}
 
 	if (g_ioat_debug_level >= 3)
 		dump_descriptor(hw_desc);
 
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 struct bus_dmadesc *
 ioat_blockfill(bus_dmaengine_t dmaengine, bus_addr_t dst, uint64_t fillpattern,
     bus_size_t len, bus_dmaengine_callback_t callback_fn, void *callback_arg,
     uint32_t flags)
 {
 	struct ioat_fill_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	struct ioat_softc *ioat;
 
-	CTR0(KTR_IOAT, __func__);
 	ioat = to_ioat_softc(dmaengine);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	if ((ioat->capabilities & IOAT_DMACAP_BFILL) == 0) {
 		ioat_log_message(0, "%s: Device lacks BFILL capability\n",
 		    __func__);
 		return (NULL);
 	}
 
 	if ((dst & (0xffffull << 48)) != 0) {
 		ioat_log_message(0, "%s: High 16 bits of dst invalid\n",
 		    __func__);
 		return (NULL);
 	}
 
 	desc = ioat_op_generic(ioat, IOAT_OP_FILL, len, fillpattern, dst,
 	    callback_fn, callback_arg, flags);
 	if (desc == NULL)
 		return (NULL);
 
 	hw_desc = desc->u.fill;
 	if (g_ioat_debug_level >= 3)
 		dump_descriptor(hw_desc);
 
 	ioat_submit_single(ioat);
 	return (&desc->bus_dmadesc);
 }
 
 /*
  * Ring Management
  */
 static inline uint32_t
 ioat_get_active(struct ioat_softc *ioat)
 {
 
 	return ((ioat->head - ioat->tail) & ((1 << ioat->ring_size_order) - 1));
 }
 
 static inline uint32_t
 ioat_get_ring_space(struct ioat_softc *ioat)
 {
 
 	return ((1 << ioat->ring_size_order) - ioat_get_active(ioat) - 1);
 }
 
 static struct ioat_descriptor *
 ioat_alloc_ring_entry(struct ioat_softc *ioat, int mflags)
 {
 	struct ioat_generic_hw_descriptor *hw_desc;
 	struct ioat_descriptor *desc;
 	int error, busdmaflag;
 
 	error = ENOMEM;
 	hw_desc = NULL;
 
 	if ((mflags & M_WAITOK) != 0)
 		busdmaflag = BUS_DMA_WAITOK;
 	else
 		busdmaflag = BUS_DMA_NOWAIT;
 
 	desc = malloc(sizeof(*desc), M_IOAT, mflags);
 	if (desc == NULL)
 		goto out;
 
 	bus_dmamem_alloc(ioat->hw_desc_tag, (void **)&hw_desc,
 	    BUS_DMA_ZERO | busdmaflag, &ioat->hw_desc_map);
 	if (hw_desc == NULL)
 		goto out;
 
 	memset(&desc->bus_dmadesc, 0, sizeof(desc->bus_dmadesc));
 	desc->u.generic = hw_desc;
 
 	error = bus_dmamap_load(ioat->hw_desc_tag, ioat->hw_desc_map, hw_desc,
 	    sizeof(*hw_desc), ioat_dmamap_cb, &desc->hw_desc_bus_addr,
 	    busdmaflag);
 	if (error)
 		goto out;
 
 out:
 	if (error) {
 		ioat_free_ring_entry(ioat, desc);
 		return (NULL);
 	}
 	return (desc);
 }
 
 static void
 ioat_free_ring_entry(struct ioat_softc *ioat, struct ioat_descriptor *desc)
 {
 
 	if (desc == NULL)
 		return;
 
 	if (desc->u.generic)
 		bus_dmamem_free(ioat->hw_desc_tag, desc->u.generic,
 		    ioat->hw_desc_map);
 	free(desc, M_IOAT);
 }
 
 /*
  * Reserves space in this IOAT descriptor ring by ensuring enough slots remain
  * for 'num_descs'.
  *
  * If mflags contains M_WAITOK, blocks until enough space is available.
  *
  * Returns zero on success, or an errno on error.  If num_descs is beyond the
  * maximum ring size, returns EINVAL; if the channel is quiescing, returns
  * ENXIO; if allocation would block and mflags contains M_NOWAIT, returns
  * EAGAIN.
  *
  * Must be called with the submit_lock held; returns with the lock held.  The
  * lock may be dropped to allocate the ring.
  *
  * (The submit_lock is needed to add any entries to the ring, so callers are
  * assured enough room is available.)
  */
 static int
 ioat_reserve_space(struct ioat_softc *ioat, uint32_t num_descs, int mflags)
 {
 	struct ioat_descriptor **new_ring;
 	uint32_t order;
 	boolean_t dug;
 	int error;
 
 	mtx_assert(&ioat->submit_lock, MA_OWNED);
 	error = 0;
 	dug = FALSE;
 
 	if (num_descs < 1 || num_descs >= (1 << IOAT_MAX_ORDER)) {
 		error = EINVAL;
 		goto out;
 	}
 
 	for (;;) {
 		if (ioat->quiescing) {
 			error = ENXIO;
 			goto out;
 		}
 
 		if (ioat_get_ring_space(ioat) >= num_descs)
 			goto out;
 
 		if (!dug && !ioat->is_submitter_processing &&
 		    (1 << ioat->ring_size_order) > num_descs) {
 			ioat->is_submitter_processing = TRUE;
 			mtx_unlock(&ioat->submit_lock);
 
 			ioat_process_events(ioat);
 
 			mtx_lock(&ioat->submit_lock);
 			dug = TRUE;
 			KASSERT(ioat->is_submitter_processing == TRUE,
 			    ("is_submitter_processing"));
 			ioat->is_submitter_processing = FALSE;
 			wakeup(&ioat->tail);
 			continue;
 		}
 
 		order = ioat->ring_size_order;
 		if (ioat->is_resize_pending || order == IOAT_MAX_ORDER) {
 			if ((mflags & M_WAITOK) != 0) {
 				msleep(&ioat->tail, &ioat->submit_lock, 0,
 				    "ioat_rsz", 0);
 				continue;
 			}
 
 			error = EAGAIN;
 			break;
 		}
 
 		ioat->is_resize_pending = TRUE;
 		for (;;) {
 			mtx_unlock(&ioat->submit_lock);
 
 			new_ring = ioat_prealloc_ring(ioat, 1 << (order + 1),
 			    TRUE, mflags);
 
 			mtx_lock(&ioat->submit_lock);
 			KASSERT(ioat->ring_size_order == order,
 			    ("is_resize_pending should protect order"));
 
 			if (new_ring == NULL) {
 				KASSERT((mflags & M_WAITOK) == 0,
 				    ("allocation failed"));
 				error = EAGAIN;
 				break;
 			}
 
 			error = ring_grow(ioat, order, new_ring);
 			if (error == 0)
 				break;
 		}
 		ioat->is_resize_pending = FALSE;
 		wakeup(&ioat->tail);
 		if (error)
 			break;
 	}
 
 out:
 	mtx_assert(&ioat->submit_lock, MA_OWNED);
 	KASSERT(!ioat->quiescing || error == ENXIO,
 	    ("reserved during quiesce"));
 	return (error);
 }
 
 static struct ioat_descriptor **
 ioat_prealloc_ring(struct ioat_softc *ioat, uint32_t size, boolean_t need_dscr,
     int mflags)
 {
 	struct ioat_descriptor **ring;
 	uint32_t i;
 	int error;
 
 	KASSERT(size > 0 && powerof2(size), ("bogus size"));
 
 	ring = malloc(size * sizeof(*ring), M_IOAT, M_ZERO | mflags);
 	if (ring == NULL)
 		return (NULL);
 
 	if (need_dscr) {
 		error = ENOMEM;
 		for (i = size / 2; i < size; i++) {
 			ring[i] = ioat_alloc_ring_entry(ioat, mflags);
 			if (ring[i] == NULL)
 				goto out;
 			ring[i]->id = i;
 		}
 	}
 	error = 0;
 
 out:
 	if (error != 0 && ring != NULL) {
 		ioat_free_ring(ioat, size, ring);
 		ring = NULL;
 	}
 	return (ring);
 }
 
 static void
 ioat_free_ring(struct ioat_softc *ioat, uint32_t size,
     struct ioat_descriptor **ring)
 {
 	uint32_t i;
 
 	for (i = 0; i < size; i++) {
 		if (ring[i] != NULL)
 			ioat_free_ring_entry(ioat, ring[i]);
 	}
 	free(ring, M_IOAT);
 }
 
 static struct ioat_descriptor *
 ioat_get_ring_entry(struct ioat_softc *ioat, uint32_t index)
 {
 
 	return (ioat->ring[index % (1 << ioat->ring_size_order)]);
 }
 
 static int
 ring_grow(struct ioat_softc *ioat, uint32_t oldorder,
     struct ioat_descriptor **newring)
 {
 	struct ioat_descriptor *tmp, *next;
 	struct ioat_dma_hw_descriptor *hw;
 	uint32_t oldsize, newsize, head, tail, i, end;
 	int error;
 
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	mtx_assert(&ioat->submit_lock, MA_OWNED);
 
 	if (oldorder != ioat->ring_size_order || oldorder >= IOAT_MAX_ORDER) {
 		error = EINVAL;
 		goto out;
 	}
 
 	oldsize = (1 << oldorder);
 	newsize = (1 << (oldorder + 1));
 
 	mtx_lock(&ioat->cleanup_lock);
 
 	head = ioat->head & (oldsize - 1);
 	tail = ioat->tail & (oldsize - 1);
 
 	/* Copy old descriptors to new ring */
 	for (i = 0; i < oldsize; i++)
 		newring[i] = ioat->ring[i];
 
 	/*
 	 * If head has wrapped but tail hasn't, we must swap some descriptors
 	 * around so that tail can increment directly to head.
 	 */
 	if (head < tail) {
 		for (i = 0; i <= head; i++) {
 			tmp = newring[oldsize + i];
 
 			newring[oldsize + i] = newring[i];
 			newring[oldsize + i]->id = oldsize + i;
 
 			newring[i] = tmp;
 			newring[i]->id = i;
 		}
 		head += oldsize;
 	}
 
 	KASSERT(head >= tail, ("invariants"));
 
 	/* Head didn't wrap; we only need to link in oldsize..newsize */
 	if (head < oldsize) {
 		i = oldsize - 1;
 		end = newsize;
 	} else {
 		/* Head did wrap; link newhead..newsize and 0..oldhead */
 		i = head;
 		end = newsize + (head - oldsize) + 1;
 	}
 
 	/*
 	 * Fix up hardware ring, being careful not to trample the active
 	 * section (tail -> head).
 	 */
 	for (; i < end; i++) {
 		KASSERT((i & (newsize - 1)) < tail ||
 		    (i & (newsize - 1)) >= head, ("trampling snake"));
 
 		next = newring[(i + 1) & (newsize - 1)];
 		hw = newring[i & (newsize - 1)]->u.dma;
 		hw->next = next->hw_desc_bus_addr;
 	}
 
 #ifdef INVARIANTS
 	for (i = 0; i < newsize; i++) {
 		next = newring[(i + 1) & (newsize - 1)];
 		hw = newring[i & (newsize - 1)]->u.dma;
 
 		KASSERT(hw->next == next->hw_desc_bus_addr,
 		    ("mismatch at i:%u (oldsize:%u); next=%p nextaddr=0x%lx"
 		     " (tail:%u)", i, oldsize, next, next->hw_desc_bus_addr,
 		     tail));
 	}
 #endif
 
 	free(ioat->ring, M_IOAT);
 	ioat->ring = newring;
 	ioat->ring_size_order = oldorder + 1;
 	ioat->tail = tail;
 	ioat->head = head;
 	error = 0;
 
 	mtx_unlock(&ioat->cleanup_lock);
 out:
 	if (error)
 		ioat_free_ring(ioat, (1 << (oldorder + 1)), newring);
 	return (error);
 }
 
 static int
 ring_shrink(struct ioat_softc *ioat, uint32_t oldorder,
     struct ioat_descriptor **newring)
 {
 	struct ioat_dma_hw_descriptor *hw;
 	struct ioat_descriptor *ent, *next;
 	uint32_t oldsize, newsize, current_idx, new_idx, i;
 	int error;
 
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	mtx_assert(&ioat->submit_lock, MA_OWNED);
 
 	if (oldorder != ioat->ring_size_order || oldorder <= IOAT_MIN_ORDER) {
 		error = EINVAL;
 		goto out_unlocked;
 	}
 
 	oldsize = (1 << oldorder);
 	newsize = (1 << (oldorder - 1));
 
 	mtx_lock(&ioat->cleanup_lock);
 
 	/* Can't shrink below current active set! */
 	if (ioat_get_active(ioat) >= newsize) {
 		error = ENOMEM;
 		goto out;
 	}
 
 	/*
 	 * Copy current descriptors to the new ring, dropping the removed
 	 * descriptors.
 	 */
 	for (i = 0; i < newsize; i++) {
 		current_idx = (ioat->tail + i) & (oldsize - 1);
 		new_idx = (ioat->tail + i) & (newsize - 1);
 
 		newring[new_idx] = ioat->ring[current_idx];
 		newring[new_idx]->id = new_idx;
 	}
 
 	/* Free deleted descriptors */
 	for (i = newsize; i < oldsize; i++) {
 		ent = ioat_get_ring_entry(ioat, ioat->tail + i);
 		ioat_free_ring_entry(ioat, ent);
 	}
 
 	/* Fix up hardware ring. */
 	hw = newring[(ioat->tail + newsize - 1) & (newsize - 1)]->u.dma;
 	next = newring[(ioat->tail + newsize) & (newsize - 1)];
 	hw->next = next->hw_desc_bus_addr;
 
 #ifdef INVARIANTS
 	for (i = 0; i < newsize; i++) {
 		next = newring[(i + 1) & (newsize - 1)];
 		hw = newring[i & (newsize - 1)]->u.dma;
 
 		KASSERT(hw->next == next->hw_desc_bus_addr,
 		    ("mismatch at i:%u (newsize:%u); next=%p nextaddr=0x%lx "
 		     "(tail:%u)", i, newsize, next, next->hw_desc_bus_addr,
 		     ioat->tail));
 	}
 #endif
 
 	free(ioat->ring, M_IOAT);
 	ioat->ring = newring;
 	ioat->ring_size_order = oldorder - 1;
 	error = 0;
 
 out:
 	mtx_unlock(&ioat->cleanup_lock);
 out_unlocked:
 	if (error)
 		ioat_free_ring(ioat, (1 << (oldorder - 1)), newring);
 	return (error);
 }
 
 static void
 ioat_halted_debug(struct ioat_softc *ioat, uint32_t chanerr)
 {
 	struct ioat_descriptor *desc;
 
 	ioat_log_message(0, "Channel halted (%b)\n", (int)chanerr,
 	    IOAT_CHANERR_STR);
 	if (chanerr == 0)
 		return;
 
 	mtx_assert(&ioat->cleanup_lock, MA_OWNED);
 
 	desc = ioat_get_ring_entry(ioat, ioat->tail + 0);
 	dump_descriptor(desc->u.raw);
 
 	desc = ioat_get_ring_entry(ioat, ioat->tail + 1);
 	dump_descriptor(desc->u.raw);
 }
 
 static void
 ioat_poll_timer_callback(void *arg)
 {
 	struct ioat_softc *ioat;
 
 	ioat = arg;
 	ioat_log_message(3, "%s\n", __func__);
 
 	ioat_process_events(ioat);
 }
 
 static void
 ioat_shrink_timer_callback(void *arg)
 {
 	struct ioat_descriptor **newring;
 	struct ioat_softc *ioat;
 	uint32_t order;
 
 	ioat = arg;
 	ioat_log_message(1, "%s\n", __func__);
 
 	/* Slowly scale the ring down if idle. */
 	mtx_lock(&ioat->submit_lock);
 
 	/* Don't run while the hardware is being reset. */
 	if (ioat->resetting) {
 		mtx_unlock(&ioat->submit_lock);
 		return;
 	}
 
 	order = ioat->ring_size_order;
 	if (ioat->is_completion_pending || ioat->is_resize_pending ||
 	    order == IOAT_MIN_ORDER) {
 		mtx_unlock(&ioat->submit_lock);
 		goto out;
 	}
 	ioat->is_resize_pending = TRUE;
 	mtx_unlock(&ioat->submit_lock);
 
 	newring = ioat_prealloc_ring(ioat, 1 << (order - 1), FALSE,
 	    M_NOWAIT);
 
 	mtx_lock(&ioat->submit_lock);
 	KASSERT(ioat->ring_size_order == order,
 	    ("resize_pending protects order"));
 
 	if (newring != NULL && !ioat->is_completion_pending)
 		ring_shrink(ioat, order, newring);
 	else if (newring != NULL)
 		ioat_free_ring(ioat, (1 << (order - 1)), newring);
 
 	ioat->is_resize_pending = FALSE;
 	mtx_unlock(&ioat->submit_lock);
 
 out:
 	if (ioat->ring_size_order > IOAT_MIN_ORDER)
 		callout_reset(&ioat->shrink_timer, IOAT_SHRINK_PERIOD,
 		    ioat_shrink_timer_callback, ioat);
 }
 
 /*
  * Support Functions
  */
 static void
 ioat_submit_single(struct ioat_softc *ioat)
 {
 
 	ioat_get(ioat, IOAT_ACTIVE_DESCR_REF);
 	atomic_add_rel_int(&ioat->head, 1);
 	atomic_add_rel_int(&ioat->hw_head, 1);
 
 	if (!ioat->is_completion_pending) {
 		ioat->is_completion_pending = TRUE;
 		callout_reset(&ioat->poll_timer, 1, ioat_poll_timer_callback,
 		    ioat);
 		callout_stop(&ioat->shrink_timer);
 	}
 
 	ioat->stats.descriptors_submitted++;
 }
 
 static int
 ioat_reset_hw(struct ioat_softc *ioat)
 {
 	uint64_t status;
 	uint32_t chanerr;
 	unsigned timeout;
 	int error;
 
-	CTR0(KTR_IOAT, __func__);
+	CTR2(KTR_IOAT, "%s channel=%u", __func__, ioat->chan_idx);
 
 	mtx_lock(IOAT_REFLK);
 	while (ioat->resetting && !ioat->destroying)
 		msleep(&ioat->resetting, IOAT_REFLK, 0, "IRH_drain", 0);
 	if (ioat->destroying) {
 		mtx_unlock(IOAT_REFLK);
 		return (ENXIO);
 	}
 	ioat->resetting = TRUE;
 
 	ioat->quiescing = TRUE;
 	ioat_drain_locked(ioat);
 	mtx_unlock(IOAT_REFLK);
 
 	/*
 	 * Suspend ioat_process_events while the hardware and softc are in an
 	 * indeterminate state.
 	 */
 	mtx_lock(&ioat->cleanup_lock);
 	ioat->resetting_cleanup = TRUE;
 	mtx_unlock(&ioat->cleanup_lock);
 
 	status = ioat_get_chansts(ioat);
 	if (is_ioat_active(status) || is_ioat_idle(status))
 		ioat_suspend(ioat);
 
 	/* Wait at most 20 ms */
 	for (timeout = 0; (is_ioat_active(status) || is_ioat_idle(status)) &&
 	    timeout < 20; timeout++) {
 		DELAY(1000);
 		status = ioat_get_chansts(ioat);
 	}
 	if (timeout == 20) {
 		error = ETIMEDOUT;
 		goto out;
 	}
 
 	KASSERT(ioat_get_active(ioat) == 0, ("active after quiesce"));
 
 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	ioat_write_4(ioat, IOAT_CHANERR_OFFSET, chanerr);
 
 	/*
  * IOAT v3 workaround: set CHANERRMSK_INT to 3E07h to mask out errors
  * that can cause stability issues for IOAT v3.
 	 */
 	pci_write_config(ioat->device, IOAT_CFG_CHANERRMASK_INT_OFFSET, 0x3e07,
 	    4);
 	chanerr = pci_read_config(ioat->device, IOAT_CFG_CHANERR_INT_OFFSET, 4);
 	pci_write_config(ioat->device, IOAT_CFG_CHANERR_INT_OFFSET, chanerr, 4);
 
 	/*
 	 * BDXDE and BWD models reset MSI-X registers on device reset.
 	 * Save/restore their contents manually.
 	 */
 	if (ioat_model_resets_msix(ioat)) {
 		ioat_log_message(1, "device resets MSI-X registers; saving\n");
 		pci_save_state(ioat->device);
 	}
 
 	ioat_reset(ioat);
 
 	/* Wait at most 20 ms */
 	for (timeout = 0; ioat_reset_pending(ioat) && timeout < 20; timeout++)
 		DELAY(1000);
 	if (timeout == 20) {
 		error = ETIMEDOUT;
 		goto out;
 	}
 
 	if (ioat_model_resets_msix(ioat)) {
 		ioat_log_message(1, "device resets registers; restored\n");
 		pci_restore_state(ioat->device);
 	}
 
 	/* Reset attempts to return the hardware to "halted." */
 	status = ioat_get_chansts(ioat);
 	if (is_ioat_active(status) || is_ioat_idle(status)) {
 		/* So this really shouldn't happen... */
 		ioat_log_message(0, "Device is active after a reset?\n");
 		ioat_write_chanctrl(ioat, IOAT_CHANCTRL_RUN);
 		error = 0;
 		goto out;
 	}
 
 	chanerr = ioat_read_4(ioat, IOAT_CHANERR_OFFSET);
 	if (chanerr != 0) {
 		mtx_lock(&ioat->cleanup_lock);
 		ioat_halted_debug(ioat, chanerr);
 		mtx_unlock(&ioat->cleanup_lock);
 		error = EIO;
 		goto out;
 	}
 
 	/*
 	 * Bring device back online after reset.  Writing CHAINADDR brings the
 	 * device back to active.
 	 *
 	 * The internal ring counter resets to zero, so we have to start over
 	 * at zero as well.
 	 */
 	ioat->tail = ioat->head = ioat->hw_head = 0;
 	ioat->last_seen = 0;
 	*ioat->comp_update = 0;
 
 	ioat_write_chanctrl(ioat, IOAT_CHANCTRL_RUN);
 	ioat_write_chancmp(ioat, ioat->comp_update_bus_addr);
 	ioat_write_chainaddr(ioat, ioat->ring[0]->hw_desc_bus_addr);
 	error = 0;
 
 out:
 	/*
 	 * Resume completions now that ring state is consistent.
 	 * ioat_start_channel will add a pending completion and if we are still
 	 * blocking completions, we may livelock.
 	 */
 	mtx_lock(&ioat->cleanup_lock);
 	ioat->resetting_cleanup = FALSE;
 	mtx_unlock(&ioat->cleanup_lock);
 
 	/* Enqueues a null operation and ensures it completes. */
 	if (error == 0)
 		error = ioat_start_channel(ioat);
 
 	/* Unblock submission of new work */
 	mtx_lock(IOAT_REFLK);
 	ioat->quiescing = FALSE;
 	wakeup(&ioat->quiescing);
 
 	ioat->resetting = FALSE;
 	wakeup(&ioat->resetting);
 	mtx_unlock(IOAT_REFLK);
 
 	return (error);
 }
 
 static int
 sysctl_handle_chansts(SYSCTL_HANDLER_ARGS)
 {
 	struct ioat_softc *ioat;
 	struct sbuf sb;
 	uint64_t status;
 	int error;
 
 	ioat = arg1;
 
 	status = ioat_get_chansts(ioat) & IOAT_CHANSTS_STATUS;
 
 	sbuf_new_for_sysctl(&sb, NULL, 256, req);
 	switch (status) {
 	case IOAT_CHANSTS_ACTIVE:
 		sbuf_printf(&sb, "ACTIVE");
 		break;
 	case IOAT_CHANSTS_IDLE:
 		sbuf_printf(&sb, "IDLE");
 		break;
 	case IOAT_CHANSTS_SUSPENDED:
 		sbuf_printf(&sb, "SUSPENDED");
 		break;
 	case IOAT_CHANSTS_HALTED:
 		sbuf_printf(&sb, "HALTED");
 		break;
 	case IOAT_CHANSTS_ARMED:
 		sbuf_printf(&sb, "ARMED");
 		break;
 	default:
 		sbuf_printf(&sb, "UNKNOWN");
 		break;
 	}
 	error = sbuf_finish(&sb);
 	sbuf_delete(&sb);
 
 	if (error != 0 || req->newptr == NULL)
 		return (error);
 	return (EINVAL);
 }
 
 static int
 sysctl_handle_dpi(SYSCTL_HANDLER_ARGS)
 {
 	struct ioat_softc *ioat;
 	struct sbuf sb;
 #define	PRECISION	"1"
 	const uintmax_t factor = 10;
 	uintmax_t rate;
 	int error;
 
 	ioat = arg1;
 	sbuf_new_for_sysctl(&sb, NULL, 16, req);
 
 	if (ioat->stats.interrupts == 0) {
 		sbuf_printf(&sb, "NaN");
 		goto out;
 	}
 	rate = ioat->stats.descriptors_processed * factor /
 	    ioat->stats.interrupts;
 	sbuf_printf(&sb, "%ju.%." PRECISION "ju", rate / factor,
 	    rate % factor);
 #undef	PRECISION
 out:
 	error = sbuf_finish(&sb);
 	sbuf_delete(&sb);
 	if (error != 0 || req->newptr == NULL)
 		return (error);
 	return (EINVAL);
 }
 
 static int
 sysctl_handle_reset(SYSCTL_HANDLER_ARGS)
 {
 	struct ioat_softc *ioat;
 	int error, arg;
 
 	ioat = arg1;
 
 	arg = 0;
 	error = SYSCTL_OUT(req, &arg, sizeof(arg));
 	if (error != 0 || req->newptr == NULL)
 		return (error);
 
 	error = SYSCTL_IN(req, &arg, sizeof(arg));
 	if (error != 0)
 		return (error);
 
 	if (arg != 0)
 		error = ioat_reset_hw(ioat);
 
 	return (error);
 }
 
 static void
 dump_descriptor(void *hw_desc)
 {
 	int i, j;
 
 	for (i = 0; i < 2; i++) {
 		for (j = 0; j < 8; j++)
 			printf("%08x ", ((uint32_t *)hw_desc)[i * 8 + j]);
 		printf("\n");
 	}
 }
 
 static void
 ioat_setup_sysctl(device_t device)
 {
 	struct sysctl_oid_list *par, *statpar, *state, *hammer;
 	struct sysctl_ctx_list *ctx;
 	struct sysctl_oid *tree, *tmp;
 	struct ioat_softc *ioat;
 
 	ioat = DEVICE2SOFTC(device);
 	ctx = device_get_sysctl_ctx(device);
 	tree = device_get_sysctl_tree(device);
 	par = SYSCTL_CHILDREN(tree);
 
 	SYSCTL_ADD_INT(ctx, par, OID_AUTO, "version", CTLFLAG_RD,
 	    &ioat->version, 0, "HW version (0xMM form)");
 	SYSCTL_ADD_UINT(ctx, par, OID_AUTO, "max_xfer_size", CTLFLAG_RD,
 	    &ioat->max_xfer_size, 0, "HW maximum transfer size");
 	SYSCTL_ADD_INT(ctx, par, OID_AUTO, "intrdelay_supported", CTLFLAG_RD,
 	    &ioat->intrdelay_supported, 0, "Is INTRDELAY supported");
 	SYSCTL_ADD_U16(ctx, par, OID_AUTO, "intrdelay_max", CTLFLAG_RD,
 	    &ioat->intrdelay_max, 0,
 	    "Maximum configurable INTRDELAY on this channel (microseconds)");
 
 	tmp = SYSCTL_ADD_NODE(ctx, par, OID_AUTO, "state", CTLFLAG_RD, NULL,
 	    "IOAT channel internal state");
 	state = SYSCTL_CHILDREN(tmp);
 
 	SYSCTL_ADD_UINT(ctx, state, OID_AUTO, "ring_size_order", CTLFLAG_RD,
 	    &ioat->ring_size_order, 0, "SW descriptor ring size order");
 	SYSCTL_ADD_UINT(ctx, state, OID_AUTO, "head", CTLFLAG_RD, &ioat->head,
 	    0, "SW descriptor head pointer index");
 	SYSCTL_ADD_UINT(ctx, state, OID_AUTO, "tail", CTLFLAG_RD, &ioat->tail,
 	    0, "SW descriptor tail pointer index");
 	SYSCTL_ADD_UINT(ctx, state, OID_AUTO, "hw_head", CTLFLAG_RD,
 	    &ioat->hw_head, 0, "HW DMACOUNT");
 
 	SYSCTL_ADD_UQUAD(ctx, state, OID_AUTO, "last_completion", CTLFLAG_RD,
 	    ioat->comp_update, "HW addr of last completion");
 
 	SYSCTL_ADD_INT(ctx, state, OID_AUTO, "is_resize_pending", CTLFLAG_RD,
 	    &ioat->is_resize_pending, 0, "resize pending");
 	SYSCTL_ADD_INT(ctx, state, OID_AUTO, "is_submitter_processing",
 	    CTLFLAG_RD, &ioat->is_submitter_processing, 0,
 	    "submitter processing");
 	SYSCTL_ADD_INT(ctx, state, OID_AUTO, "is_completion_pending",
 	    CTLFLAG_RD, &ioat->is_completion_pending, 0, "completion pending");
 	SYSCTL_ADD_INT(ctx, state, OID_AUTO, "is_reset_pending", CTLFLAG_RD,
 	    &ioat->is_reset_pending, 0, "reset pending");
 	SYSCTL_ADD_INT(ctx, state, OID_AUTO, "is_channel_running", CTLFLAG_RD,
 	    &ioat->is_channel_running, 0, "channel running");
 
 	SYSCTL_ADD_PROC(ctx, state, OID_AUTO, "chansts",
 	    CTLTYPE_STRING | CTLFLAG_RD, ioat, 0, sysctl_handle_chansts, "A",
 	    "String of the channel status");
 
 	SYSCTL_ADD_U16(ctx, state, OID_AUTO, "intrdelay", CTLFLAG_RD,
 	    &ioat->cached_intrdelay, 0,
 	    "Current INTRDELAY on this channel (cached, microseconds)");
 
 	tmp = SYSCTL_ADD_NODE(ctx, par, OID_AUTO, "hammer", CTLFLAG_RD, NULL,
 	    "Big hammers (mostly for testing)");
 	hammer = SYSCTL_CHILDREN(tmp);
 
 	SYSCTL_ADD_PROC(ctx, hammer, OID_AUTO, "force_hw_reset",
 	    CTLTYPE_INT | CTLFLAG_RW, ioat, 0, sysctl_handle_reset, "I",
 	    "Set to non-zero to reset the hardware");
 
 	tmp = SYSCTL_ADD_NODE(ctx, par, OID_AUTO, "stats", CTLFLAG_RD, NULL,
 	    "IOAT channel statistics");
 	statpar = SYSCTL_CHILDREN(tmp);
 
 	SYSCTL_ADD_UQUAD(ctx, statpar, OID_AUTO, "interrupts", CTLFLAG_RW,
 	    &ioat->stats.interrupts,
 	    "Number of interrupts processed on this channel");
 	SYSCTL_ADD_UQUAD(ctx, statpar, OID_AUTO, "descriptors", CTLFLAG_RW,
 	    &ioat->stats.descriptors_processed,
 	    "Number of descriptors processed on this channel");
 	SYSCTL_ADD_UQUAD(ctx, statpar, OID_AUTO, "submitted", CTLFLAG_RW,
 	    &ioat->stats.descriptors_submitted,
 	    "Number of descriptors submitted to this channel");
 	SYSCTL_ADD_UQUAD(ctx, statpar, OID_AUTO, "errored", CTLFLAG_RW,
 	    &ioat->stats.descriptors_error,
 	    "Number of descriptors failed by channel errors");
 	SYSCTL_ADD_U32(ctx, statpar, OID_AUTO, "halts", CTLFLAG_RW,
 	    &ioat->stats.channel_halts, 0,
 	    "Number of times the channel has halted");
 	SYSCTL_ADD_U32(ctx, statpar, OID_AUTO, "last_halt_chanerr", CTLFLAG_RW,
 	    &ioat->stats.last_halt_chanerr, 0,
 	    "The raw CHANERR when the channel was last halted");
 
 	SYSCTL_ADD_PROC(ctx, statpar, OID_AUTO, "desc_per_interrupt",
 	    CTLTYPE_STRING | CTLFLAG_RD, ioat, 0, sysctl_handle_dpi, "A",
 	    "Descriptors per interrupt");
 }
 
 static inline struct ioat_softc *
 ioat_get(struct ioat_softc *ioat, enum ioat_ref_kind kind)
 {
 	uint32_t old;
 
 	KASSERT(kind < IOAT_NUM_REF_KINDS, ("bogus"));
 
 	old = atomic_fetchadd_32(&ioat->refcnt, 1);
 	KASSERT(old < UINT32_MAX, ("refcnt overflow"));
 
 #ifdef INVARIANTS
 	old = atomic_fetchadd_32(&ioat->refkinds[kind], 1);
 	KASSERT(old < UINT32_MAX, ("refcnt kind overflow"));
 #endif
 
 	return (ioat);
 }
 
 static inline void
 ioat_putn(struct ioat_softc *ioat, uint32_t n, enum ioat_ref_kind kind)
 {
 
 	_ioat_putn(ioat, n, kind, FALSE);
 }
 
 static inline void
 ioat_putn_locked(struct ioat_softc *ioat, uint32_t n, enum ioat_ref_kind kind)
 {
 
 	_ioat_putn(ioat, n, kind, TRUE);
 }
 
 static inline void
 _ioat_putn(struct ioat_softc *ioat, uint32_t n, enum ioat_ref_kind kind,
     boolean_t locked)
 {
 	uint32_t old;
 
 	KASSERT(kind < IOAT_NUM_REF_KINDS, ("bogus"));
 
 	if (n == 0)
 		return;
 
 #ifdef INVARIANTS
 	old = atomic_fetchadd_32(&ioat->refkinds[kind], -n);
 	KASSERT(old >= n, ("refcnt kind underflow"));
 #endif
 
 	/* Skip acquiring the lock if resulting refcnt > 0. */
 	for (;;) {
 		old = ioat->refcnt;
 		if (old <= n)
 			break;
 		if (atomic_cmpset_32(&ioat->refcnt, old, old - n))
 			return;
 	}
 
 	if (locked)
 		mtx_assert(IOAT_REFLK, MA_OWNED);
 	else
 		mtx_lock(IOAT_REFLK);
 
 	old = atomic_fetchadd_32(&ioat->refcnt, -n);
 	KASSERT(old >= n, ("refcnt error"));
 
 	if (old == n)
 		wakeup(IOAT_REFLK);
 	if (!locked)
 		mtx_unlock(IOAT_REFLK);
 }
 
 static inline void
 ioat_put(struct ioat_softc *ioat, enum ioat_ref_kind kind)
 {
 
 	ioat_putn(ioat, 1, kind);
 }
 
 static void
 ioat_drain_locked(struct ioat_softc *ioat)
 {
 
 	mtx_assert(IOAT_REFLK, MA_OWNED);
 	while (ioat->refcnt > 0)
 		msleep(IOAT_REFLK, IOAT_REFLK, 0, "ioat_drain", 0);
 }
 
 #ifdef DDB
 #define	_db_show_lock(lo)	LOCK_CLASS(lo)->lc_ddb_show(lo)
 #define	db_show_lock(lk)	_db_show_lock(&(lk)->lock_object)
 DB_SHOW_COMMAND(ioat, db_show_ioat)
 {
 	struct ioat_softc *sc;
 	unsigned idx;
 
 	if (!have_addr)
 		goto usage;
 	idx = (unsigned)addr;
 	if (idx >= ioat_channel_index)
 		goto usage;
 
 	sc = ioat_channel[idx];
 	db_printf("ioat softc at %p\n", sc);
 	if (sc == NULL)
 		return;
 
 	db_printf(" version: %d\n", sc->version);
 	db_printf(" chan_idx: %u\n", sc->chan_idx);
 	db_printf(" submit_lock: ");
 	db_show_lock(&sc->submit_lock);
 
 	db_printf(" capabilities: %b\n", (int)sc->capabilities,
 	    IOAT_DMACAP_STR);
 	db_printf(" cached_intrdelay: %u\n", sc->cached_intrdelay);
 	db_printf(" *comp_update: 0x%jx\n", (uintmax_t)*sc->comp_update);
 
 	db_printf(" poll_timer:\n");
 	db_printf("  c_time: %ju\n", (uintmax_t)sc->poll_timer.c_time);
 	db_printf("  c_arg: %p\n", sc->poll_timer.c_arg);
 	db_printf("  c_func: %p\n", sc->poll_timer.c_func);
 	db_printf("  c_lock: %p\n", sc->poll_timer.c_lock);
 	db_printf("  c_flags: 0x%x\n", (unsigned)sc->poll_timer.c_flags);
 
 	db_printf(" shrink_timer:\n");
 	db_printf("  c_time: %ju\n", (uintmax_t)sc->shrink_timer.c_time);
 	db_printf("  c_arg: %p\n", sc->shrink_timer.c_arg);
 	db_printf("  c_func: %p\n", sc->shrink_timer.c_func);
 	db_printf("  c_lock: %p\n", sc->shrink_timer.c_lock);
 	db_printf("  c_flags: 0x%x\n", (unsigned)sc->shrink_timer.c_flags);
 
 	db_printf(" quiescing: %d\n", (int)sc->quiescing);
 	db_printf(" destroying: %d\n", (int)sc->destroying);
 	db_printf(" is_resize_pending: %d\n", (int)sc->is_resize_pending);
 	db_printf(" is_submitter_processing: %d\n",
 	    (int)sc->is_submitter_processing);
 	db_printf(" is_completion_pending: %d\n", (int)sc->is_completion_pending);
 	db_printf(" is_reset_pending: %d\n", (int)sc->is_reset_pending);
 	db_printf(" is_channel_running: %d\n", (int)sc->is_channel_running);
 	db_printf(" intrdelay_supported: %d\n", (int)sc->intrdelay_supported);
 	db_printf(" resetting: %d\n", (int)sc->resetting);
 
 	db_printf(" head: %u\n", sc->head);
 	db_printf(" tail: %u\n", sc->tail);
 	db_printf(" hw_head: %u\n", sc->hw_head);
 	db_printf(" ring_size_order: %u\n", sc->ring_size_order);
 	db_printf(" last_seen: 0x%lx\n", sc->last_seen);
 	db_printf(" ring: %p\n", sc->ring);
 
 	db_printf("  ring[%u] (tail):\n", sc->tail %
 	    (1 << sc->ring_size_order));
 	db_printf("   id: %u\n", ioat_get_ring_entry(sc, sc->tail)->id);
 	db_printf("   addr: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->tail)->hw_desc_bus_addr);
 	db_printf("   next: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->tail)->u.generic->next);
 
 	db_printf("  ring[%u] (head - 1):\n", (sc->head - 1) %
 	    (1 << sc->ring_size_order));
 	db_printf("   id: %u\n", ioat_get_ring_entry(sc, sc->head - 1)->id);
 	db_printf("   addr: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->head - 1)->hw_desc_bus_addr);
 	db_printf("   next: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->head - 1)->u.generic->next);
 
 	db_printf("  ring[%u] (head):\n", (sc->head) %
 	    (1 << sc->ring_size_order));
 	db_printf("   id: %u\n", ioat_get_ring_entry(sc, sc->head)->id);
 	db_printf("   addr: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->head)->hw_desc_bus_addr);
 	db_printf("   next: 0x%lx\n",
 	    ioat_get_ring_entry(sc, sc->head)->u.generic->next);
 
 	for (idx = 0; idx < (1 << sc->ring_size_order); idx++)
 		if ((*sc->comp_update & IOAT_CHANSTS_COMPLETED_DESCRIPTOR_MASK)
 		    == ioat_get_ring_entry(sc, idx)->hw_desc_bus_addr)
 			db_printf("  ring[%u] == hardware tail\n", idx);
 
 	db_printf(" cleanup_lock: ");
 	db_show_lock(&sc->cleanup_lock);
 
 	db_printf(" refcnt: %u\n", sc->refcnt);
 #ifdef INVARIANTS
 	CTASSERT(IOAT_NUM_REF_KINDS == 2);
 	db_printf(" refkinds: [ENG=%u, DESCR=%u]\n", sc->refkinds[0],
 	    sc->refkinds[1]);
 #endif
 	db_printf(" stats:\n");
 	db_printf("  interrupts: %lu\n", sc->stats.interrupts);
 	db_printf("  descriptors_processed: %lu\n", sc->stats.descriptors_processed);
 	db_printf("  descriptors_error: %lu\n", sc->stats.descriptors_error);
 	db_printf("  descriptors_submitted: %lu\n", sc->stats.descriptors_submitted);
 
 	db_printf("  channel_halts: %u\n", sc->stats.channel_halts);
 	db_printf("  last_halt_chanerr: %u\n", sc->stats.last_halt_chanerr);
 
 	if (db_pager_quit)
 		return;
 
 	db_printf(" hw status:\n");
 	db_printf("  status: 0x%lx\n", ioat_get_chansts(sc));
 	db_printf("  chanctrl: 0x%x\n",
 	    (unsigned)ioat_read_2(sc, IOAT_CHANCTRL_OFFSET));
 	db_printf("  chancmd: 0x%x\n",
 	    (unsigned)ioat_read_1(sc, IOAT_CHANCMD_OFFSET));
 	db_printf("  dmacount: 0x%x\n",
 	    (unsigned)ioat_read_2(sc, IOAT_DMACOUNT_OFFSET));
 	db_printf("  chainaddr: 0x%lx\n",
 	    ioat_read_double_4(sc, IOAT_CHAINADDR_OFFSET_LOW));
 	db_printf("  chancmp: 0x%lx\n",
 	    ioat_read_double_4(sc, IOAT_CHANCMP_OFFSET_LOW));
 	db_printf("  chanerr: %b\n",
 	    (int)ioat_read_4(sc, IOAT_CHANERR_OFFSET), IOAT_CHANERR_STR);
 	return;
 usage:
 	db_printf("usage: show ioat <0-%u>\n", ioat_channel_index);
 	return;
 }
 #endif /* DDB */
Index: user/alc/PQ_LAUNDRY/sys/dev/usb/input/ukbd.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/usb/input/ukbd.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/usb/input/ukbd.c	(revision 303775)
@@ -1,2181 +1,2191 @@
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 
 /*-
  * Copyright (c) 1998 The NetBSD Foundation, Inc.
  * All rights reserved.
  *
  * This code is derived from software contributed to The NetBSD Foundation
  * by Lennart Augustsson (lennart@augustsson.net) at
  * Carlstedt Research & Technology.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
  * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
  * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
  * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  *
  */
 
 /*
  * HID spec: http://www.usb.org/developers/devclass_docs/HID1_11.pdf
  */
 
 #include "opt_compat.h"
 #include "opt_kbd.h"
 #include "opt_ukbd.h"
 
 #include <sys/stdint.h>
 #include <sys/stddef.h>
 #include <sys/param.h>
 #include <sys/queue.h>
 #include <sys/types.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/bus.h>
 #include <sys/module.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/condvar.h>
 #include <sys/sysctl.h>
 #include <sys/sx.h>
 #include <sys/unistd.h>
 #include <sys/callout.h>
 #include <sys/malloc.h>
 #include <sys/priv.h>
 #include <sys/proc.h>
 #include <sys/sched.h>
 #include <sys/kdb.h>
 
 #include <dev/usb/usb.h>
 #include <dev/usb/usbdi.h>
 #include <dev/usb/usbdi_util.h>
 #include <dev/usb/usbhid.h>
 
 #define	USB_DEBUG_VAR ukbd_debug
 #include <dev/usb/usb_debug.h>
 
 #include <dev/usb/quirk/usb_quirk.h>
 
 #include <sys/ioccom.h>
 #include <sys/filio.h>
 #include <sys/tty.h>
 #include <sys/kbio.h>
 
 #include <dev/kbd/kbdreg.h>
 
 /* the initial key map, accent map and fkey strings */
 #if defined(UKBD_DFLT_KEYMAP) && !defined(KLD_MODULE)
 #define	KBD_DFLT_KEYMAP
 #include "ukbdmap.h"
 #endif
 
 /* the following file must be included after "ukbdmap.h" */
 #include <dev/kbd/kbdtables.h>
 
 #ifdef USB_DEBUG
 static int ukbd_debug = 0;
 static int ukbd_no_leds = 0;
 static int ukbd_pollrate = 0;
 
 static SYSCTL_NODE(_hw_usb, OID_AUTO, ukbd, CTLFLAG_RW, 0, "USB keyboard");
 SYSCTL_INT(_hw_usb_ukbd, OID_AUTO, debug, CTLFLAG_RWTUN,
     &ukbd_debug, 0, "Debug level");
 SYSCTL_INT(_hw_usb_ukbd, OID_AUTO, no_leds, CTLFLAG_RWTUN,
     &ukbd_no_leds, 0, "Disables setting of keyboard leds");
 SYSCTL_INT(_hw_usb_ukbd, OID_AUTO, pollrate, CTLFLAG_RWTUN,
     &ukbd_pollrate, 0, "Force this polling rate, 1-1000Hz");
 #endif
 
 #define	UKBD_EMULATE_ATSCANCODE	       1
 #define	UKBD_DRIVER_NAME          "ukbd"
 #define	UKBD_NMOD                     8	/* units */
 #define	UKBD_NKEYCODE                 6	/* units */
 #define	UKBD_IN_BUF_SIZE  (2*(UKBD_NMOD + (2*UKBD_NKEYCODE)))	/* units */
 #define	UKBD_IN_BUF_FULL  (UKBD_IN_BUF_SIZE / 2)	/* units */
 #define	UKBD_NFKEY        (sizeof(fkey_tab)/sizeof(fkey_tab[0]))	/* units */
 #define	UKBD_BUFFER_SIZE	      64	/* bytes */
 
 struct ukbd_data {
 	uint16_t	modifiers;
 #define	MOD_CONTROL_L	0x01
 #define	MOD_CONTROL_R	0x10
 #define	MOD_SHIFT_L	0x02
 #define	MOD_SHIFT_R	0x20
 #define	MOD_ALT_L	0x04
 #define	MOD_ALT_R	0x40
 #define	MOD_WIN_L	0x08
 #define	MOD_WIN_R	0x80
 /* internal */
 #define	MOD_EJECT	0x0100
 #define	MOD_FN		0x0200
 	uint8_t	keycode[UKBD_NKEYCODE];
 };
 
 enum {
 	UKBD_INTR_DT,
 	UKBD_CTRL_LED,
 	UKBD_N_TRANSFER,
 };
 
 struct ukbd_softc {
 	keyboard_t sc_kbd;
 	keymap_t sc_keymap;
 	accentmap_t sc_accmap;
 	fkeytab_t sc_fkeymap[UKBD_NFKEY];
 	struct hid_location sc_loc_apple_eject;
 	struct hid_location sc_loc_apple_fn;
 	struct hid_location sc_loc_ctrl_l;
 	struct hid_location sc_loc_ctrl_r;
 	struct hid_location sc_loc_shift_l;
 	struct hid_location sc_loc_shift_r;
 	struct hid_location sc_loc_alt_l;
 	struct hid_location sc_loc_alt_r;
 	struct hid_location sc_loc_win_l;
 	struct hid_location sc_loc_win_r;
 	struct hid_location sc_loc_events;
 	struct hid_location sc_loc_numlock;
 	struct hid_location sc_loc_capslock;
 	struct hid_location sc_loc_scrolllock;
 	struct usb_callout sc_callout;
 	struct ukbd_data sc_ndata;
 	struct ukbd_data sc_odata;
 
 	struct thread *sc_poll_thread;
 	struct usb_device *sc_udev;
 	struct usb_interface *sc_iface;
 	struct usb_xfer *sc_xfer[UKBD_N_TRANSFER];
 
 	uint32_t sc_ntime[UKBD_NKEYCODE];
 	uint32_t sc_otime[UKBD_NKEYCODE];
 	uint32_t sc_input[UKBD_IN_BUF_SIZE];	/* input buffer */
 	uint32_t sc_time_ms;
 	uint32_t sc_composed_char;	/* composed char code, if non-zero */
 #ifdef UKBD_EMULATE_ATSCANCODE
 	uint32_t sc_buffered_char[2];
 #endif
 	uint32_t sc_flags;		/* flags */
 #define	UKBD_FLAG_COMPOSE	0x00000001
 #define	UKBD_FLAG_POLLING	0x00000002
 #define	UKBD_FLAG_SET_LEDS	0x00000004
 #define	UKBD_FLAG_ATTACHED	0x00000010
 #define	UKBD_FLAG_GONE		0x00000020
 
 #define	UKBD_FLAG_HID_MASK	0x003fffc0
 #define	UKBD_FLAG_APPLE_EJECT	0x00000040
 #define	UKBD_FLAG_APPLE_FN	0x00000080
 #define	UKBD_FLAG_APPLE_SWAP	0x00000100
 #define	UKBD_FLAG_TIMER_RUNNING	0x00000200
 #define	UKBD_FLAG_CTRL_L	0x00000400
 #define	UKBD_FLAG_CTRL_R	0x00000800
 #define	UKBD_FLAG_SHIFT_L	0x00001000
 #define	UKBD_FLAG_SHIFT_R	0x00002000
 #define	UKBD_FLAG_ALT_L		0x00004000
 #define	UKBD_FLAG_ALT_R		0x00008000
 #define	UKBD_FLAG_WIN_L		0x00010000
 #define	UKBD_FLAG_WIN_R		0x00020000
 #define	UKBD_FLAG_EVENTS	0x00040000
 #define	UKBD_FLAG_NUMLOCK	0x00080000
 #define	UKBD_FLAG_CAPSLOCK	0x00100000
 #define	UKBD_FLAG_SCROLLLOCK 	0x00200000
 
 	int	sc_mode;		/* input mode (K_XLATE,K_RAW,K_CODE) */
 	int	sc_state;		/* shift/lock key state */
 	int	sc_accents;		/* accent key index (> 0) */
+	int	sc_polling;		/* polling recursion count */
 	int	sc_led_size;
 	int	sc_kbd_size;
 
 	uint16_t sc_inputs;
 	uint16_t sc_inputhead;
 	uint16_t sc_inputtail;
 	uint16_t sc_modifiers;
 
 	uint8_t	sc_leds;		/* store for async led requests */
 	uint8_t	sc_iface_index;
 	uint8_t	sc_iface_no;
 	uint8_t sc_id_apple_eject;
 	uint8_t sc_id_apple_fn;
 	uint8_t sc_id_ctrl_l;
 	uint8_t sc_id_ctrl_r;
 	uint8_t sc_id_shift_l;
 	uint8_t sc_id_shift_r;
 	uint8_t sc_id_alt_l;
 	uint8_t sc_id_alt_r;
 	uint8_t sc_id_win_l;
 	uint8_t sc_id_win_r;
 	uint8_t sc_id_event;
 	uint8_t sc_id_numlock;
 	uint8_t sc_id_capslock;
 	uint8_t sc_id_scrolllock;
 	uint8_t sc_id_events;
 	uint8_t sc_kbd_id;
 
 	uint8_t sc_buffer[UKBD_BUFFER_SIZE];
 };
 
 #define	KEY_ERROR	  0x01
 
 #define	KEY_PRESS	  0
 #define	KEY_RELEASE	  0x400
 #define	KEY_INDEX(c)	  ((c) & 0xFF)
 
 #define	SCAN_PRESS	  0
 #define	SCAN_RELEASE	  0x80
 #define	SCAN_PREFIX_E0	  0x100
 #define	SCAN_PREFIX_E1	  0x200
 #define	SCAN_PREFIX_CTL	  0x400
 #define	SCAN_PREFIX_SHIFT 0x800
 #define	SCAN_PREFIX	(SCAN_PREFIX_E0  | SCAN_PREFIX_E1 | \
 			 SCAN_PREFIX_CTL | SCAN_PREFIX_SHIFT)
 #define	SCAN_CHAR(c)	((c) & 0x7f)
 
 #define	UKBD_LOCK()	mtx_lock(&Giant)
 #define	UKBD_UNLOCK()	mtx_unlock(&Giant)
 
 #ifdef	INVARIANTS
 
 /*
  * Assert that the lock is held in all contexts
  * where the code can be executed.
  */
 #define	UKBD_LOCK_ASSERT()	mtx_assert(&Giant, MA_OWNED)
 
 /*
  * Assert that the lock is held in the contexts
  * where it really has to be so.
  */
 #define	UKBD_CTX_LOCK_ASSERT()			 	\
 	do {						\
 		if (!kdb_active && panicstr == NULL)	\
 			mtx_assert(&Giant, MA_OWNED);	\
 	} while (0)
 #else
 
 #define	UKBD_LOCK_ASSERT()	(void)0
 #define	UKBD_CTX_LOCK_ASSERT()	(void)0
 
 #endif
 
 struct ukbd_mods {
 	uint32_t mask, key;
 };
 
 static const struct ukbd_mods ukbd_mods[UKBD_NMOD] = {
 	{MOD_CONTROL_L, 0xe0},
 	{MOD_CONTROL_R, 0xe4},
 	{MOD_SHIFT_L, 0xe1},
 	{MOD_SHIFT_R, 0xe5},
 	{MOD_ALT_L, 0xe2},
 	{MOD_ALT_R, 0xe6},
 	{MOD_WIN_L, 0xe3},
 	{MOD_WIN_R, 0xe7},
 };
 
 #define	NN 0				/* no translation */
 /*
  * Translate USB keycodes to AT keyboard scancodes.
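  *
  * For example, USB usage 0x04 (the 'A' key) is translated via
  * ukbd_trtab[0x04], which holds 30 (0x1e), the AT scan code for 'A'.
  * Entries marked NN have no AT translation.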
  */
 /*
  * FIXME: Mac USB keyboard generates:
  * 0x53: keypad NumLock/Clear
  * 0x66: Power
  * 0x67: keypad =
  * 0x68: F13
  * 0x69: F14
  * 0x6a: F15
  * 
  * USB Apple Keyboard JIS generates:
  * 0x90: Kana
  * 0x91: Eisu
  */
 static const uint8_t ukbd_trtab[256] = {
 	0, 0, 0, 0, 30, 48, 46, 32,	/* 00 - 07 */
 	18, 33, 34, 35, 23, 36, 37, 38,	/* 08 - 0F */
 	50, 49, 24, 25, 16, 19, 31, 20,	/* 10 - 17 */
 	22, 47, 17, 45, 21, 44, 2, 3,	/* 18 - 1F */
 	4, 5, 6, 7, 8, 9, 10, 11,	/* 20 - 27 */
 	28, 1, 14, 15, 57, 12, 13, 26,	/* 28 - 2F */
 	27, 43, 43, 39, 40, 41, 51, 52,	/* 30 - 37 */
 	53, 58, 59, 60, 61, 62, 63, 64,	/* 38 - 3F */
 	65, 66, 67, 68, 87, 88, 92, 70,	/* 40 - 47 */
 	104, 102, 94, 96, 103, 99, 101, 98,	/* 48 - 4F */
 	97, 100, 95, 69, 91, 55, 74, 78,	/* 50 - 57 */
 	89, 79, 80, 81, 75, 76, 77, 71,	/* 58 - 5F */
 	72, 73, 82, 83, 86, 107, 122, NN,	/* 60 - 67 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* 68 - 6F */
 	NN, NN, NN, NN, 115, 108, 111, 113,	/* 70 - 77 */
 	109, 110, 112, 118, 114, 116, 117, 119,	/* 78 - 7F */
 	121, 120, NN, NN, NN, NN, NN, 123,	/* 80 - 87 */
 	124, 125, 126, 127, 128, NN, NN, NN,	/* 88 - 8F */
 	129, 130, NN, NN, NN, NN, NN, NN,	/* 90 - 97 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* 98 - 9F */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* A0 - A7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* A8 - AF */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* B0 - B7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* B8 - BF */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* C0 - C7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* C8 - CF */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* D0 - D7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* D8 - DF */
 	29, 42, 56, 105, 90, 54, 93, 106,	/* E0 - E7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* E8 - EF */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* F0 - F7 */
 	NN, NN, NN, NN, NN, NN, NN, NN,	/* F8 - FF */
 };
 
 static const uint8_t ukbd_boot_desc[] = {
 	0x05, 0x01, 0x09, 0x06, 0xa1,
 	0x01, 0x05, 0x07, 0x19, 0xe0,
 	0x29, 0xe7, 0x15, 0x00, 0x25,
 	0x01, 0x75, 0x01, 0x95, 0x08,
 	0x81, 0x02, 0x95, 0x01, 0x75,
 	0x08, 0x81, 0x01, 0x95, 0x03,
 	0x75, 0x01, 0x05, 0x08, 0x19,
 	0x01, 0x29, 0x03, 0x91, 0x02,
 	0x95, 0x05, 0x75, 0x01, 0x91,
 	0x01, 0x95, 0x06, 0x75, 0x08,
 	0x15, 0x00, 0x26, 0xff, 0x00,
 	0x05, 0x07, 0x19, 0x00, 0x2a,
 	0xff, 0x00, 0x81, 0x00, 0xc0
 };
 
 /* prototypes */
 static void	ukbd_timeout(void *);
 static void	ukbd_set_leds(struct ukbd_softc *, uint8_t);
 static int	ukbd_set_typematic(keyboard_t *, int);
 #ifdef UKBD_EMULATE_ATSCANCODE
 static int	ukbd_key2scan(struct ukbd_softc *, int, int, int);
 #endif
 static uint32_t	ukbd_read_char(keyboard_t *, int);
 static void	ukbd_clear_state(keyboard_t *);
 static int	ukbd_ioctl(keyboard_t *, u_long, caddr_t);
 static int	ukbd_enable(keyboard_t *);
 static int	ukbd_disable(keyboard_t *);
 static void	ukbd_interrupt(struct ukbd_softc *);
 static void	ukbd_event_keyinput(struct ukbd_softc *);
 
 static device_probe_t ukbd_probe;
 static device_attach_t ukbd_attach;
 static device_detach_t ukbd_detach;
 static device_resume_t ukbd_resume;
 
 static uint8_t
 ukbd_any_key_pressed(struct ukbd_softc *sc)
 {
 	uint8_t i;
 	uint8_t j;
 
 	for (j = i = 0; i < UKBD_NKEYCODE; i++)
 		j |= sc->sc_odata.keycode[i];
 
 	return (j ? 1 : 0);
 }
 
 static void
 ukbd_start_timer(struct ukbd_softc *sc)
 {
 	sc->sc_flags |= UKBD_FLAG_TIMER_RUNNING;
 	usb_callout_reset(&sc->sc_callout, hz / 40, &ukbd_timeout, sc);
 }
 
 static void
 ukbd_put_key(struct ukbd_softc *sc, uint32_t key)
 {
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	DPRINTF("0x%02x (%d) %s\n", key, key,
 	    (key & KEY_RELEASE) ? "released" : "pressed");
 
 	if (sc->sc_inputs < UKBD_IN_BUF_SIZE) {
 		sc->sc_input[sc->sc_inputtail] = key;
 		++(sc->sc_inputs);
 		++(sc->sc_inputtail);
 		if (sc->sc_inputtail >= UKBD_IN_BUF_SIZE) {
 			sc->sc_inputtail = 0;
 		}
 	} else {
 		DPRINTF("input buffer is full\n");
 	}
 }
 
 static void
 ukbd_do_poll(struct ukbd_softc *sc, uint8_t wait)
 {
 
 	UKBD_CTX_LOCK_ASSERT();
 	KASSERT((sc->sc_flags & UKBD_FLAG_POLLING) != 0,
 	    ("ukbd_do_poll called when not polling\n"));
 	DPRINTFN(2, "polling\n");
 
 	if (!kdb_active && !SCHEDULER_STOPPED()) {
 		/*
 		 * In this context the kernel is polling for input,
 		 * but the USB subsystem works in normal interrupt-driven
 		 * mode, so we just wait on the USB threads to do the job.
 		 * Note that we currently hold Giant, but it is also used
 		 * as the transfer mutex, so we must release it while waiting.
 		 */
 		while (sc->sc_inputs == 0) {
 			/*
 			 * Give USB threads a chance to run.  Note that
 			 * kern_yield performs DROP_GIANT + PICKUP_GIANT.
 			 */
 			kern_yield(PRI_UNCHANGED);
 			if (!wait)
 				break;
 		}
 		return;
 	}
 
 	while (sc->sc_inputs == 0) {
 
 		usbd_transfer_poll(sc->sc_xfer, UKBD_N_TRANSFER);
 
 		/* Delay-optimised support for repetition of keys */
 		if (ukbd_any_key_pressed(sc)) {
 			/* a key is pressed - need timekeeping */
 			DELAY(1000);
 
 			/* 1 millisecond has passed */
 			sc->sc_time_ms += 1;
 		}
 
 		ukbd_interrupt(sc);
 
 		if (!wait)
 			break;
 	}
 }
 
 static int32_t
 ukbd_get_key(struct ukbd_softc *sc, uint8_t wait)
 {
 	int32_t c;
 
 	UKBD_CTX_LOCK_ASSERT();
 	KASSERT((!kdb_active && !SCHEDULER_STOPPED())
 	    || (sc->sc_flags & UKBD_FLAG_POLLING) != 0,
 	    ("not polling in kdb or panic\n"));
 
 	if (sc->sc_inputs == 0 &&
 	    (sc->sc_flags & UKBD_FLAG_GONE) == 0) {
 		/* start transfer, if not already started */
 		usbd_transfer_start(sc->sc_xfer[UKBD_INTR_DT]);
 	}
 
 	if (sc->sc_flags & UKBD_FLAG_POLLING)
 		ukbd_do_poll(sc, wait);
 
 	if (sc->sc_inputs == 0) {
 		c = -1;
 	} else {
 		c = sc->sc_input[sc->sc_inputhead];
 		--(sc->sc_inputs);
 		++(sc->sc_inputhead);
 		if (sc->sc_inputhead >= UKBD_IN_BUF_SIZE) {
 			sc->sc_inputhead = 0;
 		}
 	}
 	return (c);
 }
 
 static void
 ukbd_interrupt(struct ukbd_softc *sc)
 {
 	uint32_t n_mod;
 	uint32_t o_mod;
 	uint32_t now = sc->sc_time_ms;
 	uint32_t dtime;
 	uint8_t key;
 	uint8_t i;
 	uint8_t j;
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if (sc->sc_ndata.keycode[0] == KEY_ERROR)
 		return;
 
 	n_mod = sc->sc_ndata.modifiers;
 	o_mod = sc->sc_odata.modifiers;
 	if (n_mod != o_mod) {
 		for (i = 0; i < UKBD_NMOD; i++) {
 			if ((n_mod & ukbd_mods[i].mask) !=
 			    (o_mod & ukbd_mods[i].mask)) {
 				ukbd_put_key(sc, ukbd_mods[i].key |
 				    ((n_mod & ukbd_mods[i].mask) ?
 				    KEY_PRESS : KEY_RELEASE));
 			}
 		}
 	}
 	/* Check for released keys. */
 	for (i = 0; i < UKBD_NKEYCODE; i++) {
 		key = sc->sc_odata.keycode[i];
 		if (key == 0) {
 			continue;
 		}
 		for (j = 0; j < UKBD_NKEYCODE; j++) {
 			if (sc->sc_ndata.keycode[j] == 0) {
 				continue;
 			}
 			if (key == sc->sc_ndata.keycode[j]) {
 				goto rfound;
 			}
 		}
 		ukbd_put_key(sc, key | KEY_RELEASE);
 rfound:	;
 	}
 
 	/* Check for pressed keys. */
 	for (i = 0; i < UKBD_NKEYCODE; i++) {
 		key = sc->sc_ndata.keycode[i];
 		if (key == 0) {
 			continue;
 		}
 		sc->sc_ntime[i] = now + sc->sc_kbd.kb_delay1;
 		for (j = 0; j < UKBD_NKEYCODE; j++) {
 			if (sc->sc_odata.keycode[j] == 0) {
 				continue;
 			}
 			if (key == sc->sc_odata.keycode[j]) {
 
 				/* key is still pressed */
 
 				sc->sc_ntime[i] = sc->sc_otime[j];
 				dtime = (sc->sc_otime[j] - now);
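 				/*
 				 * dtime uses unsigned 32-bit wraparound, so the
 				 * top bit is set once the deadline sc_otime[j]
 				 * has passed.  E.g. now = 0xfffffff0 and a
 				 * deadline of 0x00000010 (after the counter
 				 * wrapped) give dtime = 0x20: not yet elapsed.
 				 * This holds while the two values differ by
 				 * less than 2^31 ms.
 				 */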
 
 				if (!(dtime & 0x80000000)) {
 					/* time has not elapsed */
 					goto pfound;
 				}
 				sc->sc_ntime[i] = now + sc->sc_kbd.kb_delay2;
 				break;
 			}
 		}
 		ukbd_put_key(sc, key | KEY_PRESS);
 
 		/*
 		 * If any other key is presently down, force its repeat to be
 		 * well in the future (100 seconds).  This makes the last key
 		 * to be pressed do the autorepeat.
 		 */
 		for (j = 0; j != UKBD_NKEYCODE; j++) {
 			if (j != i)
 				sc->sc_ntime[j] = now + (100 * 1000);
 		}
 pfound:	;
 	}
 
 	sc->sc_odata = sc->sc_ndata;
 
 	memcpy(sc->sc_otime, sc->sc_ntime, sizeof(sc->sc_otime));
 
 	ukbd_event_keyinput(sc);
 }
 
 static void
 ukbd_event_keyinput(struct ukbd_softc *sc)
 {
 	int c;
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if ((sc->sc_flags & UKBD_FLAG_POLLING) != 0)
 		return;
 
 	if (sc->sc_inputs == 0)
 		return;
 
 	if (KBD_IS_ACTIVE(&sc->sc_kbd) &&
 	    KBD_IS_BUSY(&sc->sc_kbd)) {
 		/* let the callback function process the input */
 		(sc->sc_kbd.kb_callback.kc_func) (&sc->sc_kbd, KBDIO_KEYINPUT,
 		    sc->sc_kbd.kb_callback.kc_arg);
 	} else {
 		/* read and discard the input, no one is waiting for it */
 		do {
 			c = ukbd_read_char(&sc->sc_kbd, 0);
 		} while (c != NOKEY);
 	}
 }
 
 static void
 ukbd_timeout(void *arg)
 {
 	struct ukbd_softc *sc = arg;
 
 	UKBD_LOCK_ASSERT();
 
 	sc->sc_time_ms += 25;	/* milliseconds */
 
 	ukbd_interrupt(sc);
 
 	/* Make sure any leftover key events get read out */
 	ukbd_event_keyinput(sc);
 
 	if (ukbd_any_key_pressed(sc) || (sc->sc_inputs != 0)) {
 		ukbd_start_timer(sc);
 	} else {
 		sc->sc_flags &= ~UKBD_FLAG_TIMER_RUNNING;
 	}
 }
 
 static uint8_t
 ukbd_apple_fn(uint8_t keycode) {
 	switch (keycode) {
 	case 0x28: return 0x49; /* RETURN -> INSERT */
 	case 0x2a: return 0x4c; /* BACKSPACE -> DEL */
 	case 0x50: return 0x4a; /* LEFT ARROW -> HOME */
 	case 0x4f: return 0x4d; /* RIGHT ARROW -> END */
 	case 0x52: return 0x4b; /* UP ARROW -> PGUP */
 	case 0x51: return 0x4e; /* DOWN ARROW -> PGDN */
 	default: return keycode;
 	}
 }
 
 static uint8_t
 ukbd_apple_swap(uint8_t keycode) {
 	switch (keycode) {
 	case 0x35: return 0x64;
 	case 0x64: return 0x35;
 	default: return keycode;
 	}
 }
 
 static void
 ukbd_intr_callback(struct usb_xfer *xfer, usb_error_t error)
 {
 	struct ukbd_softc *sc = usbd_xfer_softc(xfer);
 	struct usb_page_cache *pc;
 	uint8_t i;
 	uint8_t offset;
 	uint8_t id;
 	int len;
 
 	UKBD_LOCK_ASSERT();
 
 	usbd_xfer_status(xfer, &len, NULL, NULL, NULL);
 	pc = usbd_xfer_get_frame(xfer, 0);
 
 	switch (USB_GET_STATE(xfer)) {
 	case USB_ST_TRANSFERRED:
 		DPRINTF("actlen=%d bytes\n", len);
 
 		if (len == 0) {
 			DPRINTF("zero length data\n");
 			goto tr_setup;
 		}
 
 		if (sc->sc_kbd_id != 0) {
 			/* check and remove HID ID byte */
 			usbd_copy_out(pc, 0, &id, 1);
 			offset = 1;
 			len--;
 			if (len == 0) {
 				DPRINTF("zero length data\n");
 				goto tr_setup;
 			}
 		} else {
 			offset = 0;
 			id = 0;
 		}
 
 		if (len > UKBD_BUFFER_SIZE)
 			len = UKBD_BUFFER_SIZE;
 
 		/* get data */
 		usbd_copy_out(pc, offset, sc->sc_buffer, len);
 
 		/* clear temporary storage */
 		memset(&sc->sc_ndata, 0, sizeof(sc->sc_ndata));
 
 		/* scan through HID data */
 		if ((sc->sc_flags & UKBD_FLAG_APPLE_EJECT) &&
 		    (id == sc->sc_id_apple_eject)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_apple_eject))
 				sc->sc_modifiers |= MOD_EJECT;
 			else
 				sc->sc_modifiers &= ~MOD_EJECT;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_APPLE_FN) &&
 		    (id == sc->sc_id_apple_fn)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_apple_fn))
 				sc->sc_modifiers |= MOD_FN;
 			else
 				sc->sc_modifiers &= ~MOD_FN;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_CTRL_L) &&
 		    (id == sc->sc_id_ctrl_l)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_ctrl_l))
 				sc->sc_modifiers |= MOD_CONTROL_L;
 			else
 				sc->sc_modifiers &= ~MOD_CONTROL_L;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_CTRL_R) &&
 		    (id == sc->sc_id_ctrl_r)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_ctrl_r))
 				sc->sc_modifiers |= MOD_CONTROL_R;
 			else
 				sc->sc_modifiers &= ~MOD_CONTROL_R;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_SHIFT_L) &&
 		    (id == sc->sc_id_shift_l)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_shift_l))
 				sc->sc_modifiers |= MOD_SHIFT_L;
 			else
 				sc->sc_modifiers &= ~MOD_SHIFT_L;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_SHIFT_R) &&
 		    (id == sc->sc_id_shift_r)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_shift_r))
 				sc->sc_modifiers |= MOD_SHIFT_R;
 			else
 				sc->sc_modifiers &= ~MOD_SHIFT_R;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_ALT_L) &&
 		    (id == sc->sc_id_alt_l)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_alt_l))
 				sc->sc_modifiers |= MOD_ALT_L;
 			else
 				sc->sc_modifiers &= ~MOD_ALT_L;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_ALT_R) &&
 		    (id == sc->sc_id_alt_r)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_alt_r))
 				sc->sc_modifiers |= MOD_ALT_R;
 			else
 				sc->sc_modifiers &= ~MOD_ALT_R;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_WIN_L) &&
 		    (id == sc->sc_id_win_l)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_win_l))
 				sc->sc_modifiers |= MOD_WIN_L;
 			else
 				sc->sc_modifiers &= ~MOD_WIN_L;
 		}
 		if ((sc->sc_flags & UKBD_FLAG_WIN_R) &&
 		    (id == sc->sc_id_win_r)) {
 			if (hid_get_data(sc->sc_buffer, len, &sc->sc_loc_win_r))
 				sc->sc_modifiers |= MOD_WIN_R;
 			else
 				sc->sc_modifiers &= ~MOD_WIN_R;
 		}
 
 		sc->sc_ndata.modifiers = sc->sc_modifiers;
 
 		if ((sc->sc_flags & UKBD_FLAG_EVENTS) &&
 		    (id == sc->sc_id_events)) {
 			i = sc->sc_loc_events.count;
 			if (i > UKBD_NKEYCODE)
 				i = UKBD_NKEYCODE;
 			if (i > len)
 				i = len;
 			while (i--) {
 				sc->sc_ndata.keycode[i] =
 				    hid_get_data(sc->sc_buffer + i, len - i,
 				    &sc->sc_loc_events);
 			}
 		}
 
 #ifdef USB_DEBUG
 		DPRINTF("modifiers = 0x%04x\n", (int)sc->sc_modifiers);
 		for (i = 0; i < UKBD_NKEYCODE; i++) {
 			if (sc->sc_ndata.keycode[i]) {
 				DPRINTF("[%d] = 0x%02x\n",
 				    (int)i, (int)sc->sc_ndata.keycode[i]);
 			}
 		}
 #endif
 		if (sc->sc_modifiers & MOD_FN) {
 			for (i = 0; i < UKBD_NKEYCODE; i++) {
 				sc->sc_ndata.keycode[i] = 
 				    ukbd_apple_fn(sc->sc_ndata.keycode[i]);
 			}
 		}
 
 		if (sc->sc_flags & UKBD_FLAG_APPLE_SWAP) {
 			for (i = 0; i < UKBD_NKEYCODE; i++) {
 				sc->sc_ndata.keycode[i] = 
 				    ukbd_apple_swap(sc->sc_ndata.keycode[i]);
 			}
 		}
 
 		ukbd_interrupt(sc);
 
 		if (!(sc->sc_flags & UKBD_FLAG_TIMER_RUNNING)) {
 			if (ukbd_any_key_pressed(sc)) {
 				ukbd_start_timer(sc);
 			}
 		}
 		/* FALLTHROUGH */
 	case USB_ST_SETUP:
 tr_setup:
 		if (sc->sc_inputs < UKBD_IN_BUF_FULL) {
 			usbd_xfer_set_frame_len(xfer, 0, usbd_xfer_max_len(xfer));
 			usbd_transfer_submit(xfer);
 		} else {
 			DPRINTF("input queue is full!\n");
 		}
 		break;
 
 	default:			/* Error */
 		DPRINTF("error=%s\n", usbd_errstr(error));
 
 		if (error != USB_ERR_CANCELLED) {
 			/* try to clear stall first */
 			usbd_xfer_set_stall(xfer);
 			goto tr_setup;
 		}
 		break;
 	}
 }
 
 static void
 ukbd_set_leds_callback(struct usb_xfer *xfer, usb_error_t error)
 {
 	struct ukbd_softc *sc = usbd_xfer_softc(xfer);
 	struct usb_device_request req;
 	struct usb_page_cache *pc;
 	uint8_t id;
 	uint8_t any;
 	int len;
 
 	UKBD_LOCK_ASSERT();
 
 #ifdef USB_DEBUG
 	if (ukbd_no_leds)
 		return;
 #endif
 
 	switch (USB_GET_STATE(xfer)) {
 	case USB_ST_TRANSFERRED:
 	case USB_ST_SETUP:
 		if (!(sc->sc_flags & UKBD_FLAG_SET_LEDS))
 			break;
 		sc->sc_flags &= ~UKBD_FLAG_SET_LEDS;
 
 		req.bmRequestType = UT_WRITE_CLASS_INTERFACE;
 		req.bRequest = UR_SET_REPORT;
 		USETW2(req.wValue, UHID_OUTPUT_REPORT, 0);
 		req.wIndex[0] = sc->sc_iface_no;
 		req.wIndex[1] = 0;
 		req.wLength[1] = 0;
 
 		memset(sc->sc_buffer, 0, UKBD_BUFFER_SIZE);
 
 		id = 0;
 		any = 0;
 
 		/* Assumption: All led bits must be in the same ID. */
 
 		if (sc->sc_flags & UKBD_FLAG_NUMLOCK) {
 			if (sc->sc_leds & NLKED) {
 				hid_put_data_unsigned(sc->sc_buffer + 1, UKBD_BUFFER_SIZE - 1,
 				    &sc->sc_loc_numlock, 1);
 			}
 			id = sc->sc_id_numlock;
 			any = 1;
 		}
 
 		if (sc->sc_flags & UKBD_FLAG_SCROLLLOCK) {
 			if (sc->sc_leds & SLKED) {
 				hid_put_data_unsigned(sc->sc_buffer + 1, UKBD_BUFFER_SIZE - 1,
 				    &sc->sc_loc_scrolllock, 1);
 			}
 			id = sc->sc_id_scrolllock;
 			any = 1;
 		}
 
 		if (sc->sc_flags & UKBD_FLAG_CAPSLOCK) {
 			if (sc->sc_leds & CLKED) {
 				hid_put_data_unsigned(sc->sc_buffer + 1, UKBD_BUFFER_SIZE - 1,
 				    &sc->sc_loc_capslock, 1);
 			}
 			id = sc->sc_id_capslock;
 			any = 1;
 		}
 
 		/* if no leds, nothing to do */
 		if (!any)
 			break;
 
 		/* range check output report length */
 		len = sc->sc_led_size;
 		if (len > (UKBD_BUFFER_SIZE - 1))
 			len = (UKBD_BUFFER_SIZE - 1);
 
 		/* check if we need to prefix an ID byte */
 		sc->sc_buffer[0] = id;
 
 		pc = usbd_xfer_get_frame(xfer, 1);
 		if (id != 0) {
 			len++;
 			usbd_copy_in(pc, 0, sc->sc_buffer, len);
 		} else {
 			usbd_copy_in(pc, 0, sc->sc_buffer + 1, len);
 		}
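 		/*
 		 * For example, with the built-in boot protocol descriptor
 		 * (no report ID; NumLock, CapsLock and ScrollLock in bits 0-2
 		 * of a one-byte output report) len is 1 and only sc_buffer[1]
 		 * is sent.  A non-zero report ID adds sc_buffer[0] as a
 		 * leading byte.
 		 */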
 		req.wLength[0] = len;
 		usbd_xfer_set_frame_len(xfer, 1, len);
 
 		DPRINTF("len=%d, id=%d\n", len, id);
 
 		/* setup control request last */
 		pc = usbd_xfer_get_frame(xfer, 0);
 		usbd_copy_in(pc, 0, &req, sizeof(req));
 		usbd_xfer_set_frame_len(xfer, 0, sizeof(req));
 
 		/* start data transfer */
 		usbd_xfer_set_frames(xfer, 2);
 		usbd_transfer_submit(xfer);
 		break;
 
 	default:			/* Error */
 		DPRINTFN(1, "error=%s\n", usbd_errstr(error));
 		break;
 	}
 }
 
 static const struct usb_config ukbd_config[UKBD_N_TRANSFER] = {
 
 	[UKBD_INTR_DT] = {
 		.type = UE_INTERRUPT,
 		.endpoint = UE_ADDR_ANY,
 		.direction = UE_DIR_IN,
 		.flags = {.pipe_bof = 1,.short_xfer_ok = 1,},
 		.bufsize = 0,	/* use wMaxPacketSize */
 		.callback = &ukbd_intr_callback,
 	},
 
 	[UKBD_CTRL_LED] = {
 		.type = UE_CONTROL,
 		.endpoint = 0x00,	/* Control pipe */
 		.direction = UE_DIR_ANY,
 		.bufsize = sizeof(struct usb_device_request) + UKBD_BUFFER_SIZE,
 		.callback = &ukbd_set_leds_callback,
 		.timeout = 1000,	/* 1 second */
 	},
 };
 
 /* A match on these entries will load ukbd */
 static const STRUCT_USB_HOST_ID __used ukbd_devs[] = {
 	{USB_IFACE_CLASS(UICLASS_HID),
 	 USB_IFACE_SUBCLASS(UISUBCLASS_BOOT),
 	 USB_IFACE_PROTOCOL(UIPROTO_BOOT_KEYBOARD),},
 };
 
 static int
 ukbd_probe(device_t dev)
 {
 	keyboard_switch_t *sw = kbd_get_switch(UKBD_DRIVER_NAME);
 	struct usb_attach_arg *uaa = device_get_ivars(dev);
 	void *d_ptr;
 	int error;
 	uint16_t d_len;
 
 	UKBD_LOCK_ASSERT();
 	DPRINTFN(11, "\n");
 
 	if (sw == NULL) {
 		return (ENXIO);
 	}
 	if (uaa->usb_mode != USB_MODE_HOST) {
 		return (ENXIO);
 	}
 
 	if (uaa->info.bInterfaceClass != UICLASS_HID)
 		return (ENXIO);
 
 	if (usb_test_quirk(uaa, UQ_KBD_IGNORE))
 		return (ENXIO);
 
 	if ((uaa->info.bInterfaceSubClass == UISUBCLASS_BOOT) &&
 	    (uaa->info.bInterfaceProtocol == UIPROTO_BOOT_KEYBOARD))
 		return (BUS_PROBE_DEFAULT);
 
 	error = usbd_req_get_hid_desc(uaa->device, NULL,
 	    &d_ptr, &d_len, M_TEMP, uaa->info.bIfaceIndex);
 
 	if (error)
 		return (ENXIO);
 
 	if (hid_is_keyboard(d_ptr, d_len)) {
 		if (hid_is_mouse(d_ptr, d_len)) {
 			/*
 			 * NOTE: We currently don't support USB mouse
 			 * and USB keyboard on the same USB endpoint.
 			 * Let "ums" driver win.
 			 */
 			error = ENXIO;
 		} else {
 			error = BUS_PROBE_DEFAULT;
 		}
 	} else {
 		error = ENXIO;
 	}
 	free(d_ptr, M_TEMP);
 	return (error);
 }
 
 static void
 ukbd_parse_hid(struct ukbd_softc *sc, const uint8_t *ptr, uint32_t len)
 {
 	uint32_t flags;
 
 	/* reset detected bits */
 	sc->sc_flags &= ~UKBD_FLAG_HID_MASK;
 
 	/* check if there is an ID byte */
 	sc->sc_kbd_size = hid_report_size(ptr, len,
 	    hid_input, &sc->sc_kbd_id);
 
 	/* check whether this is an Apple keyboard */
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_CONSUMER, HUG_APPLE_EJECT),
 	    hid_input, 0, &sc->sc_loc_apple_eject, &flags,
 	    &sc->sc_id_apple_eject)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_APPLE_EJECT | 
 			    UKBD_FLAG_APPLE_SWAP;
 		DPRINTFN(1, "Found Apple eject-key\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(0xFFFF, 0x0003),
 	    hid_input, 0, &sc->sc_loc_apple_fn, &flags,
 	    &sc->sc_id_apple_fn)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_APPLE_FN;
 		DPRINTFN(1, "Found Apple FN-key\n");
 	}
 	/* locate the modifier keys (usages 0xE0-0xE7) */
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE0),
 	    hid_input, 0, &sc->sc_loc_ctrl_l, &flags,
 	    &sc->sc_id_ctrl_l)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_CTRL_L;
 		DPRINTFN(1, "Found left control\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE4),
 	    hid_input, 0, &sc->sc_loc_ctrl_r, &flags,
 	    &sc->sc_id_ctrl_r)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_CTRL_R;
 		DPRINTFN(1, "Found right control\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE1),
 	    hid_input, 0, &sc->sc_loc_shift_l, &flags,
 	    &sc->sc_id_shift_l)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_SHIFT_L;
 		DPRINTFN(1, "Found left shift\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE5),
 	    hid_input, 0, &sc->sc_loc_shift_r, &flags,
 	    &sc->sc_id_shift_r)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_SHIFT_R;
 		DPRINTFN(1, "Found right shift\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE2),
 	    hid_input, 0, &sc->sc_loc_alt_l, &flags,
 	    &sc->sc_id_alt_l)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_ALT_L;
 		DPRINTFN(1, "Found left alt\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE6),
 	    hid_input, 0, &sc->sc_loc_alt_r, &flags,
 	    &sc->sc_id_alt_r)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_ALT_R;
 		DPRINTFN(1, "Found right alt\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE3),
 	    hid_input, 0, &sc->sc_loc_win_l, &flags,
 	    &sc->sc_id_win_l)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_WIN_L;
 		DPRINTFN(1, "Found left GUI\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0xE7),
 	    hid_input, 0, &sc->sc_loc_win_r, &flags,
 	    &sc->sc_id_win_r)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_WIN_R;
 		DPRINTFN(1, "Found right GUI\n");
 	}
 	/* figure out event buffer */
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_KEYBOARD, 0x00),
 	    hid_input, 0, &sc->sc_loc_events, &flags,
 	    &sc->sc_id_events)) {
 		if (flags & HIO_VARIABLE) {
 			DPRINTFN(1, "Ignoring keyboard event control\n");
 		} else {
 			sc->sc_flags |= UKBD_FLAG_EVENTS;
 			DPRINTFN(1, "Found keyboard event array\n");
 		}
 	}
 
 	/* figure out leds on keyboard */
 	sc->sc_led_size = hid_report_size(ptr, len,
 	    hid_output, NULL);
 
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_LEDS, 0x01),
 	    hid_output, 0, &sc->sc_loc_numlock, &flags,
 	    &sc->sc_id_numlock)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_NUMLOCK;
 		DPRINTFN(1, "Found keyboard numlock\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_LEDS, 0x02),
 	    hid_output, 0, &sc->sc_loc_capslock, &flags,
 	    &sc->sc_id_capslock)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_CAPSLOCK;
 		DPRINTFN(1, "Found keyboard capslock\n");
 	}
 	if (hid_locate(ptr, len,
 	    HID_USAGE2(HUP_LEDS, 0x03),
 	    hid_output, 0, &sc->sc_loc_scrolllock, &flags,
 	    &sc->sc_id_scrolllock)) {
 		if (flags & HIO_VARIABLE)
 			sc->sc_flags |= UKBD_FLAG_SCROLLLOCK;
 		DPRINTFN(1, "Found keyboard scrolllock\n");
 	}
 }
 
 static int
 ukbd_attach(device_t dev)
 {
 	struct ukbd_softc *sc = device_get_softc(dev);
 	struct usb_attach_arg *uaa = device_get_ivars(dev);
 	int unit = device_get_unit(dev);
 	keyboard_t *kbd = &sc->sc_kbd;
 	void *hid_ptr = NULL;
 	usb_error_t err;
 	uint16_t n;
 	uint16_t hid_len;
 #ifdef USB_DEBUG
 	int rate;
 #endif
 	UKBD_LOCK_ASSERT();
 
 	kbd_init_struct(kbd, UKBD_DRIVER_NAME, KB_OTHER, unit, 0, 0, 0);
 
 	kbd->kb_data = (void *)sc;
 
 	device_set_usb_desc(dev);
 
 	sc->sc_udev = uaa->device;
 	sc->sc_iface = uaa->iface;
 	sc->sc_iface_index = uaa->info.bIfaceIndex;
 	sc->sc_iface_no = uaa->info.bIfaceNum;
 	sc->sc_mode = K_XLATE;
 
 	usb_callout_init_mtx(&sc->sc_callout, &Giant, 0);
 
 	err = usbd_transfer_setup(uaa->device,
 	    &uaa->info.bIfaceIndex, sc->sc_xfer, ukbd_config,
 	    UKBD_N_TRANSFER, sc, &Giant);
 
 	if (err) {
 		DPRINTF("error=%s\n", usbd_errstr(err));
 		goto detach;
 	}
 	/* setup default keyboard maps */
 
 	sc->sc_keymap = key_map;
 	sc->sc_accmap = accent_map;
 	for (n = 0; n < UKBD_NFKEY; n++) {
 		sc->sc_fkeymap[n] = fkey_tab[n];
 	}
 
 	kbd_set_maps(kbd, &sc->sc_keymap, &sc->sc_accmap,
 	    sc->sc_fkeymap, UKBD_NFKEY);
 
 	KBD_FOUND_DEVICE(kbd);
 
 	ukbd_clear_state(kbd);
 
 	/*
 	 * FIXME: set the initial value for lock keys in "sc_state"
 	 * according to the BIOS data?
 	 */
 	KBD_PROBE_DONE(kbd);
 
 	/* get HID descriptor */
 	err = usbd_req_get_hid_desc(uaa->device, NULL, &hid_ptr,
 	    &hid_len, M_TEMP, uaa->info.bIfaceIndex);
 
 	if (err == 0) {
 		DPRINTF("Parsing HID descriptor of %d bytes\n",
 		    (int)hid_len);
 
 		ukbd_parse_hid(sc, hid_ptr, hid_len);
 
 		free(hid_ptr, M_TEMP);
 	}
 
 	/* check if we should use the boot protocol */
 	if (usb_test_quirk(uaa, UQ_KBD_BOOTPROTO) ||
 	    (err != 0) || (!(sc->sc_flags & UKBD_FLAG_EVENTS))) {
 
 		DPRINTF("Forcing boot protocol\n");
 
 		err = usbd_req_set_protocol(sc->sc_udev, NULL,
 		    sc->sc_iface_index, 0);
 
 		if (err != 0) {
 			DPRINTF("Set protocol error=%s (ignored)\n",
 			    usbd_errstr(err));
 		}
 
 		ukbd_parse_hid(sc, ukbd_boot_desc, sizeof(ukbd_boot_desc));
 	}
 
 	/* ignore if SETIDLE fails, since it is not crucial */
 	usbd_req_set_idle(sc->sc_udev, NULL, sc->sc_iface_index, 0, 0);
 
 	ukbd_ioctl(kbd, KDSETLED, (caddr_t)&sc->sc_state);
 
 	KBD_INIT_DONE(kbd);
 
 	if (kbd_register(kbd) < 0) {
 		goto detach;
 	}
 	KBD_CONFIG_DONE(kbd);
 
 	ukbd_enable(kbd);
 
 #ifdef KBD_INSTALL_CDEV
 	if (kbd_attach(kbd)) {
 		goto detach;
 	}
 #endif
 	sc->sc_flags |= UKBD_FLAG_ATTACHED;
 
 	if (bootverbose) {
 		genkbd_diag(kbd, bootverbose);
 	}
 
 #ifdef USB_DEBUG
 	/* check for polling rate override */
 	rate = ukbd_pollrate;
 	if (rate > 0) {
 		if (rate > 1000)
 			rate = 1;
 		else
 			rate = 1000 / rate;
 
 		/* set new polling interval in ms */
 		usbd_xfer_set_interval(sc->sc_xfer[UKBD_INTR_DT], rate);
 	}
 #endif
 	/* start the keyboard */
 	usbd_transfer_start(sc->sc_xfer[UKBD_INTR_DT]);
 
 	return (0);			/* success */
 
 detach:
 	ukbd_detach(dev);
 	return (ENXIO);			/* error */
 }
 
 static int
 ukbd_detach(device_t dev)
 {
 	struct ukbd_softc *sc = device_get_softc(dev);
 	int error;
 
 	UKBD_LOCK_ASSERT();
 
 	DPRINTF("\n");
 
 	sc->sc_flags |= UKBD_FLAG_GONE;
 
 	usb_callout_stop(&sc->sc_callout);
 
 	/* kill any stuck keys */
 	if (sc->sc_flags & UKBD_FLAG_ATTACHED) {
 		/* stop receiving events from the USB keyboard */
 		usbd_transfer_stop(sc->sc_xfer[UKBD_INTR_DT]);
 
 		/* release all leftover keys, if any */
 		memset(&sc->sc_ndata, 0, sizeof(sc->sc_ndata));
 
 		/* process releasing of all keys */
 		ukbd_interrupt(sc);
 	}
 
 	ukbd_disable(&sc->sc_kbd);
 
 #ifdef KBD_INSTALL_CDEV
 	if (sc->sc_flags & UKBD_FLAG_ATTACHED) {
 		error = kbd_detach(&sc->sc_kbd);
 		if (error) {
 			/* usb attach cannot return an error */
 			device_printf(dev, "WARNING: kbd_detach() "
 			    "returned non-zero! (ignored)\n");
 		}
 	}
 #endif
 	if (KBD_IS_CONFIGURED(&sc->sc_kbd)) {
 		error = kbd_unregister(&sc->sc_kbd);
 		if (error) {
 			/* usb attach cannot return an error */
 			device_printf(dev, "WARNING: kbd_unregister() "
 			    "returned non-zero! (ignored)\n");
 		}
 	}
 	sc->sc_kbd.kb_flags = 0;
 
 	usbd_transfer_unsetup(sc->sc_xfer, UKBD_N_TRANSFER);
 
 	usb_callout_drain(&sc->sc_callout);
 
 	DPRINTF("%s: disconnected\n",
 	    device_get_nameunit(dev));
 
 	return (0);
 }
 
 static int
 ukbd_resume(device_t dev)
 {
 	struct ukbd_softc *sc = device_get_softc(dev);
 
 	UKBD_LOCK_ASSERT();
 
 	ukbd_clear_state(&sc->sc_kbd);
 
 	return (0);
 }
 
 /* early keyboard probe, not supported */
 static int
 ukbd_configure(int flags)
 {
 	return (0);
 }
 
 /* detect a keyboard, not used */
 static int
 ukbd__probe(int unit, void *arg, int flags)
 {
 	return (ENXIO);
 }
 
 /* reset and initialize the device, not used */
 static int
 ukbd_init(int unit, keyboard_t **kbdp, void *arg, int flags)
 {
 	return (ENXIO);
 }
 
 /* test the interface to the device, not used */
 static int
 ukbd_test_if(keyboard_t *kbd)
 {
 	return (0);
 }
 
 /* finish using this keyboard, not used */
 static int
 ukbd_term(keyboard_t *kbd)
 {
 	return (ENXIO);
 }
 
 /* keyboard interrupt routine, not used */
 static int
 ukbd_intr(keyboard_t *kbd, void *arg)
 {
 	return (0);
 }
 
 /* lock the access to the keyboard, not used */
 static int
 ukbd_lock(keyboard_t *kbd, int lock)
 {
 	return (1);
 }
 
 /*
  * Enable the access to the device; until this function is called,
  * the client cannot read from the keyboard.
  */
 static int
 ukbd_enable(keyboard_t *kbd)
 {
 
 	UKBD_LOCK();
 	KBD_ACTIVATE(kbd);
 	UKBD_UNLOCK();
 
 	return (0);
 }
 
 /* disallow the access to the device */
 static int
 ukbd_disable(keyboard_t *kbd)
 {
 
 	UKBD_LOCK();
 	KBD_DEACTIVATE(kbd);
 	UKBD_UNLOCK();
 
 	return (0);
 }
 
 /* check if data is waiting */
 /* Currently unused. */
 static int
 ukbd_check(keyboard_t *kbd)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if (!KBD_IS_ACTIVE(kbd))
 		return (0);
 
 	if (sc->sc_flags & UKBD_FLAG_POLLING)
 		ukbd_do_poll(sc, 0);
 
 #ifdef UKBD_EMULATE_ATSCANCODE
 	if (sc->sc_buffered_char[0]) {
 		return (1);
 	}
 #endif
 	if (sc->sc_inputs > 0) {
 		return (1);
 	}
 	return (0);
 }
 
 /* check if char is waiting */
 static int
 ukbd_check_char_locked(keyboard_t *kbd)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if (!KBD_IS_ACTIVE(kbd))
 		return (0);
 
 	if ((sc->sc_composed_char > 0) &&
 	    (!(sc->sc_flags & UKBD_FLAG_COMPOSE))) {
 		return (1);
 	}
 	return (ukbd_check(kbd));
 }
 
 static int
 ukbd_check_char(keyboard_t *kbd)
 {
 	int result;
 
 	UKBD_LOCK();
 	result = ukbd_check_char_locked(kbd);
 	UKBD_UNLOCK();
 
 	return (result);
 }
 
 /* read one byte from the keyboard if it's allowed */
 /* Currently unused. */
 static int
 ukbd_read(keyboard_t *kbd, int wait)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 	int32_t usbcode;
 #ifdef UKBD_EMULATE_ATSCANCODE
 	uint32_t keycode;
 	uint32_t scancode;
 
 #endif
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if (!KBD_IS_ACTIVE(kbd))
 		return (-1);
 
 #ifdef UKBD_EMULATE_ATSCANCODE
 	if (sc->sc_buffered_char[0]) {
 		scancode = sc->sc_buffered_char[0];
 		if (scancode & SCAN_PREFIX) {
 			sc->sc_buffered_char[0] &= ~SCAN_PREFIX;
 			return ((scancode & SCAN_PREFIX_E0) ? 0xe0 : 0xe1);
 		}
 		sc->sc_buffered_char[0] = sc->sc_buffered_char[1];
 		sc->sc_buffered_char[1] = 0;
 		return (scancode);
 	}
 #endif					/* UKBD_EMULATE_ATSCANCODE */
 
 	/* XXX */
 	usbcode = ukbd_get_key(sc, (wait == FALSE) ? 0 : 1);
 	if (!KBD_IS_ACTIVE(kbd) || (usbcode == -1))
 		return (-1);
 
 	++(kbd->kb_count);
 
 #ifdef UKBD_EMULATE_ATSCANCODE
 	keycode = ukbd_trtab[KEY_INDEX(usbcode)];
 	if (keycode == NN) {
 		return -1;
 	}
 	return (ukbd_key2scan(sc, keycode, sc->sc_ndata.modifiers,
 	    (usbcode & KEY_RELEASE)));
 #else					/* !UKBD_EMULATE_ATSCANCODE */
 	return (usbcode);
 #endif					/* UKBD_EMULATE_ATSCANCODE */
 }
 
 /* read char from the keyboard */
 static uint32_t
 ukbd_read_char_locked(keyboard_t *kbd, int wait)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 	uint32_t action;
 	uint32_t keycode;
 	int32_t usbcode;
 #ifdef UKBD_EMULATE_ATSCANCODE
 	uint32_t scancode;
 #endif
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	if (!KBD_IS_ACTIVE(kbd))
 		return (NOKEY);
 
 next_code:
 
 	/* do we have a composed char to return ? */
 
 	if ((sc->sc_composed_char > 0) &&
 	    (!(sc->sc_flags & UKBD_FLAG_COMPOSE))) {
 
 		action = sc->sc_composed_char;
 		sc->sc_composed_char = 0;
 
 		if (action > 0xFF) {
 			goto errkey;
 		}
 		goto done;
 	}
 #ifdef UKBD_EMULATE_ATSCANCODE
 
 	/* do we have a pending raw scan code? */
 
 	if (sc->sc_mode == K_RAW) {
 		scancode = sc->sc_buffered_char[0];
 		if (scancode) {
 			if (scancode & SCAN_PREFIX) {
 				sc->sc_buffered_char[0] = (scancode & ~SCAN_PREFIX);
 				return ((scancode & SCAN_PREFIX_E0) ? 0xe0 : 0xe1);
 			}
 			sc->sc_buffered_char[0] = sc->sc_buffered_char[1];
 			sc->sc_buffered_char[1] = 0;
 			return (scancode);
 		}
 	}
 #endif					/* UKBD_EMULATE_ATSCANCODE */
 
 	/* see if there is something in the keyboard port */
 	/* XXX */
 	usbcode = ukbd_get_key(sc, (wait == FALSE) ? 0 : 1);
 	if (usbcode == -1) {
 		return (NOKEY);
 	}
 	++kbd->kb_count;
 
 #ifdef UKBD_EMULATE_ATSCANCODE
 	/* USB key index -> key code -> AT scan code */
 	keycode = ukbd_trtab[KEY_INDEX(usbcode)];
 	if (keycode == NN) {
 		return (NOKEY);
 	}
 	/* return an AT scan code for the K_RAW mode */
 	if (sc->sc_mode == K_RAW) {
 		return (ukbd_key2scan(sc, keycode, sc->sc_ndata.modifiers,
 		    (usbcode & KEY_RELEASE)));
 	}
 #else					/* !UKBD_EMULATE_ATSCANCODE */
 
 	/* return the byte as is for the K_RAW mode */
 	if (sc->sc_mode == K_RAW) {
 		return (usbcode);
 	}
 	/* USB key index -> key code */
 	keycode = ukbd_trtab[KEY_INDEX(usbcode)];
 	if (keycode == NN) {
 		return (NOKEY);
 	}
 #endif					/* UKBD_EMULATE_ATSCANCODE */
 
 	switch (keycode) {
 	case 0x38:			/* left alt (compose key) */
 		if (usbcode & KEY_RELEASE) {
 			if (sc->sc_flags & UKBD_FLAG_COMPOSE) {
 				sc->sc_flags &= ~UKBD_FLAG_COMPOSE;
 
 				if (sc->sc_composed_char > 0xFF) {
 					sc->sc_composed_char = 0;
 				}
 			}
 		} else {
 			if (!(sc->sc_flags & UKBD_FLAG_COMPOSE)) {
 				sc->sc_flags |= UKBD_FLAG_COMPOSE;
 				sc->sc_composed_char = 0;
 			}
 		}
 		break;
 		/* XXX: I don't like these... */
 	case 0x5c:			/* print screen */
 		if (sc->sc_flags & ALTS) {
 			keycode = 0x54;	/* sysrq */
 		}
 		break;
 	case 0x68:			/* pause/break */
 		if (sc->sc_flags & CTLS) {
 			keycode = 0x6c;	/* break */
 		}
 		break;
 	}
 
 	/* return the key code in the K_CODE mode */
 	if (usbcode & KEY_RELEASE) {
 		keycode |= SCAN_RELEASE;
 	}
 	if (sc->sc_mode == K_CODE) {
 		return (keycode);
 	}
 	/* compose a character code */
 	if (sc->sc_flags & UKBD_FLAG_COMPOSE) {
 		switch (keycode) {
 			/* key pressed, process it */
 		case 0x47:
 		case 0x48:
 		case 0x49:		/* keypad 7,8,9 */
 			sc->sc_composed_char *= 10;
 			sc->sc_composed_char += keycode - 0x40;
 			goto check_composed;
 
 		case 0x4B:
 		case 0x4C:
 		case 0x4D:		/* keypad 4,5,6 */
 			sc->sc_composed_char *= 10;
 			sc->sc_composed_char += keycode - 0x47;
 			goto check_composed;
 
 		case 0x4F:
 		case 0x50:
 		case 0x51:		/* keypad 1,2,3 */
 			sc->sc_composed_char *= 10;
 			sc->sc_composed_char += keycode - 0x4E;
 			goto check_composed;
 
 		case 0x52:		/* keypad 0 */
 			sc->sc_composed_char *= 10;
 			goto check_composed;
 
 			/* key released, no interest here */
 		case SCAN_RELEASE | 0x47:
 		case SCAN_RELEASE | 0x48:
 		case SCAN_RELEASE | 0x49:	/* keypad 7,8,9 */
 		case SCAN_RELEASE | 0x4B:
 		case SCAN_RELEASE | 0x4C:
 		case SCAN_RELEASE | 0x4D:	/* keypad 4,5,6 */
 		case SCAN_RELEASE | 0x4F:
 		case SCAN_RELEASE | 0x50:
 		case SCAN_RELEASE | 0x51:	/* keypad 1,2,3 */
 		case SCAN_RELEASE | 0x52:	/* keypad 0 */
 			goto next_code;
 
 		case 0x38:		/* left alt key */
 			break;
 
 		default:
 			if (sc->sc_composed_char > 0) {
 				sc->sc_flags &= ~UKBD_FLAG_COMPOSE;
 				sc->sc_composed_char = 0;
 				goto errkey;
 			}
 			break;
 		}
 	}
 	/* keycode to key action */
 	action = genkbd_keyaction(kbd, SCAN_CHAR(keycode),
 	    (keycode & SCAN_RELEASE),
 	    &sc->sc_state, &sc->sc_accents);
 	if (action == NOKEY) {
 		goto next_code;
 	}
 done:
 	return (action);
 
 check_composed:
 	if (sc->sc_composed_char <= 0xFF) {
 		goto next_code;
 	}
 errkey:
 	return (ERRKEY);
 }
 
 /* Currently wait is always false. */
 static uint32_t
 ukbd_read_char(keyboard_t *kbd, int wait)
 {
 	uint32_t keycode;
 
 	UKBD_LOCK();
 	keycode = ukbd_read_char_locked(kbd, wait);
 	UKBD_UNLOCK();
 
 	return (keycode);
 }
 
 /* some useful control functions */
 static int
 ukbd_ioctl_locked(keyboard_t *kbd, u_long cmd, caddr_t arg)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 	int i;
 #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \
     defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
 	int ival;
 
 #endif
 
 	UKBD_LOCK_ASSERT();
 
 	switch (cmd) {
 	case KDGKBMODE:		/* get keyboard mode */
 		*(int *)arg = sc->sc_mode;
 		break;
 #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \
     defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
 	case _IO('K', 7):
 		ival = IOCPARM_IVAL(arg);
 		arg = (caddr_t)&ival;
 		/* FALLTHROUGH */
 #endif
 	case KDSKBMODE:		/* set keyboard mode */
 		switch (*(int *)arg) {
 		case K_XLATE:
 			if (sc->sc_mode != K_XLATE) {
 				/* make lock key state and LED state match */
 				sc->sc_state &= ~LOCK_MASK;
 				sc->sc_state |= KBD_LED_VAL(kbd);
 			}
 			/* FALLTHROUGH */
 		case K_RAW:
 		case K_CODE:
 			if (sc->sc_mode != *(int *)arg) {
 				if ((sc->sc_flags & UKBD_FLAG_POLLING) == 0)
 					ukbd_clear_state(kbd);
 				sc->sc_mode = *(int *)arg;
 			}
 			break;
 		default:
 			return (EINVAL);
 		}
 		break;
 
 	case KDGETLED:			/* get keyboard LED */
 		*(int *)arg = KBD_LED_VAL(kbd);
 		break;
 #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \
     defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
 	case _IO('K', 66):
 		ival = IOCPARM_IVAL(arg);
 		arg = (caddr_t)&ival;
 		/* FALLTHROUGH */
 #endif
 	case KDSETLED:			/* set keyboard LED */
 		/* NOTE: lock key state in "sc_state" won't be changed */
 		if (*(int *)arg & ~LOCK_MASK)
 			return (EINVAL);
 
 		i = *(int *)arg;
 
 		/* replace CAPS LED with ALTGR LED for ALTGR keyboards */
 		if (sc->sc_mode == K_XLATE &&
 		    kbd->kb_keymap->n_keys > ALTGR_OFFSET) {
 			if (i & ALKED)
 				i |= CLKED;
 			else
 				i &= ~CLKED;
 		}
 		if (KBD_HAS_DEVICE(kbd))
 			ukbd_set_leds(sc, i);
 
 		KBD_LED_VAL(kbd) = *(int *)arg;
 		break;
 	case KDGKBSTATE:		/* get lock key state */
 		*(int *)arg = sc->sc_state & LOCK_MASK;
 		break;
 #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \
     defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
 	case _IO('K', 20):
 		ival = IOCPARM_IVAL(arg);
 		arg = (caddr_t)&ival;
 		/* FALLTHROUGH */
 #endif
 	case KDSKBSTATE:		/* set lock key state */
 		if (*(int *)arg & ~LOCK_MASK) {
 			return (EINVAL);
 		}
 		sc->sc_state &= ~LOCK_MASK;
 		sc->sc_state |= *(int *)arg;
 
 		/* set LEDs and quit */
 		return (ukbd_ioctl(kbd, KDSETLED, arg));
 
 	case KDSETREPEAT:		/* set keyboard repeat rate (new
 					 * interface) */
 		if (!KBD_HAS_DEVICE(kbd)) {
 			return (0);
 		}
 		if (((int *)arg)[1] < 0) {
 			return (EINVAL);
 		}
 		if (((int *)arg)[0] < 0) {
 			return (EINVAL);
 		}
 		if (((int *)arg)[0] < 200)	/* fastest possible value */
 			kbd->kb_delay1 = 200;
 		else
 			kbd->kb_delay1 = ((int *)arg)[0];
 		kbd->kb_delay2 = ((int *)arg)[1];
 		return (0);
 
 #if defined(COMPAT_FREEBSD6) || defined(COMPAT_FREEBSD5) || \
     defined(COMPAT_FREEBSD4) || defined(COMPAT_43)
 	case _IO('K', 67):
 		ival = IOCPARM_IVAL(arg);
 		arg = (caddr_t)&ival;
 		/* FALLTHROUGH */
 #endif
 	case KDSETRAD:			/* set keyboard repeat rate (old
 					 * interface) */
 		return (ukbd_set_typematic(kbd, *(int *)arg));
 
 	case PIO_KEYMAP:		/* set keyboard translation table */
 	case OPIO_KEYMAP:		/* set keyboard translation table
 					 * (compat) */
 	case PIO_KEYMAPENT:		/* set keyboard translation table
 					 * entry */
 	case PIO_DEADKEYMAP:		/* set accent key translation table */
 		sc->sc_accents = 0;
 		/* FALLTHROUGH */
 	default:
 		return (genkbd_commonioctl(kbd, cmd, arg));
 	}
 
 	return (0);
 }
 
 static int
 ukbd_ioctl(keyboard_t *kbd, u_long cmd, caddr_t arg)
 {
 	int result;
 
 	/*
 	 * XXX Check if someone is calling us from a critical section:
 	 */
 	if (curthread->td_critnest != 0)
 		return (EDEADLK);
 
 	/*
 	 * XXX KDGKBSTATE, KDSKBSTATE and KDSETLED can be called from any
 	 * context where printf(9) can be called, which among other things
 	 * includes interrupt filters and threads with any kinds of locks
 	 * already held.  For this reason it would be dangerous to acquire
 	 * the Giant here unconditionally.  On the other hand we have to
 	 * have it to handle the ioctl.
 	 * So we make our best effort to auto-detect whether we can grab
 	 * the Giant or not.  Blame syscons(4) for this.
 	 */
 	switch (cmd) {
 	case KDGKBSTATE:
 	case KDSKBSTATE:
 	case KDSETLED:
 		if (!mtx_owned(&Giant) && !SCHEDULER_STOPPED())
 			return (EDEADLK);	/* best I could come up with */
 		/* FALLTHROUGH */
 	default:
 		UKBD_LOCK();
 		result = ukbd_ioctl_locked(kbd, cmd, arg);
 		UKBD_UNLOCK();
 		return (result);
 	}
 }
 
 
 /* clear the internal state of the keyboard */
 static void
 ukbd_clear_state(keyboard_t *kbd)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 
 	UKBD_CTX_LOCK_ASSERT();
 
 	sc->sc_flags &= ~(UKBD_FLAG_COMPOSE | UKBD_FLAG_POLLING);
 	sc->sc_state &= LOCK_MASK;	/* preserve locking key state */
 	sc->sc_accents = 0;
 	sc->sc_composed_char = 0;
 #ifdef UKBD_EMULATE_ATSCANCODE
 	sc->sc_buffered_char[0] = 0;
 	sc->sc_buffered_char[1] = 0;
 #endif
 	memset(&sc->sc_ndata, 0, sizeof(sc->sc_ndata));
 	memset(&sc->sc_odata, 0, sizeof(sc->sc_odata));
 	memset(&sc->sc_ntime, 0, sizeof(sc->sc_ntime));
 	memset(&sc->sc_otime, 0, sizeof(sc->sc_otime));
 }
 
 /* save the internal state, not used */
 static int
 ukbd_get_state(keyboard_t *kbd, void *buf, size_t len)
 {
 	return (len == 0) ? 1 : -1;
 }
 
 /* set the internal state, not used */
 static int
 ukbd_set_state(keyboard_t *kbd, void *buf, size_t len)
 {
 	return (EINVAL);
 }
 
 static int
 ukbd_poll(keyboard_t *kbd, int on)
 {
 	struct ukbd_softc *sc = kbd->kb_data;
 
 	UKBD_LOCK();
-	if (on) {
+	/*
+	 * Keep a reference count on polling to allow recursive
+	 * cngrab() calls, for example during a panic.
+	 */
+	if (on)
+		sc->sc_polling++;
+	else
+		sc->sc_polling--;
+
+	if (sc->sc_polling != 0) {
 		sc->sc_flags |= UKBD_FLAG_POLLING;
 		sc->sc_poll_thread = curthread;
 	} else {
 		sc->sc_flags &= ~UKBD_FLAG_POLLING;
 		ukbd_start_timer(sc);	/* start timer */
 	}
 	UKBD_UNLOCK();
 
 	return (0);
 }
 
 /* local functions */
 
 static void
 ukbd_set_leds(struct ukbd_softc *sc, uint8_t leds)
 {
 
 	UKBD_LOCK_ASSERT();
 	DPRINTF("leds=0x%02x\n", leds);
 
 	sc->sc_leds = leds;
 	sc->sc_flags |= UKBD_FLAG_SET_LEDS;
 
 	/* start transfer, if not already started */
 
 	usbd_transfer_start(sc->sc_xfer[UKBD_CTRL_LED]);
 }
 
 static int
 ukbd_set_typematic(keyboard_t *kbd, int code)
 {
 	static const int delays[] = {250, 500, 750, 1000};
 	static const int rates[] = {34, 38, 42, 46, 50, 55, 59, 63,
 		68, 76, 84, 92, 100, 110, 118, 126,
 		136, 152, 168, 184, 200, 220, 236, 252,
 	272, 304, 336, 368, 400, 440, 472, 504};
 
 	if (code & ~0x7f) {
 		return (EINVAL);
 	}
 	kbd->kb_delay1 = delays[(code >> 5) & 3];
 	kbd->kb_delay2 = rates[code & 0x1f];
 	return (0);
 }
 
 #ifdef UKBD_EMULATE_ATSCANCODE
 static int
 ukbd_key2scan(struct ukbd_softc *sc, int code, int shift, int up)
 {
 	static const int scan[] = {
 		/* 89 */
 		0x11c,	/* Enter */
 		/* 90-99 */
 		0x11d,	/* Ctrl-R */
 		0x135,	/* Divide */
 		0x137 | SCAN_PREFIX_SHIFT,	/* PrintScreen */
 		0x138,	/* Alt-R */
 		0x147,	/* Home */
 		0x148,	/* Up */
 		0x149,	/* PageUp */
 		0x14b,	/* Left */
 		0x14d,	/* Right */
 		0x14f,	/* End */
 		/* 100-109 */
 		0x150,	/* Down */
 		0x151,	/* PageDown */
 		0x152,	/* Insert */
 		0x153,	/* Delete */
 		0x146,	/* XXX Pause/Break */
 		0x15b,	/* Win_L(Super_L) */
 		0x15c,	/* Win_R(Super_R) */
 		0x15d,	/* Application(Menu) */
 
 		/* SUN TYPE 6 USB KEYBOARD */
 		0x168,	/* Sun Type 6 Help */
 		0x15e,	/* Sun Type 6 Stop */
 		/* 110 - 119 */
 		0x15f,	/* Sun Type 6 Again */
 		0x160,	/* Sun Type 6 Props */
 		0x161,	/* Sun Type 6 Undo */
 		0x162,	/* Sun Type 6 Front */
 		0x163,	/* Sun Type 6 Copy */
 		0x164,	/* Sun Type 6 Open */
 		0x165,	/* Sun Type 6 Paste */
 		0x166,	/* Sun Type 6 Find */
 		0x167,	/* Sun Type 6 Cut */
 		0x125,	/* Sun Type 6 Mute */
 		/* 120 - 130 */
 		0x11f,	/* Sun Type 6 VolumeDown */
 		0x11e,	/* Sun Type 6 VolumeUp */
 		0x120,	/* Sun Type 6 PowerDown */
 
 		/* Japanese 106/109 keyboard */
 		0x73,	/* Keyboard Intl' 1 (backslash / underscore) */
 		0x70,	/* Keyboard Intl' 2 (Katakana / Hiragana) */
 		0x7d,	/* Keyboard Intl' 3 (Yen sign) (Not using in jp106/109) */
 		0x79,	/* Keyboard Intl' 4 (Henkan) */
 		0x7b,	/* Keyboard Intl' 5 (Muhenkan) */
 		0x5c,	/* Keyboard Intl' 6 (Keypad ,) (For PC-9821 layout) */
 		0x71,   /* Apple Keyboard JIS (Kana) */
 		0x72,   /* Apple Keyboard JIS (Eisu) */
 	};
 
 	if ((code >= 89) && (code < (int)(89 + nitems(scan)))) {
 		code = scan[code - 89];
 	}
 	/* Pause/Break */
 	if ((code == 104) && (!(shift & (MOD_CONTROL_L | MOD_CONTROL_R)))) {
 		code = (0x45 | SCAN_PREFIX_E1 | SCAN_PREFIX_CTL);
 	}
 	if (shift & (MOD_SHIFT_L | MOD_SHIFT_R)) {
 		code &= ~SCAN_PREFIX_SHIFT;
 	}
 	code |= (up ? SCAN_RELEASE : SCAN_PRESS);
 
 	if (code & SCAN_PREFIX) {
 		if (code & SCAN_PREFIX_CTL) {
 			/* Ctrl */
 			sc->sc_buffered_char[0] = (0x1d | (code & SCAN_RELEASE));
 			sc->sc_buffered_char[1] = (code & ~SCAN_PREFIX);
 		} else if (code & SCAN_PREFIX_SHIFT) {
 			/* Shift */
 			sc->sc_buffered_char[0] = (0x2a | (code & SCAN_RELEASE));
 			sc->sc_buffered_char[1] = (code & ~SCAN_PREFIX_SHIFT);
 		} else {
 			sc->sc_buffered_char[0] = (code & ~SCAN_PREFIX);
 			sc->sc_buffered_char[1] = 0;
 		}
 		return ((code & SCAN_PREFIX_E0) ? 0xe0 : 0xe1);
 	}
 	return (code);
 
 }
 
 #endif					/* UKBD_EMULATE_ATSCANCODE */
 
 static keyboard_switch_t ukbdsw = {
 	.probe = &ukbd__probe,
 	.init = &ukbd_init,
 	.term = &ukbd_term,
 	.intr = &ukbd_intr,
 	.test_if = &ukbd_test_if,
 	.enable = &ukbd_enable,
 	.disable = &ukbd_disable,
 	.read = &ukbd_read,
 	.check = &ukbd_check,
 	.read_char = &ukbd_read_char,
 	.check_char = &ukbd_check_char,
 	.ioctl = &ukbd_ioctl,
 	.lock = &ukbd_lock,
 	.clear_state = &ukbd_clear_state,
 	.get_state = &ukbd_get_state,
 	.set_state = &ukbd_set_state,
 	.get_fkeystr = &genkbd_get_fkeystr,
 	.poll = &ukbd_poll,
 	.diag = &genkbd_diag,
 };
 
 KEYBOARD_DRIVER(ukbd, ukbdsw, ukbd_configure);
 
 static int
 ukbd_driver_load(module_t mod, int what, void *arg)
 {
 	switch (what) {
 	case MOD_LOAD:
 		kbd_add_driver(&ukbd_kbd_driver);
 		break;
 	case MOD_UNLOAD:
 		kbd_delete_driver(&ukbd_kbd_driver);
 		break;
 	}
 	return (0);
 }
 
 static devclass_t ukbd_devclass;
 
 static device_method_t ukbd_methods[] = {
 	DEVMETHOD(device_probe, ukbd_probe),
 	DEVMETHOD(device_attach, ukbd_attach),
 	DEVMETHOD(device_detach, ukbd_detach),
 	DEVMETHOD(device_resume, ukbd_resume),
 
 	DEVMETHOD_END
 };
 
 static driver_t ukbd_driver = {
 	.name = "ukbd",
 	.methods = ukbd_methods,
 	.size = sizeof(struct ukbd_softc),
 };
 
 DRIVER_MODULE(ukbd, uhub, ukbd_driver, ukbd_devclass, ukbd_driver_load, 0);
 MODULE_DEPEND(ukbd, usb, 1, 1, 1);
 MODULE_VERSION(ukbd, 1);
 USB_PNP_HOST_INFO(ukbd_devs);
Index: user/alc/PQ_LAUNDRY/sys/dev/xen/netfront/netfront.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/dev/xen/netfront/netfront.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/dev/xen/netfront/netfront.c	(revision 303775)
@@ -1,2349 +1,2352 @@
 /*-
  * Copyright (c) 2004-2006 Kip Macy
  * Copyright (c) 2015 Wei Liu <wei.liu2@citrix.com>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet.h"
 #include "opt_inet6.h"
 
 #include <sys/param.h>
 #include <sys/sockio.h>
 #include <sys/limits.h>
 #include <sys/mbuf.h>
 #include <sys/malloc.h>
 #include <sys/module.h>
 #include <sys/kernel.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <sys/taskqueue.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/if_arp.h>
 #include <net/ethernet.h>
 #include <net/if_media.h>
 #include <net/bpf.h>
 #include <net/if_types.h>
 
 #include <netinet/in.h>
 #include <netinet/ip.h>
 #include <netinet/if_ether.h>
 #include <netinet/tcp.h>
 #include <netinet/tcp_lro.h>
 
 #include <vm/vm.h>
 #include <vm/pmap.h>
 
 #include <sys/bus.h>
 
 #include <xen/xen-os.h>
 #include <xen/hypervisor.h>
 #include <xen/xen_intr.h>
 #include <xen/gnttab.h>
 #include <xen/interface/memory.h>
 #include <xen/interface/io/netif.h>
 #include <xen/xenbus/xenbusvar.h>
 
 #include "xenbus_if.h"
 
 /* Features supported by all backends.  TSO and LRO can be negotiated */
 #define XN_CSUM_FEATURES	(CSUM_TCP | CSUM_UDP)
 
 #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE)
 #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE)
 
 #define NET_RX_SLOTS_MIN (XEN_NETIF_NR_SLOTS_MIN + 1)
 
 /*
  * Should the driver do LRO on the RX end
  *  this can be toggled on the fly, but the
  *  interface must be reset (down/up) for it
  *  to take effect.
  */
 static int xn_enable_lro = 1;
 TUNABLE_INT("hw.xn.enable_lro", &xn_enable_lro);
 
 /*
  * Number of pairs of queues.
  */
 static unsigned long xn_num_queues = 4;
 TUNABLE_ULONG("hw.xn.num_queues", &xn_num_queues);
 
 /**
  * \brief The maximum allowed data fragments in a single transmit
  *        request.
  *
  * This limit is imposed by the backend driver.  We assume here that
  * we are dealing with a Linux driver domain and have set our limit
  * to mirror the Linux MAX_SKB_FRAGS constant.
  */
 #define	MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2)
 
 #define RX_COPY_THRESHOLD 256
 
 #define net_ratelimit() 0
 
 struct netfront_rxq;
 struct netfront_txq;
 struct netfront_info;
 struct netfront_rx_info;
 
 static void xn_txeof(struct netfront_txq *);
 static void xn_rxeof(struct netfront_rxq *);
 static void xn_alloc_rx_buffers(struct netfront_rxq *);
 static void xn_alloc_rx_buffers_callout(void *arg);
 
 static void xn_release_rx_bufs(struct netfront_rxq *);
 static void xn_release_tx_bufs(struct netfront_txq *);
 
 static void xn_rxq_intr(struct netfront_rxq *);
 static void xn_txq_intr(struct netfront_txq *);
 static void xn_intr(void *);
 static inline int xn_count_frags(struct mbuf *m);
 static int xn_assemble_tx_request(struct netfront_txq *, struct mbuf *);
 static int xn_ioctl(struct ifnet *, u_long, caddr_t);
 static void xn_ifinit_locked(struct netfront_info *);
 static void xn_ifinit(void *);
 static void xn_stop(struct netfront_info *);
 static void xn_query_features(struct netfront_info *np);
 static int xn_configure_features(struct netfront_info *np);
 static void netif_free(struct netfront_info *info);
 static int netfront_detach(device_t dev);
 
 static int xn_txq_mq_start_locked(struct netfront_txq *, struct mbuf *);
 static int xn_txq_mq_start(struct ifnet *, struct mbuf *);
 
 static int talk_to_backend(device_t dev, struct netfront_info *info);
 static int create_netdev(device_t dev);
 static void netif_disconnect_backend(struct netfront_info *info);
 static int setup_device(device_t dev, struct netfront_info *info,
     unsigned long);
 static int xn_ifmedia_upd(struct ifnet *ifp);
 static void xn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr);
 
 static int xn_connect(struct netfront_info *);
 static void xn_kick_rings(struct netfront_info *);
 
 static int xn_get_responses(struct netfront_rxq *,
     struct netfront_rx_info *, RING_IDX, RING_IDX *,
     struct mbuf **);
 
 #define virt_to_mfn(x) (vtophys(x) >> PAGE_SHIFT)
 
 #define INVALID_P2M_ENTRY (~0UL)
 
 struct xn_rx_stats
 {
 	u_long	rx_packets;	/* total packets received	*/
 	u_long	rx_bytes;	/* total bytes received 	*/
 	u_long	rx_errors;	/* bad packets received		*/
 };
 
 struct xn_tx_stats
 {
 	u_long	tx_packets;	/* total packets transmitted	*/
 	u_long	tx_bytes;	/* total bytes transmitted	*/
 	u_long	tx_errors;	/* packet transmit problems	*/
 };
 
 #define XN_QUEUE_NAME_LEN  8	/* xn{t,r}x_%u, allow for two digits */
 struct netfront_rxq {
 	struct netfront_info 	*info;
 	u_int			id;
 	char			name[XN_QUEUE_NAME_LEN];
 	struct mtx		lock;
 
 	int			ring_ref;
 	netif_rx_front_ring_t 	ring;
 	xen_intr_handle_t	xen_intr_handle;
 
 	grant_ref_t 		gref_head;
 	grant_ref_t 		grant_ref[NET_TX_RING_SIZE + 1];
 
 	struct mbuf		*mbufs[NET_RX_RING_SIZE + 1];
 
 	struct lro_ctrl		lro;
 
 	struct callout		rx_refill;
 
 	struct xn_rx_stats	stats;
 };
 
 struct netfront_txq {
 	struct netfront_info 	*info;
 	u_int 			id;
 	char			name[XN_QUEUE_NAME_LEN];
 	struct mtx		lock;
 
 	int			ring_ref;
 	netif_tx_front_ring_t	ring;
 	xen_intr_handle_t 	xen_intr_handle;
 
 	grant_ref_t		gref_head;
 	grant_ref_t		grant_ref[NET_TX_RING_SIZE + 1];
 
 	struct mbuf		*mbufs[NET_TX_RING_SIZE + 1];
 	int			mbufs_cnt;
 	struct buf_ring		*br;
 
 	struct taskqueue 	*tq;
 	struct task       	defrtask;
 
 	bool			full;
 
 	struct xn_tx_stats	stats;
 };
 
 struct netfront_info {
 	struct ifnet 		*xn_ifp;
 
 	struct mtx   		sc_lock;
 
 	u_int  num_queues;
 	struct netfront_rxq 	*rxq;
 	struct netfront_txq 	*txq;
 
 	u_int			carrier;
 	u_int			maxfrags;
 
 	device_t		xbdev;
 	uint8_t			mac[ETHER_ADDR_LEN];
 
 	int			xn_if_flags;
 
 	struct ifmedia		sc_media;
 
 	bool			xn_reset;
 };
 
 struct netfront_rx_info {
 	struct netif_rx_response rx;
 	struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1];
 };
 
 #define XN_RX_LOCK(_q)         mtx_lock(&(_q)->lock)
 #define XN_RX_UNLOCK(_q)       mtx_unlock(&(_q)->lock)
 
 #define XN_TX_LOCK(_q)         mtx_lock(&(_q)->lock)
 #define XN_TX_TRYLOCK(_q)      mtx_trylock(&(_q)->lock)
 #define XN_TX_UNLOCK(_q)       mtx_unlock(&(_q)->lock)
 
 #define XN_LOCK(_sc)           mtx_lock(&(_sc)->sc_lock);
 #define XN_UNLOCK(_sc)         mtx_unlock(&(_sc)->sc_lock);
 
 #define XN_LOCK_ASSERT(_sc)    mtx_assert(&(_sc)->sc_lock, MA_OWNED);
 #define XN_RX_LOCK_ASSERT(_q)  mtx_assert(&(_q)->lock, MA_OWNED);
 #define XN_TX_LOCK_ASSERT(_q)  mtx_assert(&(_q)->lock, MA_OWNED);
 
 #define netfront_carrier_on(netif)	((netif)->carrier = 1)
 #define netfront_carrier_off(netif)	((netif)->carrier = 0)
 #define netfront_carrier_ok(netif)	((netif)->carrier)
 
 /* Access macros for acquiring freeing slots in xn_free_{tx,rx}_idxs[]. */
 
 static inline void
 add_id_to_freelist(struct mbuf **list, uintptr_t id)
 {
 
 	KASSERT(id != 0,
 		("%s: the head item (0) must always be free.", __func__));
 	list[id] = list[0];
 	list[0]  = (struct mbuf *)id;
 }
 
 static inline unsigned short
 get_id_from_freelist(struct mbuf **list)
 {
 	uintptr_t id;
 
 	id = (uintptr_t)list[0];
 	KASSERT(id != 0,
 		("%s: the head item (0) must always remain free.", __func__));
 	list[0] = list[id];
 	return (id);
 }
 
 static inline int
 xn_rxidx(RING_IDX idx)
 {
 
 	return idx & (NET_RX_RING_SIZE - 1);
 }
 
 static inline struct mbuf *
 xn_get_rx_mbuf(struct netfront_rxq *rxq, RING_IDX ri)
 {
 	int i;
 	struct mbuf *m;
 
 	i = xn_rxidx(ri);
 	m = rxq->mbufs[i];
 	rxq->mbufs[i] = NULL;
 	return (m);
 }
 
 static inline grant_ref_t
 xn_get_rx_ref(struct netfront_rxq *rxq, RING_IDX ri)
 {
 	int i = xn_rxidx(ri);
 	grant_ref_t ref = rxq->grant_ref[i];
 
 	KASSERT(ref != GRANT_REF_INVALID, ("Invalid grant reference!\n"));
 	rxq->grant_ref[i] = GRANT_REF_INVALID;
 	return (ref);
 }
 
 #define IPRINTK(fmt, args...) \
     printf("[XEN] " fmt, ##args)
 #ifdef INVARIANTS
 #define WPRINTK(fmt, args...) \
     printf("[XEN] " fmt, ##args)
 #else
 #define WPRINTK(fmt, args...)
 #endif
 #ifdef DEBUG
 #define DPRINTK(fmt, args...) \
     printf("[XEN] %s: " fmt, __func__, ##args)
 #else
 #define DPRINTK(fmt, args...)
 #endif
 
 /**
  * Read the 'mac' node at the given device's node in the store, and parse that
  * as colon-separated octets, placing result the given mac array.  mac must be
  * a preallocated array of length ETH_ALEN (as declared in linux/if_ether.h).
  * Return 0 on success, or errno on error.
  */
 static int
 xen_net_read_mac(device_t dev, uint8_t mac[])
 {
 	int error, i;
 	char *s, *e, *macstr;
 	const char *path;
 
 	path = xenbus_get_node(dev);
 	error = xs_read(XST_NIL, path, "mac", NULL, (void **) &macstr);
 	if (error == ENOENT) {
 		/*
 		 * Deal with missing mac XenStore nodes on devices with
 		 * HVM emulation (the 'ioemu' configuration attribute)
 		 * enabled.
 		 *
 		 * The HVM emulator may execute in a stub device model
 		 * domain which lacks the permission, only given to Dom0,
 		 * to update the guest's XenStore tree.  For this reason,
 		 * the HVM emulator doesn't even attempt to write the
 		 * front-side mac node, even when operating in Dom0.
 		 * However, there should always be a mac listed in the
 		 * backend tree.  Fallback to this version if our query
 		 * of the front side XenStore location doesn't find
 		 * anything.
 		 */
 		path = xenbus_get_otherend_path(dev);
 		error = xs_read(XST_NIL, path, "mac", NULL, (void **) &macstr);
 	}
 	if (error != 0) {
 		xenbus_dev_fatal(dev, error, "parsing %s/mac", path);
 		return (error);
 	}
 
 	s = macstr;
 	for (i = 0; i < ETHER_ADDR_LEN; i++) {
 		mac[i] = strtoul(s, &e, 16);
 		if (s == e || (e[0] != ':' && e[0] != 0)) {
 			free(macstr, M_XENBUS);
 			return (ENOENT);
 		}
 		s = &e[1];
 	}
 	free(macstr, M_XENBUS);
 	return (0);
 }
 
 /**
  * Probe routine: claim XenBus devices of type "vif", unless PV NICs have
  * been disabled for this HVM domain.  The network interface itself is
  * created by netfront_attach(); ring setup (via xn_connect()) and the
  * switch to the Connected state happen in netfront_backend_changed()
  * once the backend reaches InitWait.
  */
 static int
 netfront_probe(device_t dev)
 {
 
 	if (xen_hvm_domain() && xen_disable_pv_nics != 0)
 		return (ENXIO);
 
 	if (!strcmp(xenbus_get_type(dev), "vif")) {
 		device_set_desc(dev, "Virtual Network Interface");
 		return (0);
 	}
 
 	return (ENXIO);
 }
 
 static int
 netfront_attach(device_t dev)
 {
 	int err;
 
 	err = create_netdev(dev);
 	if (err != 0) {
 		xenbus_dev_fatal(dev, err, "creating netdev");
 		return (err);
 	}
 
 	SYSCTL_ADD_INT(device_get_sysctl_ctx(dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 	    OID_AUTO, "enable_lro", CTLFLAG_RW,
 	    &xn_enable_lro, 0, "Large Receive Offload");
 
 	SYSCTL_ADD_ULONG(device_get_sysctl_ctx(dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(dev)),
 	    OID_AUTO, "num_queues", CTLFLAG_RD,
 	    &xn_num_queues, "Number of pairs of queues");
 
 	return (0);
 }
 
 static int
 netfront_suspend(device_t dev)
 {
 	struct netfront_info *np = device_get_softc(dev);
 	u_int i;
 
 	for (i = 0; i < np->num_queues; i++) {
 		XN_RX_LOCK(&np->rxq[i]);
 		XN_TX_LOCK(&np->txq[i]);
 	}
 	netfront_carrier_off(np);
 	for (i = 0; i < np->num_queues; i++) {
 		XN_RX_UNLOCK(&np->rxq[i]);
 		XN_TX_UNLOCK(&np->txq[i]);
 	}
 	return (0);
 }
 
 /**
  * We are reconnecting to the backend, due to a suspend/resume, or a backend
  * driver restart.  We tear down our netif structure and recreate it, but
  * leave the device-layer structures intact so that this is transparent to the
  * rest of the kernel.
  */
 static int
 netfront_resume(device_t dev)
 {
 	struct netfront_info *info = device_get_softc(dev);
 
 	netif_disconnect_backend(info);
 	return (0);
 }
 
 static int
 write_queue_xenstore_keys(device_t dev,
     struct netfront_rxq *rxq,
     struct netfront_txq *txq,
     struct xs_transaction *xst, bool hierarchy)
 {
 	int err;
 	const char *message;
 	const char *node = xenbus_get_node(dev);
 	char *path;
 	size_t path_size;
 
 	KASSERT(rxq->id == txq->id, ("Mismatch between RX and TX queue ids"));
 	/* Split event channel support is not yet there. */
 	KASSERT(rxq->xen_intr_handle == txq->xen_intr_handle,
 	    ("Split event channels are not supported"));
 
 	if (hierarchy) {
 		/* Room for "/queue-", up to two decimal digits and the NUL. */
 		path_size = strlen(node) + 10;
 		path = malloc(path_size, M_DEVBUF, M_WAITOK|M_ZERO);
 		snprintf(path, path_size, "%s/queue-%u", node, rxq->id);
 	} else {
 		path_size = strlen(node) + 1;
 		path = malloc(path_size, M_DEVBUF, M_WAITOK|M_ZERO);
 		snprintf(path, path_size, "%s", node);
 	}
 
 	err = xs_printf(*xst, path, "tx-ring-ref", "%u", txq->ring_ref);
 	if (err != 0) {
 		message = "writing tx ring-ref";
 		goto error;
 	}
 	err = xs_printf(*xst, path, "rx-ring-ref", "%u", rxq->ring_ref);
 	if (err != 0) {
 		message = "writing rx ring-ref";
 		goto error;
 	}
 	err = xs_printf(*xst, path, "event-channel", "%u",
 	    xen_intr_port(rxq->xen_intr_handle));
 	if (err != 0) {
 		message = "writing event-channel";
 		goto error;
 	}
 
 	free(path, M_DEVBUF);
 
 	return (0);
 
 error:
 	free(path, M_DEVBUF);
 	xenbus_dev_fatal(dev, err, "%s", message);
 
 	return (err);
 }
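 
 /*
  * Illustration (hypothetical values): with two queues and both LRO and
  * IFCAP_RXCSUM enabled, the keys written by write_queue_xenstore_keys()
  * and talk_to_backend() under the frontend node look roughly like this.
  * Only the key names are taken from the code; the grant references and
  * event-channel ports shown are made up.
  *
  *	multi-queue-num-queues = "2"
  *	queue-0/tx-ring-ref    = "8"
  *	queue-0/rx-ring-ref    = "9"
  *	queue-0/event-channel  = "12"
  *	queue-1/tx-ring-ref    = "10"
  *	queue-1/rx-ring-ref    = "11"
  *	queue-1/event-channel  = "13"
  *	request-rx-copy        = "1"
  *	feature-rx-notify      = "1"
  *	feature-sg             = "1"
  *	feature-gso-tcpv4      = "1"
  *
  * With a single queue, tx-ring-ref, rx-ring-ref and event-channel are
  * written directly under the frontend node and multi-queue-num-queues
  * is not written.  feature-no-csum-offload is written only when
  * IFCAP_RXCSUM is disabled.
  */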
 
 /* Common code used when first setting up, and when resuming. */
 static int
 talk_to_backend(device_t dev, struct netfront_info *info)
 {
 	const char *message;
 	struct xs_transaction xst;
 	const char *node = xenbus_get_node(dev);
 	int err;
 	unsigned long num_queues, max_queues = 0;
 	unsigned int i;
 
 	err = xen_net_read_mac(dev, info->mac);
 	if (err != 0) {
 		xenbus_dev_fatal(dev, err, "parsing %s/mac", node);
 		goto out;
 	}
 
 	err = xs_scanf(XST_NIL, xenbus_get_otherend_path(info->xbdev),
 	    "multi-queue-max-queues", NULL, "%lu", &max_queues);
 	if (err != 0)
 		max_queues = 1;
 	num_queues = xn_num_queues;
 	if (num_queues > max_queues)
 		num_queues = max_queues;
 
 	err = setup_device(dev, info, num_queues);
 	if (err != 0)
 		goto out;
 
  again:
 	err = xs_transaction_start(&xst);
 	if (err != 0) {
 		xenbus_dev_fatal(dev, err, "starting transaction");
 		goto free;
 	}
 
 	if (info->num_queues == 1) {
 		err = write_queue_xenstore_keys(dev, &info->rxq[0],
 		    &info->txq[0], &xst, false);
 		if (err != 0)
 			goto abort_transaction_no_def_error;
 	} else {
 		err = xs_printf(xst, node, "multi-queue-num-queues",
 		    "%u", info->num_queues);
 		if (err != 0) {
 			message = "writing multi-queue-num-queues";
 			goto abort_transaction;
 		}
 
 		for (i = 0; i < info->num_queues; i++) {
 			err = write_queue_xenstore_keys(dev, &info->rxq[i],
 			    &info->txq[i], &xst, true);
 			if (err != 0)
 				goto abort_transaction_no_def_error;
 		}
 	}
 
 	err = xs_printf(xst, node, "request-rx-copy", "%u", 1);
 	if (err != 0) {
 		message = "writing request-rx-copy";
 		goto abort_transaction;
 	}
 	err = xs_printf(xst, node, "feature-rx-notify", "%d", 1);
 	if (err != 0) {
 		message = "writing feature-rx-notify";
 		goto abort_transaction;
 	}
 	err = xs_printf(xst, node, "feature-sg", "%d", 1);
 	if (err != 0) {
 		message = "writing feature-sg";
 		goto abort_transaction;
 	}
 	if ((info->xn_ifp->if_capenable & IFCAP_LRO) != 0) {
 		err = xs_printf(xst, node, "feature-gso-tcpv4", "%d", 1);
 		if (err != 0) {
 			message = "writing feature-gso-tcpv4";
 			goto abort_transaction;
 		}
 	}
 	if ((info->xn_ifp->if_capenable & IFCAP_RXCSUM) == 0) {
 		err = xs_printf(xst, node, "feature-no-csum-offload", "%d", 1);
 		if (err != 0) {
 			message = "writing feature-no-csum-offload";
 			goto abort_transaction;
 		}
 	}
 
 	err = xs_transaction_end(xst, 0);
 	if (err != 0) {
 		if (err == EAGAIN)
 			goto again;
 		xenbus_dev_fatal(dev, err, "completing transaction");
 		goto free;
 	}
 
 	return (0);
 
  abort_transaction:
 	xenbus_dev_fatal(dev, err, "%s", message);
  abort_transaction_no_def_error:
 	xs_transaction_end(xst, 1);
  free:
 	netif_free(info);
  out:
 	return (err);
 }
 
 static void
 xn_rxq_intr(struct netfront_rxq *rxq)
 {
 
 	XN_RX_LOCK(rxq);
 	xn_rxeof(rxq);
 	XN_RX_UNLOCK(rxq);
 }
 
 static void
 xn_txq_start(struct netfront_txq *txq)
 {
 	struct netfront_info *np = txq->info;
 	struct ifnet *ifp = np->xn_ifp;
 
 	XN_TX_LOCK_ASSERT(txq);
 	if (!drbr_empty(ifp, txq->br))
 		xn_txq_mq_start_locked(txq, NULL);
 }
 
 static void
 xn_txq_intr(struct netfront_txq *txq)
 {
 
 	XN_TX_LOCK(txq);
 	if (RING_HAS_UNCONSUMED_RESPONSES(&txq->ring))
 		xn_txeof(txq);
 	xn_txq_start(txq);
 	XN_TX_UNLOCK(txq);
 }
 
 static void
 xn_txq_tq_deferred(void *xtxq, int pending)
 {
 	struct netfront_txq *txq = xtxq;
 
 	XN_TX_LOCK(txq);
 	xn_txq_start(txq);
 	XN_TX_UNLOCK(txq);
 }
 
 static void
 disconnect_rxq(struct netfront_rxq *rxq)
 {
 
 	xn_release_rx_bufs(rxq);
 	gnttab_free_grant_references(rxq->gref_head);
 	gnttab_end_foreign_access(rxq->ring_ref, NULL);
 	/*
 	 * No split event channel support at the moment, so the handle
 	 * is shared with the TX queue and is unbound by disconnect_txq().
 	 * There is no need to call xen_intr_unbind() here, but we do want
 	 * to reset the handle to 0.
 	 */
 	rxq->xen_intr_handle = 0;
 }
 
 static void
 destroy_rxq(struct netfront_rxq *rxq)
 {
 
 	callout_drain(&rxq->rx_refill);
 	free(rxq->ring.sring, M_DEVBUF);
 }
 
 static void
 destroy_rxqs(struct netfront_info *np)
 {
 	int i;
 
 	for (i = 0; i < np->num_queues; i++)
 		destroy_rxq(&np->rxq[i]);
 
 	free(np->rxq, M_DEVBUF);
 	np->rxq = NULL;
 }
 
 static int
 setup_rxqs(device_t dev, struct netfront_info *info,
 	   unsigned long num_queues)
 {
 	int q, i;
 	int error;
 	netif_rx_sring_t *rxs;
 	struct netfront_rxq *rxq;
 
 	info->rxq = malloc(sizeof(struct netfront_rxq) * num_queues,
 	    M_DEVBUF, M_WAITOK|M_ZERO);
 
 	for (q = 0; q < num_queues; q++) {
 		rxq = &info->rxq[q];
 
 		rxq->id = q;
 		rxq->info = info;
 		rxq->ring_ref = GRANT_REF_INVALID;
 		rxq->ring.sring = NULL;
 		snprintf(rxq->name, XN_QUEUE_NAME_LEN, "xnrx_%u", q);
 		mtx_init(&rxq->lock, rxq->name, "netfront receive lock",
 		    MTX_DEF);
 
 		for (i = 0; i <= NET_RX_RING_SIZE; i++) {
 			rxq->mbufs[i] = NULL;
 			rxq->grant_ref[i] = GRANT_REF_INVALID;
 		}
 
 		/* Start resources allocation */
 
 		if (gnttab_alloc_grant_references(NET_RX_RING_SIZE,
 		    &rxq->gref_head) != 0) {
 			device_printf(dev, "failed to allocate rx grant refs\n");
 			error = ENOMEM;
 			goto fail;
 		}
 
 		rxs = (netif_rx_sring_t *)malloc(PAGE_SIZE, M_DEVBUF,
 		    M_WAITOK|M_ZERO);
 		SHARED_RING_INIT(rxs);
 		FRONT_RING_INIT(&rxq->ring, rxs, PAGE_SIZE);
 
 		error = xenbus_grant_ring(dev, virt_to_mfn(rxs),
 		    &rxq->ring_ref);
 		if (error != 0) {
 			device_printf(dev, "failed to grant rx ring\n");
 			goto fail_grant_ring;
 		}
 
 		callout_init(&rxq->rx_refill, 1);
 	}
 
 	return (0);
 
 fail_grant_ring:
 	gnttab_free_grant_references(rxq->gref_head);
 	free(rxq->ring.sring, M_DEVBUF);
 fail:
 	for (; q >= 0; q--) {
 		disconnect_rxq(&info->rxq[q]);
 		destroy_rxq(&info->rxq[q]);
 	}
 
 	free(info->rxq, M_DEVBUF);
 	return (error);
 }
 
 static void
 disconnect_txq(struct netfront_txq *txq)
 {
 
 	xn_release_tx_bufs(txq);
 	gnttab_free_grant_references(txq->gref_head);
 	gnttab_end_foreign_access(txq->ring_ref, NULL);
 	xen_intr_unbind(&txq->xen_intr_handle);
 }
 
 static void
 destroy_txq(struct netfront_txq *txq)
 {
 
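 	/*
 	 * Drain the taskqueue before freeing anything the deferred
 	 * start task (xn_txq_tq_deferred()) may still touch.
 	 */
 	taskqueue_drain_all(txq->tq);
 	taskqueue_free(txq->tq);
 	free(txq->ring.sring, M_DEVBUF);
 	buf_ring_free(txq->br, M_DEVBUF);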
 }
 
 static void
 destroy_txqs(struct netfront_info *np)
 {
 	int i;
 
 	for (i = 0; i < np->num_queues; i++)
 		destroy_txq(&np->txq[i]);
 
 	free(np->txq, M_DEVBUF);
 	np->txq = NULL;
 }
 
 static int
 setup_txqs(device_t dev, struct netfront_info *info,
 	   unsigned long num_queues)
 {
 	int q, i;
 	int error;
 	netif_tx_sring_t *txs;
 	struct netfront_txq *txq;
 
 	info->txq = malloc(sizeof(struct netfront_txq) * num_queues,
 	    M_DEVBUF, M_WAITOK|M_ZERO);
 
 	for (q = 0; q < num_queues; q++) {
 		txq = &info->txq[q];
 
 		txq->id = q;
 		txq->info = info;
 
 		txq->ring_ref = GRANT_REF_INVALID;
 		txq->ring.sring = NULL;
 
 		snprintf(txq->name, XN_QUEUE_NAME_LEN, "xntx_%u", q);
 
 		mtx_init(&txq->lock, txq->name, "netfront transmit lock",
 		    MTX_DEF);
 
 		for (i = 0; i <= NET_TX_RING_SIZE; i++) {
 			txq->mbufs[i] = (void *) ((u_long) i+1);
 			txq->grant_ref[i] = GRANT_REF_INVALID;
 		}
 		txq->mbufs[NET_TX_RING_SIZE] = (void *)0;
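 		/*
 		 * The mbufs[] array doubles as a free list of slot ids:
 		 * an entry that is not above NET_TX_RING_SIZE is a link
 		 * to the next free id rather than an mbuf pointer, and
 		 * mbufs[0] acts as the list head.  For a hypothetical
 		 * ring size of 4, the loop above leaves
 		 *
 		 *	mbufs[] = { 1, 2, 3, 4, 0 }
 		 *
 		 * i.e. head -> 1 -> 2 -> 3 -> 4 -> end (0).  Id 0 is never
 		 * handed out; see the "freelist head" panic in
 		 * xn_assemble_tx_request().
 		 */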
 
 		/* Start resources allocation. */
 
 		if (gnttab_alloc_grant_references(NET_TX_RING_SIZE,
 		    &txq->gref_head) != 0) {
 			device_printf(dev, "failed to allocate tx grant refs\n");
 			error = ENOMEM;
 			goto fail;
 		}
 
 		txs = (netif_tx_sring_t *)malloc(PAGE_SIZE, M_DEVBUF,
 		    M_WAITOK|M_ZERO);
 		SHARED_RING_INIT(txs);
 		FRONT_RING_INIT(&txq->ring, txs, PAGE_SIZE);
 
 		error = xenbus_grant_ring(dev, virt_to_mfn(txs),
 		    &txq->ring_ref);
 		if (error != 0) {
 			device_printf(dev, "failed to grant tx ring\n");
 			goto fail_grant_ring;
 		}
 
 		txq->br = buf_ring_alloc(NET_TX_RING_SIZE, M_DEVBUF,
 		    M_WAITOK, &txq->lock);
 		TASK_INIT(&txq->defrtask, 0, xn_txq_tq_deferred, txq);
 
 		txq->tq = taskqueue_create(txq->name, M_WAITOK,
 		    taskqueue_thread_enqueue, &txq->tq);
 
 		error = taskqueue_start_threads(&txq->tq, 1, PI_NET,
 		    "%s txq %d", device_get_nameunit(dev), txq->id);
 		if (error != 0) {
 			device_printf(dev, "failed to start tx taskq %d\n",
 			    txq->id);
 			goto fail_start_thread;
 		}
 
 		error = xen_intr_alloc_and_bind_local_port(dev,
 		    xenbus_get_otherend_id(dev), /* filter */ NULL, xn_intr,
 		    &info->txq[q], INTR_TYPE_NET | INTR_MPSAFE | INTR_ENTROPY,
 		    &txq->xen_intr_handle);
 
 		if (error != 0) {
 			device_printf(dev,
 			    "xen_intr_alloc_and_bind_local_port failed\n");
 			goto fail_bind_port;
 		}
 	}
 
 	return (0);
 
 fail_bind_port:
 	taskqueue_drain_all(txq->tq);
 fail_start_thread:
 	buf_ring_free(txq->br, M_DEVBUF);
 	taskqueue_free(txq->tq);
 	gnttab_end_foreign_access(txq->ring_ref, NULL);
 fail_grant_ring:
 	gnttab_free_grant_references(txq->gref_head);
 	free(txq->ring.sring, M_DEVBUF);
 fail:
 	for (; q >= 0; q--) {
 		disconnect_txq(&info->txq[q]);
 		destroy_txq(&info->txq[q]);
 	}
 
 	free(info->txq, M_DEVBUF);
 	return (error);
 }
 
 static int
 setup_device(device_t dev, struct netfront_info *info,
     unsigned long num_queues)
 {
 	int error;
 	int q;
 
 	if (info->txq)
 		destroy_txqs(info);
 
 	if (info->rxq)
 		destroy_rxqs(info);
 
 	info->num_queues = 0;
 
 	error = setup_rxqs(dev, info, num_queues);
 	if (error != 0)
 		goto out;
 	error = setup_txqs(dev, info, num_queues);
 	if (error != 0)
 		goto out;
 
 	info->num_queues = num_queues;
 
 	/* No split event channel at the moment. */
 	for (q = 0; q < num_queues; q++)
 		info->rxq[q].xen_intr_handle = info->txq[q].xen_intr_handle;
 
 	return (0);
 
 out:
 	KASSERT(error != 0, ("Error path taken without providing an error code"));
 	return (error);
 }
 
 #ifdef INET
 /**
  * Send a gratuitous ARP for each IPv4 address configured on this
  * interface.  This helps get the network going again after the guest
  * has been migrated to another host.
  */
 static void
 netfront_send_fake_arp(device_t dev, struct netfront_info *info)
 {
 	struct ifnet *ifp;
 	struct ifaddr *ifa;
 
 	ifp = info->xn_ifp;
 	TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
 		if (ifa->ifa_addr->sa_family == AF_INET) {
 			arp_ifinit(ifp, ifa);
 		}
 	}
 }
 #endif
 
 /**
  * Callback received when the backend's state changes.
  */
 static void
 netfront_backend_changed(device_t dev, XenbusState newstate)
 {
 	struct netfront_info *sc = device_get_softc(dev);
 
 	DPRINTK("newstate=%d\n", newstate);
 
 	switch (newstate) {
 	case XenbusStateInitialising:
 	case XenbusStateInitialised:
 	case XenbusStateUnknown:
 	case XenbusStateReconfigured:
 	case XenbusStateReconfiguring:
 		break;
 	case XenbusStateInitWait:
 		if (xenbus_get_state(dev) != XenbusStateInitialising)
 			break;
 		if (xn_connect(sc) != 0)
 			break;
 		/* Switch to connected state before kicking the rings. */
 		xenbus_set_state(sc->xbdev, XenbusStateConnected);
 		xn_kick_rings(sc);
 		break;
 	case XenbusStateClosing:
 		xenbus_set_state(dev, XenbusStateClosed);
 		break;
 	case XenbusStateClosed:
 		if (sc->xn_reset) {
 			netif_disconnect_backend(sc);
 			xenbus_set_state(dev, XenbusStateInitialising);
 			sc->xn_reset = false;
 		}
 		break;
 	case XenbusStateConnected:
 #ifdef INET
 		netfront_send_fake_arp(dev, sc);
 #endif
 		break;
 	}
 }
 
 /**
  * \brief Verify that there is sufficient space in the Tx ring
  *        buffer for a maximally sized request to be enqueued.
  *
  * A transmit request requires a transmit descriptor for each packet
  * fragment, plus up to 2 entries for "options" (e.g. TSO).
  */
 static inline int
 xn_tx_slot_available(struct netfront_txq *txq)
 {
 
 	return (RING_FREE_REQUESTS(&txq->ring) > (MAX_TX_REQ_FRAGS + 2));
 }
 
 static void
 xn_release_tx_bufs(struct netfront_txq *txq)
 {
 	int i;
 
 	for (i = 1; i <= NET_TX_RING_SIZE; i++) {
 		struct mbuf *m;
 
 		m = txq->mbufs[i];
 
 		/*
 		 * We assume that no kernel address is less than or
 		 * equal to NET_TX_RING_SIZE.  Any entry in the table
 		 * that is not above this number must be an index
 		 * from free-list tracking.
 		 */
 		if (((uintptr_t)m) <= NET_TX_RING_SIZE)
 			continue;
 		gnttab_end_foreign_access_ref(txq->grant_ref[i]);
 		gnttab_release_grant_reference(&txq->gref_head,
 		    txq->grant_ref[i]);
 		txq->grant_ref[i] = GRANT_REF_INVALID;
 		add_id_to_freelist(txq->mbufs, i);
 		txq->mbufs_cnt--;
 		if (txq->mbufs_cnt < 0) {
 			panic("%s: tx_chain_cnt must be >= 0", __func__);
 		}
 		m_free(m);
 	}
 }
 
 static struct mbuf *
 xn_alloc_one_rx_buffer(struct netfront_rxq *rxq)
 {
 	struct mbuf *m;
 
 	m = m_getjcl(M_NOWAIT, MT_DATA, M_PKTHDR, MJUMPAGESIZE);
 	if (m == NULL)
 		return (NULL);
 	m->m_len = m->m_pkthdr.len = MJUMPAGESIZE;
 
 	return (m);
 }
 
 static void
 xn_alloc_rx_buffers(struct netfront_rxq *rxq)
 {
 	RING_IDX req_prod;
 	int notify;
 
 	XN_RX_LOCK_ASSERT(rxq);
 
 	if (__predict_false(rxq->info->carrier == 0))
 		return;
 
 	for (req_prod = rxq->ring.req_prod_pvt;
 	     req_prod - rxq->ring.rsp_cons < NET_RX_RING_SIZE;
 	     req_prod++) {
 		struct mbuf *m;
 		unsigned short id;
 		grant_ref_t ref;
 		struct netif_rx_request *req;
 		unsigned long pfn;
 
 		m = xn_alloc_one_rx_buffer(rxq);
 		if (m == NULL)
 			break;
 
 		id = xn_rxidx(req_prod);
 
 		KASSERT(rxq->mbufs[id] == NULL, ("non-NULL xn_rx_chain"));
 		rxq->mbufs[id] = m;
 
 		ref = gnttab_claim_grant_reference(&rxq->gref_head);
 		KASSERT(ref != GNTTAB_LIST_END,
 		    ("reserved grant references exhausted"));
 		rxq->grant_ref[id] = ref;
 
 		pfn = atop(vtophys(mtod(m, vm_offset_t)));
 		req = RING_GET_REQUEST(&rxq->ring, req_prod);
 
 		gnttab_grant_foreign_access_ref(ref,
 		    xenbus_get_otherend_id(rxq->info->xbdev), pfn, 0);
 		req->id = id;
 		req->gref = ref;
 	}
 
 	rxq->ring.req_prod_pvt = req_prod;
 
 	/* Not enough requests? Try again later. */
 	if (req_prod - rxq->ring.rsp_cons < NET_RX_SLOTS_MIN) {
 		callout_reset_curcpu(&rxq->rx_refill, hz/10,
 		    xn_alloc_rx_buffers_callout, rxq);
 		return;
 	}
 
 	wmb();		/* Barrier so the backend sees the requests. */
 
 	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&rxq->ring, notify);
 	if (notify)
 		xen_intr_signal(rxq->xen_intr_handle);
 }
 
 static void
 xn_alloc_rx_buffers_callout(void *arg)
 {
 	struct netfront_rxq *rxq;
 
 	rxq = (struct netfront_rxq *)arg;
 	XN_RX_LOCK(rxq);
 	xn_alloc_rx_buffers(rxq);
 	XN_RX_UNLOCK(rxq);
 }
 
 static void
 xn_release_rx_bufs(struct netfront_rxq *rxq)
 {
 	int i, ref;
 	struct mbuf *m;
 
 	for (i = 0; i < NET_RX_RING_SIZE; i++) {
 		m = rxq->mbufs[i];
 
 		if (m == NULL)
 			continue;
 
 		ref = rxq->grant_ref[i];
 		if (ref == GRANT_REF_INVALID)
 			continue;
 
 		gnttab_end_foreign_access_ref(ref);
 		gnttab_release_grant_reference(&rxq->gref_head, ref);
 		rxq->mbufs[i] = NULL;
 		rxq->grant_ref[i] = GRANT_REF_INVALID;
 		m_freem(m);
 	}
 }
 
 static void
 xn_rxeof(struct netfront_rxq *rxq)
 {
 	struct ifnet *ifp;
 	struct netfront_info *np = rxq->info;
 #if (defined(INET) || defined(INET6))
 	struct lro_ctrl *lro = &rxq->lro;
 #endif
 	struct netfront_rx_info rinfo;
 	struct netif_rx_response *rx = &rinfo.rx;
 	struct netif_extra_info *extras = rinfo.extras;
 	RING_IDX i, rp;
 	struct mbuf *m;
 	struct mbufq mbufq_rxq, mbufq_errq;
 	int err, work_to_do;
 
 	do {
 		XN_RX_LOCK_ASSERT(rxq);
 		if (!netfront_carrier_ok(np))
 			return;
 
 		/* XXX: there should be some sane limit. */
 		mbufq_init(&mbufq_errq, INT_MAX);
 		mbufq_init(&mbufq_rxq, INT_MAX);
 
 		ifp = np->xn_ifp;
 
 		rp = rxq->ring.sring->rsp_prod;
 		rmb();	/* Ensure we see queued responses up to 'rp'. */
 
 		i = rxq->ring.rsp_cons;
 		while (i != rp) {
 			memcpy(rx, RING_GET_RESPONSE(&rxq->ring, i), sizeof(*rx));
 			memset(extras, 0, sizeof(rinfo.extras));
 
 			m = NULL;
 			err = xn_get_responses(rxq, &rinfo, rp, &i, &m);
 
 			if (__predict_false(err)) {
 				if (m)
 					(void )mbufq_enqueue(&mbufq_errq, m);
 				rxq->stats.rx_errors++;
 				continue;
 			}
 
 			m->m_pkthdr.rcvif = ifp;
 			if ((rx->flags & NETRXF_data_validated) != 0) {
 				/* Tell the stack the checksums are okay */
 				/*
 				 * XXX this isn't necessarily the case - need to add
 				 * check
 				 */
 
 				m->m_pkthdr.csum_flags |=
 					(CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID
 					    | CSUM_PSEUDO_HDR);
 				m->m_pkthdr.csum_data = 0xffff;
 			}
 			if ((rx->flags & NETRXF_extra_info) != 0 &&
 			    (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type ==
 			    XEN_NETIF_EXTRA_TYPE_GSO)) {
 				m->m_pkthdr.tso_segsz =
 				extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].u.gso.size;
 				m->m_pkthdr.csum_flags |= CSUM_TSO;
 			}
 
 			rxq->stats.rx_packets++;
 			rxq->stats.rx_bytes += m->m_pkthdr.len;
 
 			(void )mbufq_enqueue(&mbufq_rxq, m);
 			rxq->ring.rsp_cons = i;
 		}
 
 		mbufq_drain(&mbufq_errq);
 
 		/*
 		 * Now that the ring has been processed, pass the
 		 * received packets up the stack.
 		 */
 		while ((m = mbufq_dequeue(&mbufq_rxq)) != NULL) {
 			if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1);
 
 			/* XXX: Do we really need to drop the rx lock? */
 			XN_RX_UNLOCK(rxq);
 #if (defined(INET) || defined(INET6))
 			/* Use LRO if possible */
 			if ((ifp->if_capenable & IFCAP_LRO) == 0 ||
 			    lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) {
 				/*
 				 * If LRO fails, pass up to the stack
 				 * directly.
 				 */
 				(*ifp->if_input)(ifp, m);
 			}
 #else
 			(*ifp->if_input)(ifp, m);
 #endif
 
 			XN_RX_LOCK(rxq);
 		}
 
 		rxq->ring.rsp_cons = i;
 
 #if (defined(INET) || defined(INET6))
 		/*
 		 * Flush any outstanding LRO work
 		 */
 		tcp_lro_flush_all(lro);
 #endif
 
 		xn_alloc_rx_buffers(rxq);
 
 		RING_FINAL_CHECK_FOR_RESPONSES(&rxq->ring, work_to_do);
 	} while (work_to_do);
 }
 
 static void
 xn_txeof(struct netfront_txq *txq)
 {
 	RING_IDX i, prod;
 	unsigned short id;
 	struct ifnet *ifp;
 	netif_tx_response_t *txr;
 	struct mbuf *m;
 	struct netfront_info *np = txq->info;
 
 	XN_TX_LOCK_ASSERT(txq);
 
 	if (!netfront_carrier_ok(np))
 		return;
 
 	ifp = np->xn_ifp;
 
 	do {
 		prod = txq->ring.sring->rsp_prod;
 		rmb(); /* Ensure we see responses up to 'prod'. */
 
 		for (i = txq->ring.rsp_cons; i != prod; i++) {
 			txr = RING_GET_RESPONSE(&txq->ring, i);
 			if (txr->status == NETIF_RSP_NULL)
 				continue;
 
 			if (txr->status != NETIF_RSP_OKAY) {
 				printf("%s: WARNING: response is %d!\n",
 				       __func__, txr->status);
 			}
 			id = txr->id;
 			m = txq->mbufs[id];
 			KASSERT(m != NULL, ("mbuf not found in chain"));
 			KASSERT((uintptr_t)m > NET_TX_RING_SIZE,
 				("mbuf already on the free list, but we're "
 				"trying to free it again!"));
 			M_ASSERTVALID(m);
 
 			/*
 			 * Increment packet count if this is the last
 			 * mbuf of the chain.
 			 */
 			if (!m->m_next)
 				if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1);
 			if (__predict_false(gnttab_query_foreign_access(
 			    txq->grant_ref[id]) != 0)) {
 				panic("%s: grant id %u still in use by the "
 				    "backend", __func__, id);
 			}
 			gnttab_end_foreign_access_ref(txq->grant_ref[id]);
 			gnttab_release_grant_reference(
 				&txq->gref_head, txq->grant_ref[id]);
 			txq->grant_ref[id] = GRANT_REF_INVALID;
 
 			txq->mbufs[id] = NULL;
 			add_id_to_freelist(txq->mbufs, id);
 			txq->mbufs_cnt--;
 			m_free(m);
 			/*
 			 * A slot has been freed, so clear IFF_DRV_OACTIVE
 			 * to let transmission resume.
 			 */
 			ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 		}
 		txq->ring.rsp_cons = prod;
 
 		/*
 		 * Set a new event, then check for race with update of
 		 * tx_cons. Note that it is essential to schedule a
 		 * callback, no matter how few buffers are pending. Even if
 		 * there is space in the transmit ring, higher layers may
 		 * be blocked because too much data is outstanding: in such
 		 * cases notification from Xen is likely to be the only kick
 		 * that we'll get.
 		 */
 		txq->ring.sring->rsp_event =
 		    prod + ((txq->ring.sring->req_prod - prod) >> 1) + 1;
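 		/*
 		 * Worked example: with prod == 100 and req_prod == 110
 		 * (ten requests outstanding), rsp_event becomes
 		 * 100 + (10 >> 1) + 1 == 106, so the backend should next
 		 * notify us once rsp_prod reaches or passes 106, i.e.
 		 * after six of the ten outstanding requests have completed.
 		 */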
 
 		mb();
 	} while (prod != txq->ring.sring->rsp_prod);
 
 	if (txq->full &&
 	    ((txq->ring.sring->req_prod - prod) < NET_TX_RING_SIZE)) {
 		txq->full = false;
 		xn_txq_start(txq);
 	}
 }
 
 static void
 xn_intr(void *xsc)
 {
 	struct netfront_txq *txq = xsc;
 	struct netfront_info *np = txq->info;
 	struct netfront_rxq *rxq = &np->rxq[txq->id];
 
 	/* kick both tx and rx */
 	xn_rxq_intr(rxq);
 	xn_txq_intr(txq);
 }
 
 static void
 xn_move_rx_slot(struct netfront_rxq *rxq, struct mbuf *m,
     grant_ref_t ref)
 {
 	int new = xn_rxidx(rxq->ring.req_prod_pvt);
 
 	KASSERT(rxq->mbufs[new] == NULL, ("mbufs != NULL"));
 	rxq->mbufs[new] = m;
 	rxq->grant_ref[new] = ref;
 	RING_GET_REQUEST(&rxq->ring, rxq->ring.req_prod_pvt)->id = new;
 	RING_GET_REQUEST(&rxq->ring, rxq->ring.req_prod_pvt)->gref = ref;
 	rxq->ring.req_prod_pvt++;
 }
 
 static int
 xn_get_extras(struct netfront_rxq *rxq,
     struct netif_extra_info *extras, RING_IDX rp, RING_IDX *cons)
 {
 	struct netif_extra_info *extra;
 
 	int err = 0;
 
 	do {
 		struct mbuf *m;
 		grant_ref_t ref;
 
 		if (__predict_false(*cons + 1 == rp)) {
 			err = EINVAL;
 			break;
 		}
 
 		extra = (struct netif_extra_info *)
 		RING_GET_RESPONSE(&rxq->ring, ++(*cons));
 
 		if (__predict_false(!extra->type ||
 			extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) {
 			err = EINVAL;
 		} else {
 			memcpy(&extras[extra->type - 1], extra, sizeof(*extra));
 		}
 
 		m = xn_get_rx_mbuf(rxq, *cons);
 		ref = xn_get_rx_ref(rxq,  *cons);
 		xn_move_rx_slot(rxq, m, ref);
 	} while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE);
 
 	return (err);
 }
 
 static int
 xn_get_responses(struct netfront_rxq *rxq,
     struct netfront_rx_info *rinfo, RING_IDX rp, RING_IDX *cons,
     struct mbuf  **list)
 {
 	struct netif_rx_response *rx = &rinfo->rx;
 	struct netif_extra_info *extras = rinfo->extras;
 	struct mbuf *m, *m0, *m_prev;
 	grant_ref_t ref = xn_get_rx_ref(rxq, *cons);
 	RING_IDX ref_cons = *cons;
 	int frags = 1;
 	int err = 0;
 	u_long ret;
 
 	m0 = m = m_prev = xn_get_rx_mbuf(rxq, *cons);
 
 	if (rx->flags & NETRXF_extra_info) {
 		err = xn_get_extras(rxq, extras, rp, cons);
 	}
 
 	if (m0 != NULL) {
 		m0->m_pkthdr.len = 0;
 		m0->m_next = NULL;
 	}
 
 	for (;;) {
 #if 0
 		DPRINTK("rx->status=%hd rx->offset=%hu frags=%u\n",
 			rx->status, rx->offset, frags);
 #endif
 		if (__predict_false(rx->status < 0 ||
 			rx->offset + rx->status > PAGE_SIZE)) {
 
 			xn_move_rx_slot(rxq, m, ref);
 			if (m0 == m)
 				m0 = NULL;
 			m = NULL;
 			err = EINVAL;
 			goto next_skip_queue;
 		}
 
 		/*
 		 * This definitely indicates a bug, either in this driver or in
 		 * the backend driver. In future this should flag the bad
 		 * situation to the system controller to reboot the backend.
 		 */
 		if (ref == GRANT_REF_INVALID) {
 			printf("%s: Bad rx response id %d.\n", __func__, rx->id);
 			err = EINVAL;
 			goto next;
 		}
 
 		ret = gnttab_end_foreign_access_ref(ref);
 		KASSERT(ret, ("Unable to end access to grant references"));
 
 		gnttab_release_grant_reference(&rxq->gref_head, ref);
 
 next:
 		if (m == NULL)
 			break;
 
 		m->m_len = rx->status;
 		m->m_data += rx->offset;
 		m0->m_pkthdr.len += rx->status;
 
 next_skip_queue:
 		if (!(rx->flags & NETRXF_more_data))
 			break;
 
 		if (*cons + frags == rp) {
 			if (net_ratelimit())
 				WPRINTK("Need more frags\n");
 			err = ENOENT;
 			printf("%s: cons %u frags %u rp %u, not enough frags\n",
 			       __func__, *cons, frags, rp);
 			break;
 		}
 		/*
 		 * Note that m can be NULL, if rx->status < 0 or if
 		 * rx->offset + rx->status > PAGE_SIZE above.
 		 */
 		m_prev = m;
 
 		rx = RING_GET_RESPONSE(&rxq->ring, *cons + frags);
 		m = xn_get_rx_mbuf(rxq, *cons + frags);
 
 		/*
 		 * m_prev == NULL can happen if rx->status < 0 or if
 		 * rx->offset + rx->status > PAGE_SIZE above.
 		 */
 		if (m_prev != NULL)
 			m_prev->m_next = m;
 
 		/*
 		 * m0 can be NULL if rx->status < 0 or if rx->offset +
 		 * rx->status > PAGE_SIZE above.
 		 */
 		if (m0 == NULL)
 			m0 = m;
 		m->m_next = NULL;
 		ref = xn_get_rx_ref(rxq, *cons + frags);
 		ref_cons = *cons + frags;
 		frags++;
 	}
 	*list = m0;
 	*cons += frags;
 
 	return (err);
 }
 
 /**
  * \brief Count the number of fragments in an mbuf chain.
  *
  * Surprisingly, there isn't an M* macro for this.
  */
 static inline int
 xn_count_frags(struct mbuf *m)
 {
 	int nfrags;
 
 	for (nfrags = 0; m != NULL; m = m->m_next)
 		nfrags++;
 
 	return (nfrags);
 }
 
 /**
  * Given an mbuf chain, make sure we have enough room and then push
  * it onto the transmit ring.
  */
 static int
 xn_assemble_tx_request(struct netfront_txq *txq, struct mbuf *m_head)
 {
 	struct mbuf *m;
 	struct netfront_info *np = txq->info;
 	struct ifnet *ifp = np->xn_ifp;
 	u_int nfrags;
 	int otherend_id;
 
 	/**
 	 * Defragment the mbuf if necessary.
 	 */
 	nfrags = xn_count_frags(m_head);
 
 	/*
 	 * Check to see whether this request is longer than netback
 	 * can handle, and try to defrag it.
 	 */
 	/**
 	 * It is a bit lame, but the netback driver in Linux can't
 	 * deal with nfrags > MAX_TX_REQ_FRAGS, which is a quirk of
 	 * the Linux network stack.
 	 */
 	if (nfrags > np->maxfrags) {
 		m = m_defrag(m_head, M_NOWAIT);
 		if (!m) {
 			/*
 			 * Defrag failed, so free the mbuf and
 			 * therefore drop the packet.
 			 */
 			m_freem(m_head);
 			return (EMSGSIZE);
 		}
 		m_head = m;
 	}
 
 	/* Determine how many fragments now exist */
 	nfrags = xn_count_frags(m_head);
 
 	/*
 	 * Check to see whether the defragmented packet has too many
 	 * segments for the Linux netback driver.
 	 */
 	/**
 	 * The FreeBSD TCP stack, with TSO enabled, can produce a chain
 	 * of mbufs longer than Linux can handle.  Make sure we don't
 	 * pass a too-long chain over to the other side by dropping the
 	 * packet.  It doesn't look like there is currently a way to
 	 * tell the TCP stack to generate a shorter chain of packets.
 	 */
 	if (nfrags > MAX_TX_REQ_FRAGS) {
 #ifdef DEBUG
 		printf("%s: nfrags %d > MAX_TX_REQ_FRAGS %d, netback "
 		       "won't be able to handle it, dropping\n",
 		       __func__, nfrags, MAX_TX_REQ_FRAGS);
 #endif
 		m_freem(m_head);
 		return (EMSGSIZE);
 	}
 
 	/*
 	 * This check should be redundant.  We've already verified that we
 	 * have enough slots in the ring to handle a packet of maximum
 	 * size, and that our packet is less than the maximum size.  Keep
 	 * it in here as an assert for now just to make certain that
 	 * chain_cnt is accurate.
 	 */
 	KASSERT((txq->mbufs_cnt + nfrags) <= NET_TX_RING_SIZE,
 		("%s: chain_cnt (%d) + nfrags (%d) > NET_TX_RING_SIZE "
 		 "(%d)!", __func__, (int) txq->mbufs_cnt,
 		 (int) nfrags, (int) NET_TX_RING_SIZE));
 
 	/*
 	 * Start packing the mbufs in this chain into
 	 * the fragment pointers. Stop when we run out
 	 * of fragments or hit the end of the mbuf chain.
 	 */
 	m = m_head;
 	otherend_id = xenbus_get_otherend_id(np->xbdev);
 	for (m = m_head; m; m = m->m_next) {
 		netif_tx_request_t *tx;
 		uintptr_t id;
 		grant_ref_t ref;
 		u_long mfn; /* XXX Wrong type? */
 
 		tx = RING_GET_REQUEST(&txq->ring, txq->ring.req_prod_pvt);
 		id = get_id_from_freelist(txq->mbufs);
 		if (id == 0)
 			panic("%s: was allocated the freelist head!\n",
 			    __func__);
 		txq->mbufs_cnt++;
 		if (txq->mbufs_cnt > NET_TX_RING_SIZE)
 			panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n",
 			    __func__);
 		txq->mbufs[id] = m;
 		tx->id = id;
 		ref = gnttab_claim_grant_reference(&txq->gref_head);
 		KASSERT((short)ref >= 0, ("Negative ref"));
 		mfn = virt_to_mfn(mtod(m, vm_offset_t));
 		gnttab_grant_foreign_access_ref(ref, otherend_id,
 		    mfn, GNTMAP_readonly);
 		tx->gref = txq->grant_ref[id] = ref;
 		tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1);
 		tx->flags = 0;
 		if (m == m_head) {
 			/*
 			 * The first fragment has the entire packet
 			 * size, subsequent fragments have just the
 			 * fragment size. The backend works out the
 			 * true size of the first fragment by
 			 * subtracting the sizes of the other
 			 * fragments.
 			 */
 			tx->size = m->m_pkthdr.len;
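 			/*
 			 * Example: a 3000-byte packet held in three mbufs of
 			 * 1000, 1500 and 500 bytes is sent as requests of
 			 * size 3000, 1500 and 500; the backend recovers the
 			 * first fragment's length as 3000 - 1500 - 500 = 1000.
 			 */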
 
 			/*
 			 * The first fragment contains the checksum flags
 			 * and is optionally followed by extra data for
 			 * TSO etc.
 			 */
 			/**
 			 * CSUM_TSO requires checksum offloading.
 			 * Some versions of FreeBSD fail to
 			 * set CSUM_TCP in the CSUM_TSO case,
 			 * so we have to test for CSUM_TSO
 			 * explicitly.
 			 */
 			if (m->m_pkthdr.csum_flags
 			    & (CSUM_DELAY_DATA | CSUM_TSO)) {
 				tx->flags |= (NETTXF_csum_blank
 				    | NETTXF_data_validated);
 			}
 			if (m->m_pkthdr.csum_flags & CSUM_TSO) {
 				struct netif_extra_info *gso =
 					(struct netif_extra_info *)
 					RING_GET_REQUEST(&txq->ring,
 							 ++txq->ring.req_prod_pvt);
 
 				tx->flags |= NETTXF_extra_info;
 
 				gso->u.gso.size = m->m_pkthdr.tso_segsz;
 				gso->u.gso.type =
 					XEN_NETIF_GSO_TYPE_TCPV4;
 				gso->u.gso.pad = 0;
 				gso->u.gso.features = 0;
 
 				gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
 				gso->flags = 0;
 			}
 		} else {
 			tx->size = m->m_len;
 		}
 		if (m->m_next)
 			tx->flags |= NETTXF_more_data;
 
 		txq->ring.req_prod_pvt++;
 	}
 	BPF_MTAP(ifp, m_head);
 
 	xn_txeof(txq);
 
 	txq->stats.tx_bytes += m_head->m_pkthdr.len;
 	txq->stats.tx_packets++;
 
 	return (0);
 }
 
 /* equivalent of network_open() in Linux */
 static void
 xn_ifinit_locked(struct netfront_info *np)
 {
 	struct ifnet *ifp;
 	int i;
 	struct netfront_rxq *rxq;
 
 	XN_LOCK_ASSERT(np);
 
 	ifp = np->xn_ifp;
 
 	if (ifp->if_drv_flags & IFF_DRV_RUNNING || !netfront_carrier_ok(np))
 		return;
 
 	xn_stop(np);
 
 	for (i = 0; i < np->num_queues; i++) {
 		rxq = &np->rxq[i];
 		XN_RX_LOCK(rxq);
 		xn_alloc_rx_buffers(rxq);
 		rxq->ring.sring->rsp_event = rxq->ring.rsp_cons + 1;
 		if (RING_HAS_UNCONSUMED_RESPONSES(&rxq->ring))
 			xn_rxeof(rxq);
 		XN_RX_UNLOCK(rxq);
 	}
 
 	ifp->if_drv_flags |= IFF_DRV_RUNNING;
 	ifp->if_drv_flags &= ~IFF_DRV_OACTIVE;
 	if_link_state_change(ifp, LINK_STATE_UP);
 }
 
 static void
 xn_ifinit(void *xsc)
 {
 	struct netfront_info *sc = xsc;
 
 	XN_LOCK(sc);
 	xn_ifinit_locked(sc);
 	XN_UNLOCK(sc);
 }
 
 static int
 xn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
 {
 	struct netfront_info *sc = ifp->if_softc;
 	struct ifreq *ifr = (struct ifreq *) data;
 	device_t dev;
 #ifdef INET
 	struct ifaddr *ifa = (struct ifaddr *)data;
 #endif
-	int mask, error = 0;
+	int mask, error = 0, reinit;
 
 	dev = sc->xbdev;
 
 	switch(cmd) {
 	case SIOCSIFADDR:
 #ifdef INET
 		XN_LOCK(sc);
 		if (ifa->ifa_addr->sa_family == AF_INET) {
 			ifp->if_flags |= IFF_UP;
 			if (!(ifp->if_drv_flags & IFF_DRV_RUNNING))
 				xn_ifinit_locked(sc);
 			arp_ifinit(ifp, ifa);
 			XN_UNLOCK(sc);
 		} else {
 			XN_UNLOCK(sc);
 #endif
 			error = ether_ioctl(ifp, cmd, data);
 #ifdef INET
 		}
 #endif
 		break;
 	case SIOCSIFMTU:
 		ifp->if_mtu = ifr->ifr_mtu;
 		ifp->if_drv_flags &= ~IFF_DRV_RUNNING;
 		xn_ifinit(sc);
 		break;
 	case SIOCSIFFLAGS:
 		XN_LOCK(sc);
 		if (ifp->if_flags & IFF_UP) {
 			/*
 			 * If only the state of the PROMISC flag changed,
 			 * then just use the 'set promisc mode' command
 			 * instead of reinitializing the entire NIC. Doing
 			 * a full re-init means reloading the firmware and
 			 * waiting for it to start up, which may take a
 			 * second or two.
 			 */
 			xn_ifinit_locked(sc);
 		} else {
 			if (ifp->if_drv_flags & IFF_DRV_RUNNING) {
 				xn_stop(sc);
 			}
 		}
 		sc->xn_if_flags = ifp->if_flags;
 		XN_UNLOCK(sc);
 		break;
 	case SIOCSIFCAP:
 		mask = ifr->ifr_reqcap ^ ifp->if_capenable;
+		reinit = 0;
+
 		if (mask & IFCAP_TXCSUM) {
-			if (IFCAP_TXCSUM & ifp->if_capenable) {
-				ifp->if_capenable &= ~(IFCAP_TXCSUM|IFCAP_TSO4);
-				ifp->if_hwassist &= ~(CSUM_TCP | CSUM_UDP
-				    | CSUM_IP | CSUM_TSO);
-			} else {
-				ifp->if_capenable |= IFCAP_TXCSUM;
-				ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP
-				    | CSUM_IP);
-			}
+			ifp->if_capenable ^= IFCAP_TXCSUM;
+			ifp->if_hwassist ^= XN_CSUM_FEATURES;
 		}
-		if (mask & IFCAP_RXCSUM) {
-			ifp->if_capenable ^= IFCAP_RXCSUM;
-		}
 		if (mask & IFCAP_TSO4) {
-			if (IFCAP_TSO4 & ifp->if_capenable) {
-				ifp->if_capenable &= ~IFCAP_TSO4;
-				ifp->if_hwassist &= ~CSUM_TSO;
-			} else if (IFCAP_TXCSUM & ifp->if_capenable) {
-				ifp->if_capenable |= IFCAP_TSO4;
-				ifp->if_hwassist |= CSUM_TSO;
-			} else {
-				IPRINTK("Xen requires tx checksum offload"
-				    " be enabled to use TSO\n");
-				error = EINVAL;
-			}
+			ifp->if_capenable ^= IFCAP_TSO4;
+			ifp->if_hwassist ^= CSUM_TSO;
 		}
-		if (mask & IFCAP_LRO) {
-			ifp->if_capenable ^= IFCAP_LRO;
 
+		if (mask & (IFCAP_RXCSUM | IFCAP_LRO)) {
+			/* These Rx features require us to renegotiate. */
+			reinit = 1;
+
+			if (mask & IFCAP_RXCSUM)
+				ifp->if_capenable ^= IFCAP_RXCSUM;
+			if (mask & IFCAP_LRO)
+				ifp->if_capenable ^= IFCAP_LRO;
 		}
+
+		if (reinit == 0)
+			break;
+
 		/*
 		 * We must reset the interface so the backend picks up the
 		 * new features.
 		 */
+		device_printf(sc->xbdev,
+		    "performing interface reset due to feature change\n");
 		XN_LOCK(sc);
 		netfront_carrier_off(sc);
 		sc->xn_reset = true;
 		/*
 		 * NB: the pending packet queue is not flushed, since
 		 * the interface should still support the old options.
 		 */
 		XN_UNLOCK(sc);
 		/*
 		 * Delete the xenstore nodes that export features.
 		 *
 		 * NB: There's a xenbus state called
 		 * "XenbusStateReconfiguring", which is what we should set
 		 * here. Sadly none of the backends know how to handle it,
 		 * and simply disconnect from the frontend, so we will just
 		 * switch back to XenbusStateInitialising in order to force
 		 * a reconnection.
 		 */
 		xs_rm(XST_NIL, xenbus_get_node(dev), "feature-gso-tcpv4");
 		xs_rm(XST_NIL, xenbus_get_node(dev), "feature-no-csum-offload");
 		xenbus_set_state(dev, XenbusStateClosing);
+
+		/*
+		 * Wait for the frontend to reconnect before returning
+		 * from the ioctl. 30s should be more than enough for any
+		 * sane backend to reconnect.
+		 */
+		error = tsleep(sc, 0, "xn_rst", 30*hz);
 		break;
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		break;
 	case SIOCSIFMEDIA:
 	case SIOCGIFMEDIA:
 		error = ifmedia_ioctl(ifp, ifr, &sc->sc_media, cmd);
 		break;
 	default:
 		error = ether_ioctl(ifp, cmd, data);
 	}
 
 	return (error);
 }
 
 static void
 xn_stop(struct netfront_info *sc)
 {
 	struct ifnet *ifp;
 
 	XN_LOCK_ASSERT(sc);
 
 	ifp = sc->xn_ifp;
 
 	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 	if_link_state_change(ifp, LINK_STATE_DOWN);
 }
 
 static void
 xn_rebuild_rx_bufs(struct netfront_rxq *rxq)
 {
 	int requeue_idx, i;
 	grant_ref_t ref;
 	netif_rx_request_t *req;
 
 	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
 		struct mbuf *m;
 		u_long pfn;
 
 		if (rxq->mbufs[i] == NULL)
 			continue;
 
 		m = rxq->mbufs[requeue_idx] = xn_get_rx_mbuf(rxq, i);
 		ref = rxq->grant_ref[requeue_idx] = xn_get_rx_ref(rxq, i);
 
 		req = RING_GET_REQUEST(&rxq->ring, requeue_idx);
 		pfn = vtophys(mtod(m, vm_offset_t)) >> PAGE_SHIFT;
 
 		gnttab_grant_foreign_access_ref(ref,
 		    xenbus_get_otherend_id(rxq->info->xbdev),
 		    pfn, 0);
 
 		req->gref = ref;
 		req->id   = requeue_idx;
 
 		requeue_idx++;
 	}
 
 	rxq->ring.req_prod_pvt = requeue_idx;
 }
 
 /* START of Xenolinux helper functions adapted to FreeBSD */
 static int
 xn_connect(struct netfront_info *np)
 {
 	int i, error;
 	u_int feature_rx_copy;
 	struct netfront_rxq *rxq;
 	struct netfront_txq *txq;
 
 	error = xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev),
 	    "feature-rx-copy", NULL, "%u", &feature_rx_copy);
 	if (error != 0)
 		feature_rx_copy = 0;
 
 	/* We only support rx copy. */
 	if (!feature_rx_copy)
 		return (EPROTONOSUPPORT);
 
 	/* Recovery procedure: */
 	error = talk_to_backend(np->xbdev, np);
 	if (error != 0)
 		return (error);
 
 	/* Step 1: Reinitialise variables. */
 	xn_query_features(np);
 	xn_configure_features(np);
 
 	/* Step 2: Release TX buffer */
 	for (i = 0; i < np->num_queues; i++) {
 		txq = &np->txq[i];
 		xn_release_tx_bufs(txq);
 	}
 
 	/* Step 3: Rebuild the RX buffer freelist and the RX ring itself. */
 	for (i = 0; i < np->num_queues; i++) {
 		rxq = &np->rxq[i];
 		xn_rebuild_rx_bufs(rxq);
 	}
 
 	/* Step 4: All public and private state should now be sane.  Get
 	 * ready to start sending and receiving packets and give the driver
 	 * domain a kick because we've probably just requeued some
 	 * packets.
 	 */
 	netfront_carrier_on(np);
+	wakeup(np);
 
 	return (0);
 }
 
 static void
 xn_kick_rings(struct netfront_info *np)
 {
 	struct netfront_rxq *rxq;
 	struct netfront_txq *txq;
 	int i;
 
 	for (i = 0; i < np->num_queues; i++) {
 		txq = &np->txq[i];
 		rxq = &np->rxq[i];
 		xen_intr_signal(txq->xen_intr_handle);
 		XN_TX_LOCK(txq);
 		xn_txeof(txq);
 		XN_TX_UNLOCK(txq);
 		XN_RX_LOCK(rxq);
 		xn_alloc_rx_buffers(rxq);
 		XN_RX_UNLOCK(rxq);
 	}
 }
 
 static void
 xn_query_features(struct netfront_info *np)
 {
 	int val;
 
 	device_printf(np->xbdev, "backend features:");
 
 	if (xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev),
 		"feature-sg", NULL, "%d", &val) != 0)
 		val = 0;
 
 	np->maxfrags = 1;
 	if (val) {
 		np->maxfrags = MAX_TX_REQ_FRAGS;
 		printf(" feature-sg");
 	}
 
 	if (xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev),
 		"feature-gso-tcpv4", NULL, "%d", &val) != 0)
 		val = 0;
 
 	np->xn_ifp->if_capabilities &= ~(IFCAP_TSO4|IFCAP_LRO);
 	if (val) {
 		np->xn_ifp->if_capabilities |= IFCAP_TSO4|IFCAP_LRO;
 		printf(" feature-gso-tcp4");
 	}
 
 	/*
 	 * HW CSUM offload is assumed to be available unless
 	 * feature-no-csum-offload is set in xenstore.
 	 */
 	if (xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev),
 		"feature-no-csum-offload", NULL, "%d", &val) != 0)
 		val = 0;
 
 	np->xn_ifp->if_capabilities |= IFCAP_HWCSUM;
 	if (val) {
 		np->xn_ifp->if_capabilities &= ~(IFCAP_HWCSUM);
 		printf(" feature-no-csum-offload");
 	}
 
 	printf("\n");
 }
 
 static int
 xn_configure_features(struct netfront_info *np)
 {
 	int err, cap_enabled;
 #if (defined(INET) || defined(INET6))
 	int i;
 #endif
 	struct ifnet *ifp;
 
 	ifp = np->xn_ifp;
 	err = 0;
 
 	if ((ifp->if_capenable & ifp->if_capabilities) == ifp->if_capenable) {
 		/* Current options are available, no need to do anything. */
 		return (0);
 	}
 
 	/* Try to preserve as many options as possible. */
 	cap_enabled = ifp->if_capenable;
 	ifp->if_capenable = ifp->if_hwassist = 0;
 
 #if (defined(INET) || defined(INET6))
 	if ((cap_enabled & IFCAP_LRO) != 0)
 		for (i = 0; i < np->num_queues; i++)
 			tcp_lro_free(&np->rxq[i].lro);
 	if (xn_enable_lro &&
 	    (ifp->if_capabilities & cap_enabled & IFCAP_LRO) != 0) {
 	    	ifp->if_capenable |= IFCAP_LRO;
 		for (i = 0; i < np->num_queues; i++) {
 			err = tcp_lro_init(&np->rxq[i].lro);
 			if (err != 0) {
 				device_printf(np->xbdev,
 				    "LRO initialization failed\n");
 				ifp->if_capenable &= ~IFCAP_LRO;
 				break;
 			}
 			np->rxq[i].lro.ifp = ifp;
 		}
 	}
 	if ((ifp->if_capabilities & cap_enabled & IFCAP_TSO4) != 0) {
 		ifp->if_capenable |= IFCAP_TSO4;
 		ifp->if_hwassist |= CSUM_TSO;
 	}
 #endif
 	if ((ifp->if_capabilities & cap_enabled & IFCAP_TXCSUM) != 0) {
 		ifp->if_capenable |= IFCAP_TXCSUM;
-		ifp->if_hwassist |= CSUM_TCP|CSUM_UDP;
+		ifp->if_hwassist |= XN_CSUM_FEATURES;
 	}
 	if ((ifp->if_capabilities & cap_enabled & IFCAP_RXCSUM) != 0)
 		ifp->if_capenable |= IFCAP_RXCSUM;
 
 	return (err);
 }
 
 static int
 xn_txq_mq_start_locked(struct netfront_txq *txq, struct mbuf *m)
 {
 	struct netfront_info *np;
 	struct ifnet *ifp;
 	struct buf_ring *br;
 	int error, notify;
 
 	np = txq->info;
 	br = txq->br;
 	ifp = np->xn_ifp;
 	error = 0;
 
 	XN_TX_LOCK_ASSERT(txq);
 
 	if ((ifp->if_drv_flags & IFF_DRV_RUNNING) == 0 ||
 	    !netfront_carrier_ok(np)) {
 		if (m != NULL)
 			error = drbr_enqueue(ifp, br, m);
 		return (error);
 	}
 
 	if (m != NULL) {
 		error = drbr_enqueue(ifp, br, m);
 		if (error != 0)
 			return (error);
 	}
 
 	while ((m = drbr_peek(ifp, br)) != NULL) {
 		if (!xn_tx_slot_available(txq)) {
 			drbr_putback(ifp, br, m);
 			break;
 		}
 
 		error = xn_assemble_tx_request(txq, m);
 		/* xn_assemble_tx_request always consumes the mbuf*/
 		if (error != 0) {
 			drbr_advance(ifp, br);
 			break;
 		}
 
 		RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&txq->ring, notify);
 		if (notify)
 			xen_intr_signal(txq->xen_intr_handle);
 
 		drbr_advance(ifp, br);
 	}
 
 	if (RING_FULL(&txq->ring))
 		txq->full = true;
 
 	return (0);
 }
 
 static int
 xn_txq_mq_start(struct ifnet *ifp, struct mbuf *m)
 {
 	struct netfront_info *np;
 	struct netfront_txq *txq;
 	int i, npairs, error;
 
 	np = ifp->if_softc;
 	npairs = np->num_queues;
 
 	if (!netfront_carrier_ok(np))
 		return (ENOBUFS);
 
 	KASSERT(npairs != 0, ("called with 0 available queues"));
 
 	/* check if flowid is set */
 	if (M_HASHTYPE_GET(m) != M_HASHTYPE_NONE)
 		i = m->m_pkthdr.flowid % npairs;
 	else
 		i = curcpu % npairs;
 
 	txq = &np->txq[i];
 
 	if (XN_TX_TRYLOCK(txq) != 0) {
 		error = xn_txq_mq_start_locked(txq, m);
 		XN_TX_UNLOCK(txq);
 	} else {
 		error = drbr_enqueue(ifp, txq->br, m);
 		taskqueue_enqueue(txq->tq, &txq->defrtask);
 	}
 
 	return (error);
 }
 
 static void
 xn_qflush(struct ifnet *ifp)
 {
 	struct netfront_info *np;
 	struct netfront_txq *txq;
 	struct mbuf *m;
 	int i;
 
 	np = ifp->if_softc;
 
 	for (i = 0; i < np->num_queues; i++) {
 		txq = &np->txq[i];
 
 		XN_TX_LOCK(txq);
 		while ((m = buf_ring_dequeue_sc(txq->br)) != NULL)
 			m_freem(m);
 		XN_TX_UNLOCK(txq);
 	}
 
 	if_qflush(ifp);
 }
 
 /**
  * Create a network device.
  * @param dev  Newbus device representing this virtual NIC.
  */
 int
 create_netdev(device_t dev)
 {
 	struct netfront_info *np;
 	int err;
 	struct ifnet *ifp;
 
 	np = device_get_softc(dev);
 
 	np->xbdev         = dev;
 
 	mtx_init(&np->sc_lock, "xnsc", "netfront softc lock", MTX_DEF);
 
 	ifmedia_init(&np->sc_media, 0, xn_ifmedia_upd, xn_ifmedia_sts);
 	ifmedia_add(&np->sc_media, IFM_ETHER|IFM_MANUAL, 0, NULL);
 	ifmedia_set(&np->sc_media, IFM_ETHER|IFM_MANUAL);
 
 	err = xen_net_read_mac(dev, np->mac);
 	if (err != 0)
 		goto error;
 
 	/* Set up ifnet structure */
 	ifp = np->xn_ifp = if_alloc(IFT_ETHER);
     	ifp->if_softc = np;
     	if_initname(ifp, "xn",  device_get_unit(dev));
     	ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
     	ifp->if_ioctl = xn_ioctl;
 
 	ifp->if_transmit = xn_txq_mq_start;
 	ifp->if_qflush = xn_qflush;
 
     	ifp->if_init = xn_ifinit;
 
     	ifp->if_hwassist = XN_CSUM_FEATURES;
 	/* Enable all supported features at device creation. */
 	ifp->if_capenable = ifp->if_capabilities =
 	    IFCAP_HWCSUM|IFCAP_TSO4|IFCAP_LRO;
 	ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN);
 	ifp->if_hw_tsomaxsegcount = MAX_TX_REQ_FRAGS;
 	ifp->if_hw_tsomaxsegsize = PAGE_SIZE;
 
     	ether_ifattach(ifp, np->mac);
 	netfront_carrier_off(np);
 
 	return (0);
 
 error:
 	KASSERT(err != 0, ("Error path with no error code specified"));
 	return (err);
 }
 
 static int
 netfront_detach(device_t dev)
 {
 	struct netfront_info *info = device_get_softc(dev);
 
 	DPRINTK("%s\n", xenbus_get_node(dev));
 
 	netif_free(info);
 
 	return 0;
 }
 
 static void
 netif_free(struct netfront_info *np)
 {
 
 	XN_LOCK(np);
 	xn_stop(np);
 	XN_UNLOCK(np);
 	netif_disconnect_backend(np);
 	ether_ifdetach(np->xn_ifp);
 	free(np->rxq, M_DEVBUF);
 	free(np->txq, M_DEVBUF);
 	if_free(np->xn_ifp);
 	np->xn_ifp = NULL;
 	ifmedia_removeall(&np->sc_media);
 }
 
 static void
 netif_disconnect_backend(struct netfront_info *np)
 {
 	u_int i;
 
 	for (i = 0; i < np->num_queues; i++) {
 		XN_RX_LOCK(&np->rxq[i]);
 		XN_TX_LOCK(&np->txq[i]);
 	}
 	netfront_carrier_off(np);
 	for (i = 0; i < np->num_queues; i++) {
 		XN_RX_UNLOCK(&np->rxq[i]);
 		XN_TX_UNLOCK(&np->txq[i]);
 	}
 
 	for (i = 0; i < np->num_queues; i++) {
 		disconnect_rxq(&np->rxq[i]);
 		disconnect_txq(&np->txq[i]);
 	}
 }
 
 static int
 xn_ifmedia_upd(struct ifnet *ifp)
 {
 
 	return (0);
 }
 
 static void
 xn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr)
 {
 
 	ifmr->ifm_status = IFM_AVALID|IFM_ACTIVE;
 	ifmr->ifm_active = IFM_ETHER|IFM_MANUAL;
 }
 
 /* ** Driver registration ** */
 static device_method_t netfront_methods[] = {
 	/* Device interface */
 	DEVMETHOD(device_probe,         netfront_probe),
 	DEVMETHOD(device_attach,        netfront_attach),
 	DEVMETHOD(device_detach,        netfront_detach),
 	DEVMETHOD(device_shutdown,      bus_generic_shutdown),
 	DEVMETHOD(device_suspend,       netfront_suspend),
 	DEVMETHOD(device_resume,        netfront_resume),
 
 	/* Xenbus interface */
 	DEVMETHOD(xenbus_otherend_changed, netfront_backend_changed),
 
 	DEVMETHOD_END
 };
 
 static driver_t netfront_driver = {
 	"xn",
 	netfront_methods,
 	sizeof(struct netfront_info),
 };
 devclass_t netfront_devclass;
 
 DRIVER_MODULE(xe, xenbusb_front, netfront_driver, netfront_devclass, NULL,
     NULL);
Index: user/alc/PQ_LAUNDRY/sys/kern/init_sysent.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/kern/init_sysent.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/kern/init_sysent.c	(revision 303775)
@@ -1,599 +1,599 @@
 /*
  * System call switch table.
  *
  * DO NOT EDIT-- this file is automatically generated.
  * $FreeBSD$
- * created from FreeBSD: head/sys/kern/syscalls.master 303700 2016-08-03 06:35:58Z ed 
+ * created from FreeBSD: head/sys/kern/syscalls.master 303729 2016-08-03 18:48:56Z bdrewery 
  */
 
 #include "opt_compat.h"
 
 #include <sys/param.h>
 #include <sys/sysent.h>
 #include <sys/sysproto.h>
 
 #define AS(name) (sizeof(struct name) / sizeof(register_t))
 
 #ifdef COMPAT_43
 #define compat(n, name) n, (sy_call_t *)__CONCAT(o,name)
 #else
 #define compat(n, name) 0, (sy_call_t *)nosys
 #endif
 
 #ifdef COMPAT_FREEBSD4
 #define compat4(n, name) n, (sy_call_t *)__CONCAT(freebsd4_,name)
 #else
 #define compat4(n, name) 0, (sy_call_t *)nosys
 #endif
 
 #ifdef COMPAT_FREEBSD6
 #define compat6(n, name) n, (sy_call_t *)__CONCAT(freebsd6_,name)
 #else
 #define compat6(n, name) 0, (sy_call_t *)nosys
 #endif
 
 #ifdef COMPAT_FREEBSD7
 #define compat7(n, name) n, (sy_call_t *)__CONCAT(freebsd7_,name)
 #else
 #define compat7(n, name) 0, (sy_call_t *)nosys
 #endif
 
 #ifdef COMPAT_FREEBSD10
 #define compat10(n, name) n, (sy_call_t *)__CONCAT(freebsd10_,name)
 #else
 #define compat10(n, name) 0, (sy_call_t *)nosys
 #endif
 
 /* The casts are bogus but will do for now. */
 struct sysent sysent[] = {
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },		/* 0 = syscall */
 	{ AS(sys_exit_args), (sy_call_t *)sys_sys_exit, AUE_EXIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 1 = exit */
 	{ 0, (sy_call_t *)sys_fork, AUE_FORK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 2 = fork */
 	{ AS(read_args), (sy_call_t *)sys_read, AUE_READ, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 3 = read */
 	{ AS(write_args), (sy_call_t *)sys_write, AUE_WRITE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 4 = write */
 	{ AS(open_args), (sy_call_t *)sys_open, AUE_OPEN_RWTC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 5 = open */
 	{ AS(close_args), (sy_call_t *)sys_close, AUE_CLOSE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 6 = close */
 	{ AS(wait4_args), (sy_call_t *)sys_wait4, AUE_WAIT4, NULL, 0, 0, 0, SY_THR_STATIC },	/* 7 = wait4 */
 	{ compat(AS(ocreat_args),creat), AUE_CREAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 8 = old creat */
 	{ AS(link_args), (sy_call_t *)sys_link, AUE_LINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 9 = link */
 	{ AS(unlink_args), (sy_call_t *)sys_unlink, AUE_UNLINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 10 = unlink */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 11 = obsolete execv */
 	{ AS(chdir_args), (sy_call_t *)sys_chdir, AUE_CHDIR, NULL, 0, 0, 0, SY_THR_STATIC },	/* 12 = chdir */
 	{ AS(fchdir_args), (sy_call_t *)sys_fchdir, AUE_FCHDIR, NULL, 0, 0, 0, SY_THR_STATIC },	/* 13 = fchdir */
 	{ AS(mknod_args), (sy_call_t *)sys_mknod, AUE_MKNOD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 14 = mknod */
 	{ AS(chmod_args), (sy_call_t *)sys_chmod, AUE_CHMOD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 15 = chmod */
 	{ AS(chown_args), (sy_call_t *)sys_chown, AUE_CHOWN, NULL, 0, 0, 0, SY_THR_STATIC },	/* 16 = chown */
 	{ AS(obreak_args), (sy_call_t *)sys_obreak, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 17 = break */
 	{ compat4(AS(freebsd4_getfsstat_args),getfsstat), AUE_GETFSSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 18 = freebsd4 getfsstat */
 	{ compat(AS(olseek_args),lseek), AUE_LSEEK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 19 = old lseek */
 	{ 0, (sy_call_t *)sys_getpid, AUE_GETPID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 20 = getpid */
 	{ AS(mount_args), (sy_call_t *)sys_mount, AUE_MOUNT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 21 = mount */
 	{ AS(unmount_args), (sy_call_t *)sys_unmount, AUE_UMOUNT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 22 = unmount */
 	{ AS(setuid_args), (sy_call_t *)sys_setuid, AUE_SETUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 23 = setuid */
 	{ 0, (sy_call_t *)sys_getuid, AUE_GETUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 24 = getuid */
 	{ 0, (sy_call_t *)sys_geteuid, AUE_GETEUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 25 = geteuid */
 	{ AS(ptrace_args), (sy_call_t *)sys_ptrace, AUE_PTRACE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 26 = ptrace */
 	{ AS(recvmsg_args), (sy_call_t *)sys_recvmsg, AUE_RECVMSG, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 27 = recvmsg */
 	{ AS(sendmsg_args), (sy_call_t *)sys_sendmsg, AUE_SENDMSG, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 28 = sendmsg */
 	{ AS(recvfrom_args), (sy_call_t *)sys_recvfrom, AUE_RECVFROM, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 29 = recvfrom */
 	{ AS(accept_args), (sy_call_t *)sys_accept, AUE_ACCEPT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 30 = accept */
 	{ AS(getpeername_args), (sy_call_t *)sys_getpeername, AUE_GETPEERNAME, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 31 = getpeername */
 	{ AS(getsockname_args), (sy_call_t *)sys_getsockname, AUE_GETSOCKNAME, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 32 = getsockname */
 	{ AS(access_args), (sy_call_t *)sys_access, AUE_ACCESS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 33 = access */
 	{ AS(chflags_args), (sy_call_t *)sys_chflags, AUE_CHFLAGS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 34 = chflags */
 	{ AS(fchflags_args), (sy_call_t *)sys_fchflags, AUE_FCHFLAGS, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 35 = fchflags */
 	{ 0, (sy_call_t *)sys_sync, AUE_SYNC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 36 = sync */
 	{ AS(kill_args), (sy_call_t *)sys_kill, AUE_KILL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 37 = kill */
 	{ compat(AS(ostat_args),stat), AUE_STAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 38 = old stat */
 	{ 0, (sy_call_t *)sys_getppid, AUE_GETPPID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 39 = getppid */
 	{ compat(AS(olstat_args),lstat), AUE_LSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 40 = old lstat */
 	{ AS(dup_args), (sy_call_t *)sys_dup, AUE_DUP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 41 = dup */
 	{ compat10(0,pipe), AUE_PIPE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 42 = freebsd10 pipe */
 	{ 0, (sy_call_t *)sys_getegid, AUE_GETEGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 43 = getegid */
 	{ AS(profil_args), (sy_call_t *)sys_profil, AUE_PROFILE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 44 = profil */
 	{ AS(ktrace_args), (sy_call_t *)sys_ktrace, AUE_KTRACE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 45 = ktrace */
 	{ compat(AS(osigaction_args),sigaction), AUE_SIGACTION, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 46 = old sigaction */
 	{ 0, (sy_call_t *)sys_getgid, AUE_GETGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 47 = getgid */
 	{ compat(AS(osigprocmask_args),sigprocmask), AUE_SIGPROCMASK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 48 = old sigprocmask */
 	{ AS(getlogin_args), (sy_call_t *)sys_getlogin, AUE_GETLOGIN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 49 = getlogin */
 	{ AS(setlogin_args), (sy_call_t *)sys_setlogin, AUE_SETLOGIN, NULL, 0, 0, 0, SY_THR_STATIC },	/* 50 = setlogin */
 	{ AS(acct_args), (sy_call_t *)sys_acct, AUE_ACCT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 51 = acct */
 	{ compat(0,sigpending), AUE_SIGPENDING, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 52 = old sigpending */
 	{ AS(sigaltstack_args), (sy_call_t *)sys_sigaltstack, AUE_SIGALTSTACK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 53 = sigaltstack */
 	{ AS(ioctl_args), (sy_call_t *)sys_ioctl, AUE_IOCTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 54 = ioctl */
 	{ AS(reboot_args), (sy_call_t *)sys_reboot, AUE_REBOOT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 55 = reboot */
 	{ AS(revoke_args), (sy_call_t *)sys_revoke, AUE_REVOKE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 56 = revoke */
 	{ AS(symlink_args), (sy_call_t *)sys_symlink, AUE_SYMLINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 57 = symlink */
 	{ AS(readlink_args), (sy_call_t *)sys_readlink, AUE_READLINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 58 = readlink */
 	{ AS(execve_args), (sy_call_t *)sys_execve, AUE_EXECVE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 59 = execve */
 	{ AS(umask_args), (sy_call_t *)sys_umask, AUE_UMASK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 60 = umask */
 	{ AS(chroot_args), (sy_call_t *)sys_chroot, AUE_CHROOT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 61 = chroot */
 	{ compat(AS(ofstat_args),fstat), AUE_FSTAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 62 = old fstat */
 	{ compat(AS(getkerninfo_args),getkerninfo), AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 63 = old getkerninfo */
 	{ compat(0,getpagesize), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 64 = old getpagesize */
 	{ AS(msync_args), (sy_call_t *)sys_msync, AUE_MSYNC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 65 = msync */
 	{ 0, (sy_call_t *)sys_vfork, AUE_VFORK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 66 = vfork */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 67 = obsolete vread */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 68 = obsolete vwrite */
 	{ AS(sbrk_args), (sy_call_t *)sys_sbrk, AUE_SBRK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 69 = sbrk */
 	{ AS(sstk_args), (sy_call_t *)sys_sstk, AUE_SSTK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 70 = sstk */
 	{ compat(AS(ommap_args),mmap), AUE_MMAP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 71 = old mmap */
 	{ AS(ovadvise_args), (sy_call_t *)sys_ovadvise, AUE_O_VADVISE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 72 = vadvise */
 	{ AS(munmap_args), (sy_call_t *)sys_munmap, AUE_MUNMAP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 73 = munmap */
 	{ AS(mprotect_args), (sy_call_t *)sys_mprotect, AUE_MPROTECT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 74 = mprotect */
 	{ AS(madvise_args), (sy_call_t *)sys_madvise, AUE_MADVISE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 75 = madvise */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 76 = obsolete vhangup */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 77 = obsolete vlimit */
 	{ AS(mincore_args), (sy_call_t *)sys_mincore, AUE_MINCORE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 78 = mincore */
 	{ AS(getgroups_args), (sy_call_t *)sys_getgroups, AUE_GETGROUPS, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 79 = getgroups */
 	{ AS(setgroups_args), (sy_call_t *)sys_setgroups, AUE_SETGROUPS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 80 = setgroups */
 	{ 0, (sy_call_t *)sys_getpgrp, AUE_GETPGRP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 81 = getpgrp */
 	{ AS(setpgid_args), (sy_call_t *)sys_setpgid, AUE_SETPGRP, NULL, 0, 0, 0, SY_THR_STATIC },	/* 82 = setpgid */
 	{ AS(setitimer_args), (sy_call_t *)sys_setitimer, AUE_SETITIMER, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 83 = setitimer */
 	{ compat(0,wait), AUE_WAIT4, NULL, 0, 0, 0, SY_THR_STATIC },			/* 84 = old wait */
 	{ AS(swapon_args), (sy_call_t *)sys_swapon, AUE_SWAPON, NULL, 0, 0, 0, SY_THR_STATIC },	/* 85 = swapon */
 	{ AS(getitimer_args), (sy_call_t *)sys_getitimer, AUE_GETITIMER, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 86 = getitimer */
 	{ compat(AS(gethostname_args),gethostname), AUE_SYSCTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 87 = old gethostname */
 	{ compat(AS(sethostname_args),sethostname), AUE_SYSCTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 88 = old sethostname */
 	{ 0, (sy_call_t *)sys_getdtablesize, AUE_GETDTABLESIZE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 89 = getdtablesize */
 	{ AS(dup2_args), (sy_call_t *)sys_dup2, AUE_DUP2, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 90 = dup2 */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 91 = getdopt */
 	{ AS(fcntl_args), (sy_call_t *)sys_fcntl, AUE_FCNTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 92 = fcntl */
 	{ AS(select_args), (sy_call_t *)sys_select, AUE_SELECT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 93 = select */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 94 = setdopt */
 	{ AS(fsync_args), (sy_call_t *)sys_fsync, AUE_FSYNC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 95 = fsync */
 	{ AS(setpriority_args), (sy_call_t *)sys_setpriority, AUE_SETPRIORITY, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 96 = setpriority */
 	{ AS(socket_args), (sy_call_t *)sys_socket, AUE_SOCKET, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 97 = socket */
 	{ AS(connect_args), (sy_call_t *)sys_connect, AUE_CONNECT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 98 = connect */
 	{ compat(AS(accept_args),accept), AUE_ACCEPT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 99 = old accept */
 	{ AS(getpriority_args), (sy_call_t *)sys_getpriority, AUE_GETPRIORITY, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 100 = getpriority */
 	{ compat(AS(osend_args),send), AUE_SEND, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 101 = old send */
 	{ compat(AS(orecv_args),recv), AUE_RECV, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 102 = old recv */
 	{ compat(AS(osigreturn_args),sigreturn), AUE_SIGRETURN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 103 = old sigreturn */
 	{ AS(bind_args), (sy_call_t *)sys_bind, AUE_BIND, NULL, 0, 0, 0, SY_THR_STATIC },	/* 104 = bind */
 	{ AS(setsockopt_args), (sy_call_t *)sys_setsockopt, AUE_SETSOCKOPT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 105 = setsockopt */
 	{ AS(listen_args), (sy_call_t *)sys_listen, AUE_LISTEN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 106 = listen */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 107 = obsolete vtimes */
 	{ compat(AS(osigvec_args),sigvec), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 108 = old sigvec */
 	{ compat(AS(osigblock_args),sigblock), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 109 = old sigblock */
 	{ compat(AS(osigsetmask_args),sigsetmask), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 110 = old sigsetmask */
 	{ compat(AS(osigsuspend_args),sigsuspend), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 111 = old sigsuspend */
 	{ compat(AS(osigstack_args),sigstack), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 112 = old sigstack */
 	{ compat(AS(orecvmsg_args),recvmsg), AUE_RECVMSG, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 113 = old recvmsg */
 	{ compat(AS(osendmsg_args),sendmsg), AUE_SENDMSG, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 114 = old sendmsg */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 115 = obsolete vtrace */
 	{ AS(gettimeofday_args), (sy_call_t *)sys_gettimeofday, AUE_GETTIMEOFDAY, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 116 = gettimeofday */
 	{ AS(getrusage_args), (sy_call_t *)sys_getrusage, AUE_GETRUSAGE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 117 = getrusage */
 	{ AS(getsockopt_args), (sy_call_t *)sys_getsockopt, AUE_GETSOCKOPT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 118 = getsockopt */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 119 = resuba */
 	{ AS(readv_args), (sy_call_t *)sys_readv, AUE_READV, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 120 = readv */
 	{ AS(writev_args), (sy_call_t *)sys_writev, AUE_WRITEV, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 121 = writev */
 	{ AS(settimeofday_args), (sy_call_t *)sys_settimeofday, AUE_SETTIMEOFDAY, NULL, 0, 0, 0, SY_THR_STATIC },	/* 122 = settimeofday */
 	{ AS(fchown_args), (sy_call_t *)sys_fchown, AUE_FCHOWN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 123 = fchown */
 	{ AS(fchmod_args), (sy_call_t *)sys_fchmod, AUE_FCHMOD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 124 = fchmod */
 	{ compat(AS(recvfrom_args),recvfrom), AUE_RECVFROM, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 125 = old recvfrom */
 	{ AS(setreuid_args), (sy_call_t *)sys_setreuid, AUE_SETREUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 126 = setreuid */
 	{ AS(setregid_args), (sy_call_t *)sys_setregid, AUE_SETREGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 127 = setregid */
 	{ AS(rename_args), (sy_call_t *)sys_rename, AUE_RENAME, NULL, 0, 0, 0, SY_THR_STATIC },	/* 128 = rename */
 	{ compat(AS(otruncate_args),truncate), AUE_TRUNCATE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 129 = old truncate */
 	{ compat(AS(oftruncate_args),ftruncate), AUE_FTRUNCATE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 130 = old ftruncate */
 	{ AS(flock_args), (sy_call_t *)sys_flock, AUE_FLOCK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 131 = flock */
 	{ AS(mkfifo_args), (sy_call_t *)sys_mkfifo, AUE_MKFIFO, NULL, 0, 0, 0, SY_THR_STATIC },	/* 132 = mkfifo */
 	{ AS(sendto_args), (sy_call_t *)sys_sendto, AUE_SENDTO, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 133 = sendto */
 	{ AS(shutdown_args), (sy_call_t *)sys_shutdown, AUE_SHUTDOWN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 134 = shutdown */
 	{ AS(socketpair_args), (sy_call_t *)sys_socketpair, AUE_SOCKETPAIR, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 135 = socketpair */
 	{ AS(mkdir_args), (sy_call_t *)sys_mkdir, AUE_MKDIR, NULL, 0, 0, 0, SY_THR_STATIC },	/* 136 = mkdir */
 	{ AS(rmdir_args), (sy_call_t *)sys_rmdir, AUE_RMDIR, NULL, 0, 0, 0, SY_THR_STATIC },	/* 137 = rmdir */
 	{ AS(utimes_args), (sy_call_t *)sys_utimes, AUE_UTIMES, NULL, 0, 0, 0, SY_THR_STATIC },	/* 138 = utimes */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 139 = obsolete 4.2 sigreturn */
 	{ AS(adjtime_args), (sy_call_t *)sys_adjtime, AUE_ADJTIME, NULL, 0, 0, 0, SY_THR_STATIC },	/* 140 = adjtime */
 	{ compat(AS(ogetpeername_args),getpeername), AUE_GETPEERNAME, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 141 = old getpeername */
 	{ compat(0,gethostid), AUE_SYSCTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 142 = old gethostid */
 	{ compat(AS(osethostid_args),sethostid), AUE_SYSCTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 143 = old sethostid */
 	{ compat(AS(ogetrlimit_args),getrlimit), AUE_GETRLIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 144 = old getrlimit */
 	{ compat(AS(osetrlimit_args),setrlimit), AUE_SETRLIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 145 = old setrlimit */
 	{ compat(AS(okillpg_args),killpg), AUE_KILLPG, NULL, 0, 0, 0, SY_THR_STATIC },	/* 146 = old killpg */
 	{ 0, (sy_call_t *)sys_setsid, AUE_SETSID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 147 = setsid */
 	{ AS(quotactl_args), (sy_call_t *)sys_quotactl, AUE_QUOTACTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 148 = quotactl */
 	{ compat(0,quota), AUE_O_QUOTA, NULL, 0, 0, 0, SY_THR_STATIC },		/* 149 = old quota */
 	{ compat(AS(getsockname_args),getsockname), AUE_GETSOCKNAME, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 150 = old getsockname */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 151 = sem_lock */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 152 = sem_wakeup */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 153 = asyncdaemon */
 	{ AS(nlm_syscall_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 154 = nlm_syscall */
 	{ AS(nfssvc_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 155 = nfssvc */
 	{ compat(AS(ogetdirentries_args),getdirentries), AUE_GETDIRENTRIES, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 156 = old getdirentries */
 	{ compat4(AS(freebsd4_statfs_args),statfs), AUE_STATFS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 157 = freebsd4 statfs */
 	{ compat4(AS(freebsd4_fstatfs_args),fstatfs), AUE_FSTATFS, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 158 = freebsd4 fstatfs */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 159 = nosys */
 	{ AS(lgetfh_args), (sy_call_t *)sys_lgetfh, AUE_LGETFH, NULL, 0, 0, 0, SY_THR_STATIC },	/* 160 = lgetfh */
 	{ AS(getfh_args), (sy_call_t *)sys_getfh, AUE_NFS_GETFH, NULL, 0, 0, 0, SY_THR_STATIC },	/* 161 = getfh */
 	{ compat4(AS(freebsd4_getdomainname_args),getdomainname), AUE_SYSCTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 162 = freebsd4 getdomainname */
 	{ compat4(AS(freebsd4_setdomainname_args),setdomainname), AUE_SYSCTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 163 = freebsd4 setdomainname */
 	{ compat4(AS(freebsd4_uname_args),uname), AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 164 = freebsd4 uname */
 	{ AS(sysarch_args), (sy_call_t *)sysarch, AUE_SYSARCH, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 165 = sysarch */
 	{ AS(rtprio_args), (sy_call_t *)sys_rtprio, AUE_RTPRIO, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 166 = rtprio */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 167 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 168 = nosys */
 	{ AS(semsys_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 169 = semsys */
 	{ AS(msgsys_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 170 = msgsys */
 	{ AS(shmsys_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 171 = shmsys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 172 = nosys */
 	{ compat6(AS(freebsd6_pread_args),pread), AUE_PREAD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 173 = freebsd6 pread */
 	{ compat6(AS(freebsd6_pwrite_args),pwrite), AUE_PWRITE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 174 = freebsd6 pwrite */
 	{ AS(setfib_args), (sy_call_t *)sys_setfib, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 175 = setfib */
 	{ AS(ntp_adjtime_args), (sy_call_t *)sys_ntp_adjtime, AUE_NTP_ADJTIME, NULL, 0, 0, 0, SY_THR_STATIC },	/* 176 = ntp_adjtime */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 177 = sfork */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 178 = getdescriptor */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 179 = setdescriptor */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 180 = nosys */
 	{ AS(setgid_args), (sy_call_t *)sys_setgid, AUE_SETGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 181 = setgid */
 	{ AS(setegid_args), (sy_call_t *)sys_setegid, AUE_SETEGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 182 = setegid */
 	{ AS(seteuid_args), (sy_call_t *)sys_seteuid, AUE_SETEUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 183 = seteuid */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 184 = lfs_bmapv */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 185 = lfs_markv */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 186 = lfs_segclean */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 187 = lfs_segwait */
 	{ AS(stat_args), (sy_call_t *)sys_stat, AUE_STAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 188 = stat */
 	{ AS(fstat_args), (sy_call_t *)sys_fstat, AUE_FSTAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 189 = fstat */
 	{ AS(lstat_args), (sy_call_t *)sys_lstat, AUE_LSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 190 = lstat */
 	{ AS(pathconf_args), (sy_call_t *)sys_pathconf, AUE_PATHCONF, NULL, 0, 0, 0, SY_THR_STATIC },	/* 191 = pathconf */
 	{ AS(fpathconf_args), (sy_call_t *)sys_fpathconf, AUE_FPATHCONF, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 192 = fpathconf */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 193 = nosys */
 	{ AS(__getrlimit_args), (sy_call_t *)sys_getrlimit, AUE_GETRLIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 194 = getrlimit */
 	{ AS(__setrlimit_args), (sy_call_t *)sys_setrlimit, AUE_SETRLIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 195 = setrlimit */
 	{ AS(getdirentries_args), (sy_call_t *)sys_getdirentries, AUE_GETDIRENTRIES, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 196 = getdirentries */
 	{ compat6(AS(freebsd6_mmap_args),mmap), AUE_MMAP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 197 = freebsd6 mmap */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },		/* 198 = __syscall */
 	{ compat6(AS(freebsd6_lseek_args),lseek), AUE_LSEEK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 199 = freebsd6 lseek */
 	{ compat6(AS(freebsd6_truncate_args),truncate), AUE_TRUNCATE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 200 = freebsd6 truncate */
 	{ compat6(AS(freebsd6_ftruncate_args),ftruncate), AUE_FTRUNCATE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 201 = freebsd6 ftruncate */
 	{ AS(sysctl_args), (sy_call_t *)sys___sysctl, AUE_SYSCTL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 202 = __sysctl */
 	{ AS(mlock_args), (sy_call_t *)sys_mlock, AUE_MLOCK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 203 = mlock */
 	{ AS(munlock_args), (sy_call_t *)sys_munlock, AUE_MUNLOCK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 204 = munlock */
 	{ AS(undelete_args), (sy_call_t *)sys_undelete, AUE_UNDELETE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 205 = undelete */
 	{ AS(futimes_args), (sy_call_t *)sys_futimes, AUE_FUTIMES, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 206 = futimes */
 	{ AS(getpgid_args), (sy_call_t *)sys_getpgid, AUE_GETPGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 207 = getpgid */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 208 = newreboot */
 	{ AS(poll_args), (sy_call_t *)sys_poll, AUE_POLL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 209 = poll */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 210 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 211 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 212 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 213 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 214 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 215 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 216 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 217 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 218 = lkmnosys */
 	{ AS(nosys_args), (sy_call_t *)lkmnosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 219 = lkmnosys */
 	{ 0, (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },		/* 220 = freebsd7 __semctl */
 	{ AS(semget_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 221 = semget */
 	{ AS(semop_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 222 = semop */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 223 = semconfig */
 	{ 0, (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },		/* 224 = freebsd7 msgctl */
 	{ AS(msgget_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 225 = msgget */
 	{ AS(msgsnd_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 226 = msgsnd */
 	{ AS(msgrcv_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 227 = msgrcv */
 	{ AS(shmat_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 228 = shmat */
 	{ 0, (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },		/* 229 = freebsd7 shmctl */
 	{ AS(shmdt_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 230 = shmdt */
 	{ AS(shmget_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 231 = shmget */
 	{ AS(clock_gettime_args), (sy_call_t *)sys_clock_gettime, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 232 = clock_gettime */
 	{ AS(clock_settime_args), (sy_call_t *)sys_clock_settime, AUE_CLOCK_SETTIME, NULL, 0, 0, 0, SY_THR_STATIC },	/* 233 = clock_settime */
 	{ AS(clock_getres_args), (sy_call_t *)sys_clock_getres, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 234 = clock_getres */
 	{ AS(ktimer_create_args), (sy_call_t *)sys_ktimer_create, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 235 = ktimer_create */
 	{ AS(ktimer_delete_args), (sy_call_t *)sys_ktimer_delete, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 236 = ktimer_delete */
 	{ AS(ktimer_settime_args), (sy_call_t *)sys_ktimer_settime, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 237 = ktimer_settime */
 	{ AS(ktimer_gettime_args), (sy_call_t *)sys_ktimer_gettime, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 238 = ktimer_gettime */
 	{ AS(ktimer_getoverrun_args), (sy_call_t *)sys_ktimer_getoverrun, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 239 = ktimer_getoverrun */
 	{ AS(nanosleep_args), (sy_call_t *)sys_nanosleep, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 240 = nanosleep */
 	{ AS(ffclock_getcounter_args), (sy_call_t *)sys_ffclock_getcounter, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 241 = ffclock_getcounter */
 	{ AS(ffclock_setestimate_args), (sy_call_t *)sys_ffclock_setestimate, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 242 = ffclock_setestimate */
 	{ AS(ffclock_getestimate_args), (sy_call_t *)sys_ffclock_getestimate, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 243 = ffclock_getestimate */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 244 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 245 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 246 = nosys */
 	{ AS(clock_getcpuclockid2_args), (sy_call_t *)sys_clock_getcpuclockid2, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 247 = clock_getcpuclockid2 */
 	{ AS(ntp_gettime_args), (sy_call_t *)sys_ntp_gettime, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 248 = ntp_gettime */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 249 = nosys */
 	{ AS(minherit_args), (sy_call_t *)sys_minherit, AUE_MINHERIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 250 = minherit */
 	{ AS(rfork_args), (sy_call_t *)sys_rfork, AUE_RFORK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 251 = rfork */
 	{ AS(openbsd_poll_args), (sy_call_t *)sys_openbsd_poll, AUE_POLL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 252 = openbsd_poll */
 	{ 0, (sy_call_t *)sys_issetugid, AUE_ISSETUGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 253 = issetugid */
 	{ AS(lchown_args), (sy_call_t *)sys_lchown, AUE_LCHOWN, NULL, 0, 0, 0, SY_THR_STATIC },	/* 254 = lchown */
 	{ AS(aio_read_args), (sy_call_t *)sys_aio_read, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 255 = aio_read */
 	{ AS(aio_write_args), (sy_call_t *)sys_aio_write, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 256 = aio_write */
 	{ AS(lio_listio_args), (sy_call_t *)sys_lio_listio, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 257 = lio_listio */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 258 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 259 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 260 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 261 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 262 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 263 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 264 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 265 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 266 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 267 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 268 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 269 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 270 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 271 = nosys */
 	{ AS(getdents_args), (sy_call_t *)sys_getdents, AUE_O_GETDENTS, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 272 = getdents */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 273 = nosys */
 	{ AS(lchmod_args), (sy_call_t *)sys_lchmod, AUE_LCHMOD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 274 = lchmod */
 	{ AS(lchown_args), (sy_call_t *)sys_lchown, AUE_LCHOWN, NULL, 0, 0, 0, SY_THR_STATIC },	/* 275 = netbsd_lchown */
 	{ AS(lutimes_args), (sy_call_t *)sys_lutimes, AUE_LUTIMES, NULL, 0, 0, 0, SY_THR_STATIC },	/* 276 = lutimes */
 	{ AS(msync_args), (sy_call_t *)sys_msync, AUE_MSYNC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 277 = netbsd_msync */
 	{ AS(nstat_args), (sy_call_t *)sys_nstat, AUE_STAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 278 = nstat */
 	{ AS(nfstat_args), (sy_call_t *)sys_nfstat, AUE_FSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 279 = nfstat */
 	{ AS(nlstat_args), (sy_call_t *)sys_nlstat, AUE_LSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 280 = nlstat */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 281 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 282 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 283 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 284 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 285 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 286 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 287 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 288 = nosys */
 	{ AS(preadv_args), (sy_call_t *)sys_preadv, AUE_PREADV, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 289 = preadv */
 	{ AS(pwritev_args), (sy_call_t *)sys_pwritev, AUE_PWRITEV, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 290 = pwritev */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 291 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 292 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 293 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 294 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 295 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 296 = nosys */
 	{ compat4(AS(freebsd4_fhstatfs_args),fhstatfs), AUE_FHSTATFS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 297 = freebsd4 fhstatfs */
 	{ AS(fhopen_args), (sy_call_t *)sys_fhopen, AUE_FHOPEN, NULL, 0, 0, 0, SY_THR_STATIC },	/* 298 = fhopen */
 	{ AS(fhstat_args), (sy_call_t *)sys_fhstat, AUE_FHSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 299 = fhstat */
 	{ AS(modnext_args), (sy_call_t *)sys_modnext, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 300 = modnext */
 	{ AS(modstat_args), (sy_call_t *)sys_modstat, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 301 = modstat */
 	{ AS(modfnext_args), (sy_call_t *)sys_modfnext, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 302 = modfnext */
 	{ AS(modfind_args), (sy_call_t *)sys_modfind, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 303 = modfind */
 	{ AS(kldload_args), (sy_call_t *)sys_kldload, AUE_MODLOAD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 304 = kldload */
 	{ AS(kldunload_args), (sy_call_t *)sys_kldunload, AUE_MODUNLOAD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 305 = kldunload */
 	{ AS(kldfind_args), (sy_call_t *)sys_kldfind, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 306 = kldfind */
 	{ AS(kldnext_args), (sy_call_t *)sys_kldnext, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 307 = kldnext */
 	{ AS(kldstat_args), (sy_call_t *)sys_kldstat, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 308 = kldstat */
 	{ AS(kldfirstmod_args), (sy_call_t *)sys_kldfirstmod, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 309 = kldfirstmod */
 	{ AS(getsid_args), (sy_call_t *)sys_getsid, AUE_GETSID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 310 = getsid */
 	{ AS(setresuid_args), (sy_call_t *)sys_setresuid, AUE_SETRESUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 311 = setresuid */
 	{ AS(setresgid_args), (sy_call_t *)sys_setresgid, AUE_SETRESGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 312 = setresgid */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 313 = obsolete signanosleep */
 	{ AS(aio_return_args), (sy_call_t *)sys_aio_return, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 314 = aio_return */
 	{ AS(aio_suspend_args), (sy_call_t *)sys_aio_suspend, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 315 = aio_suspend */
 	{ AS(aio_cancel_args), (sy_call_t *)sys_aio_cancel, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 316 = aio_cancel */
 	{ AS(aio_error_args), (sy_call_t *)sys_aio_error, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 317 = aio_error */
 	{ compat6(AS(freebsd6_aio_read_args),aio_read), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 318 = freebsd6 aio_read */
 	{ compat6(AS(freebsd6_aio_write_args),aio_write), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 319 = freebsd6 aio_write */
 	{ compat6(AS(freebsd6_lio_listio_args),lio_listio), AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 320 = freebsd6 lio_listio */
 	{ 0, (sy_call_t *)sys_yield, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 321 = yield */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 322 = obsolete thr_sleep */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 323 = obsolete thr_wakeup */
 	{ AS(mlockall_args), (sy_call_t *)sys_mlockall, AUE_MLOCKALL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 324 = mlockall */
 	{ 0, (sy_call_t *)sys_munlockall, AUE_MUNLOCKALL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 325 = munlockall */
 	{ AS(__getcwd_args), (sy_call_t *)sys___getcwd, AUE_GETCWD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 326 = __getcwd */
 	{ AS(sched_setparam_args), (sy_call_t *)sys_sched_setparam, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 327 = sched_setparam */
 	{ AS(sched_getparam_args), (sy_call_t *)sys_sched_getparam, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 328 = sched_getparam */
 	{ AS(sched_setscheduler_args), (sy_call_t *)sys_sched_setscheduler, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 329 = sched_setscheduler */
 	{ AS(sched_getscheduler_args), (sy_call_t *)sys_sched_getscheduler, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 330 = sched_getscheduler */
 	{ 0, (sy_call_t *)sys_sched_yield, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 331 = sched_yield */
 	{ AS(sched_get_priority_max_args), (sy_call_t *)sys_sched_get_priority_max, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 332 = sched_get_priority_max */
 	{ AS(sched_get_priority_min_args), (sy_call_t *)sys_sched_get_priority_min, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 333 = sched_get_priority_min */
 	{ AS(sched_rr_get_interval_args), (sy_call_t *)sys_sched_rr_get_interval, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 334 = sched_rr_get_interval */
 	{ AS(utrace_args), (sy_call_t *)sys_utrace, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 335 = utrace */
 	{ compat4(AS(freebsd4_sendfile_args),sendfile), AUE_SENDFILE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 336 = freebsd4 sendfile */
 	{ AS(kldsym_args), (sy_call_t *)sys_kldsym, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 337 = kldsym */
 	{ AS(jail_args), (sy_call_t *)sys_jail, AUE_JAIL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 338 = jail */
 	{ AS(nnpfs_syscall_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 339 = nnpfs_syscall */
 	{ AS(sigprocmask_args), (sy_call_t *)sys_sigprocmask, AUE_SIGPROCMASK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 340 = sigprocmask */
 	{ AS(sigsuspend_args), (sy_call_t *)sys_sigsuspend, AUE_SIGSUSPEND, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 341 = sigsuspend */
 	{ compat4(AS(freebsd4_sigaction_args),sigaction), AUE_SIGACTION, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 342 = freebsd4 sigaction */
 	{ AS(sigpending_args), (sy_call_t *)sys_sigpending, AUE_SIGPENDING, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 343 = sigpending */
 	{ compat4(AS(freebsd4_sigreturn_args),sigreturn), AUE_SIGRETURN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 344 = freebsd4 sigreturn */
 	{ AS(sigtimedwait_args), (sy_call_t *)sys_sigtimedwait, AUE_SIGWAIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 345 = sigtimedwait */
 	{ AS(sigwaitinfo_args), (sy_call_t *)sys_sigwaitinfo, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 346 = sigwaitinfo */
 	{ AS(__acl_get_file_args), (sy_call_t *)sys___acl_get_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 347 = __acl_get_file */
 	{ AS(__acl_set_file_args), (sy_call_t *)sys___acl_set_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 348 = __acl_set_file */
 	{ AS(__acl_get_fd_args), (sy_call_t *)sys___acl_get_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 349 = __acl_get_fd */
 	{ AS(__acl_set_fd_args), (sy_call_t *)sys___acl_set_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 350 = __acl_set_fd */
 	{ AS(__acl_delete_file_args), (sy_call_t *)sys___acl_delete_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 351 = __acl_delete_file */
 	{ AS(__acl_delete_fd_args), (sy_call_t *)sys___acl_delete_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 352 = __acl_delete_fd */
 	{ AS(__acl_aclcheck_file_args), (sy_call_t *)sys___acl_aclcheck_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 353 = __acl_aclcheck_file */
 	{ AS(__acl_aclcheck_fd_args), (sy_call_t *)sys___acl_aclcheck_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 354 = __acl_aclcheck_fd */
 	{ AS(extattrctl_args), (sy_call_t *)sys_extattrctl, AUE_EXTATTRCTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 355 = extattrctl */
 	{ AS(extattr_set_file_args), (sy_call_t *)sys_extattr_set_file, AUE_EXTATTR_SET_FILE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 356 = extattr_set_file */
 	{ AS(extattr_get_file_args), (sy_call_t *)sys_extattr_get_file, AUE_EXTATTR_GET_FILE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 357 = extattr_get_file */
 	{ AS(extattr_delete_file_args), (sy_call_t *)sys_extattr_delete_file, AUE_EXTATTR_DELETE_FILE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 358 = extattr_delete_file */
 	{ AS(aio_waitcomplete_args), (sy_call_t *)sys_aio_waitcomplete, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 359 = aio_waitcomplete */
 	{ AS(getresuid_args), (sy_call_t *)sys_getresuid, AUE_GETRESUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 360 = getresuid */
 	{ AS(getresgid_args), (sy_call_t *)sys_getresgid, AUE_GETRESGID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 361 = getresgid */
 	{ 0, (sy_call_t *)sys_kqueue, AUE_KQUEUE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 362 = kqueue */
 	{ AS(kevent_args), (sy_call_t *)sys_kevent, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 363 = kevent */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 364 = __cap_get_proc */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 365 = __cap_set_proc */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 366 = __cap_get_fd */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 367 = __cap_get_file */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 368 = __cap_set_fd */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 369 = __cap_set_file */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 370 = nosys */
 	{ AS(extattr_set_fd_args), (sy_call_t *)sys_extattr_set_fd, AUE_EXTATTR_SET_FD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 371 = extattr_set_fd */
 	{ AS(extattr_get_fd_args), (sy_call_t *)sys_extattr_get_fd, AUE_EXTATTR_GET_FD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 372 = extattr_get_fd */
 	{ AS(extattr_delete_fd_args), (sy_call_t *)sys_extattr_delete_fd, AUE_EXTATTR_DELETE_FD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 373 = extattr_delete_fd */
 	{ AS(__setugid_args), (sy_call_t *)sys___setugid, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 374 = __setugid */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 375 = nfsclnt */
 	{ AS(eaccess_args), (sy_call_t *)sys_eaccess, AUE_EACCESS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 376 = eaccess */
 	{ AS(afs3_syscall_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 377 = afs3_syscall */
 	{ AS(nmount_args), (sy_call_t *)sys_nmount, AUE_NMOUNT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 378 = nmount */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 379 = kse_exit */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 380 = kse_wakeup */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 381 = kse_create */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 382 = kse_thr_interrupt */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 383 = kse_release */
 	{ AS(__mac_get_proc_args), (sy_call_t *)sys___mac_get_proc, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 384 = __mac_get_proc */
 	{ AS(__mac_set_proc_args), (sy_call_t *)sys___mac_set_proc, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 385 = __mac_set_proc */
 	{ AS(__mac_get_fd_args), (sy_call_t *)sys___mac_get_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 386 = __mac_get_fd */
 	{ AS(__mac_get_file_args), (sy_call_t *)sys___mac_get_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 387 = __mac_get_file */
 	{ AS(__mac_set_fd_args), (sy_call_t *)sys___mac_set_fd, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 388 = __mac_set_fd */
 	{ AS(__mac_set_file_args), (sy_call_t *)sys___mac_set_file, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 389 = __mac_set_file */
 	{ AS(kenv_args), (sy_call_t *)sys_kenv, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 390 = kenv */
 	{ AS(lchflags_args), (sy_call_t *)sys_lchflags, AUE_LCHFLAGS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 391 = lchflags */
 	{ AS(uuidgen_args), (sy_call_t *)sys_uuidgen, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 392 = uuidgen */
 	{ AS(sendfile_args), (sy_call_t *)sys_sendfile, AUE_SENDFILE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 393 = sendfile */
 	{ AS(mac_syscall_args), (sy_call_t *)sys_mac_syscall, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 394 = mac_syscall */
 	{ AS(getfsstat_args), (sy_call_t *)sys_getfsstat, AUE_GETFSSTAT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 395 = getfsstat */
 	{ AS(statfs_args), (sy_call_t *)sys_statfs, AUE_STATFS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 396 = statfs */
 	{ AS(fstatfs_args), (sy_call_t *)sys_fstatfs, AUE_FSTATFS, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 397 = fstatfs */
 	{ AS(fhstatfs_args), (sy_call_t *)sys_fhstatfs, AUE_FHSTATFS, NULL, 0, 0, 0, SY_THR_STATIC },	/* 398 = fhstatfs */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 399 = nosys */
 	{ AS(ksem_close_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 400 = ksem_close */
 	{ AS(ksem_post_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 401 = ksem_post */
 	{ AS(ksem_wait_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 402 = ksem_wait */
 	{ AS(ksem_trywait_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 403 = ksem_trywait */
 	{ AS(ksem_init_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 404 = ksem_init */
 	{ AS(ksem_open_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 405 = ksem_open */
 	{ AS(ksem_unlink_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 406 = ksem_unlink */
 	{ AS(ksem_getvalue_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 407 = ksem_getvalue */
 	{ AS(ksem_destroy_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 408 = ksem_destroy */
 	{ AS(__mac_get_pid_args), (sy_call_t *)sys___mac_get_pid, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 409 = __mac_get_pid */
 	{ AS(__mac_get_link_args), (sy_call_t *)sys___mac_get_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 410 = __mac_get_link */
 	{ AS(__mac_set_link_args), (sy_call_t *)sys___mac_set_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 411 = __mac_set_link */
 	{ AS(extattr_set_link_args), (sy_call_t *)sys_extattr_set_link, AUE_EXTATTR_SET_LINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 412 = extattr_set_link */
 	{ AS(extattr_get_link_args), (sy_call_t *)sys_extattr_get_link, AUE_EXTATTR_GET_LINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 413 = extattr_get_link */
 	{ AS(extattr_delete_link_args), (sy_call_t *)sys_extattr_delete_link, AUE_EXTATTR_DELETE_LINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 414 = extattr_delete_link */
 	{ AS(__mac_execve_args), (sy_call_t *)sys___mac_execve, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 415 = __mac_execve */
 	{ AS(sigaction_args), (sy_call_t *)sys_sigaction, AUE_SIGACTION, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 416 = sigaction */
 	{ AS(sigreturn_args), (sy_call_t *)sys_sigreturn, AUE_SIGRETURN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 417 = sigreturn */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 418 = __xstat */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 419 = __xfstat */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 420 = __xlstat */
 	{ AS(getcontext_args), (sy_call_t *)sys_getcontext, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 421 = getcontext */
 	{ AS(setcontext_args), (sy_call_t *)sys_setcontext, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 422 = setcontext */
 	{ AS(swapcontext_args), (sy_call_t *)sys_swapcontext, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 423 = swapcontext */
 	{ AS(swapoff_args), (sy_call_t *)sys_swapoff, AUE_SWAPOFF, NULL, 0, 0, 0, SY_THR_STATIC },	/* 424 = swapoff */
 	{ AS(__acl_get_link_args), (sy_call_t *)sys___acl_get_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 425 = __acl_get_link */
 	{ AS(__acl_set_link_args), (sy_call_t *)sys___acl_set_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 426 = __acl_set_link */
 	{ AS(__acl_delete_link_args), (sy_call_t *)sys___acl_delete_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 427 = __acl_delete_link */
 	{ AS(__acl_aclcheck_link_args), (sy_call_t *)sys___acl_aclcheck_link, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 428 = __acl_aclcheck_link */
 	{ AS(sigwait_args), (sy_call_t *)sys_sigwait, AUE_SIGWAIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 429 = sigwait */
 	{ AS(thr_create_args), (sy_call_t *)sys_thr_create, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 430 = thr_create */
 	{ AS(thr_exit_args), (sy_call_t *)sys_thr_exit, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 431 = thr_exit */
 	{ AS(thr_self_args), (sy_call_t *)sys_thr_self, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 432 = thr_self */
 	{ AS(thr_kill_args), (sy_call_t *)sys_thr_kill, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 433 = thr_kill */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 434 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 435 = nosys */
 	{ AS(jail_attach_args), (sy_call_t *)sys_jail_attach, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 436 = jail_attach */
 	{ AS(extattr_list_fd_args), (sy_call_t *)sys_extattr_list_fd, AUE_EXTATTR_LIST_FD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 437 = extattr_list_fd */
 	{ AS(extattr_list_file_args), (sy_call_t *)sys_extattr_list_file, AUE_EXTATTR_LIST_FILE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 438 = extattr_list_file */
 	{ AS(extattr_list_link_args), (sy_call_t *)sys_extattr_list_link, AUE_EXTATTR_LIST_LINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 439 = extattr_list_link */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 440 = kse_switchin */
 	{ AS(ksem_timedwait_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 441 = ksem_timedwait */
 	{ AS(thr_suspend_args), (sy_call_t *)sys_thr_suspend, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 442 = thr_suspend */
 	{ AS(thr_wake_args), (sy_call_t *)sys_thr_wake, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 443 = thr_wake */
 	{ AS(kldunloadf_args), (sy_call_t *)sys_kldunloadf, AUE_MODUNLOAD, NULL, 0, 0, 0, SY_THR_STATIC },	/* 444 = kldunloadf */
 	{ AS(audit_args), (sy_call_t *)sys_audit, AUE_AUDIT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 445 = audit */
 	{ AS(auditon_args), (sy_call_t *)sys_auditon, AUE_AUDITON, NULL, 0, 0, 0, SY_THR_STATIC },	/* 446 = auditon */
 	{ AS(getauid_args), (sy_call_t *)sys_getauid, AUE_GETAUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 447 = getauid */
 	{ AS(setauid_args), (sy_call_t *)sys_setauid, AUE_SETAUID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 448 = setauid */
 	{ AS(getaudit_args), (sy_call_t *)sys_getaudit, AUE_GETAUDIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 449 = getaudit */
 	{ AS(setaudit_args), (sy_call_t *)sys_setaudit, AUE_SETAUDIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 450 = setaudit */
 	{ AS(getaudit_addr_args), (sy_call_t *)sys_getaudit_addr, AUE_GETAUDIT_ADDR, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 451 = getaudit_addr */
 	{ AS(setaudit_addr_args), (sy_call_t *)sys_setaudit_addr, AUE_SETAUDIT_ADDR, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 452 = setaudit_addr */
 	{ AS(auditctl_args), (sy_call_t *)sys_auditctl, AUE_AUDITCTL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 453 = auditctl */
 	{ AS(_umtx_op_args), (sy_call_t *)sys__umtx_op, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 454 = _umtx_op */
 	{ AS(thr_new_args), (sy_call_t *)sys_thr_new, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 455 = thr_new */
 	{ AS(sigqueue_args), (sy_call_t *)sys_sigqueue, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 456 = sigqueue */
 	{ AS(kmq_open_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 457 = kmq_open */
 	{ AS(kmq_setattr_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 458 = kmq_setattr */
 	{ AS(kmq_timedreceive_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 459 = kmq_timedreceive */
 	{ AS(kmq_timedsend_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 460 = kmq_timedsend */
 	{ AS(kmq_notify_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 461 = kmq_notify */
 	{ AS(kmq_unlink_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 462 = kmq_unlink */
 	{ AS(abort2_args), (sy_call_t *)sys_abort2, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 463 = abort2 */
 	{ AS(thr_set_name_args), (sy_call_t *)sys_thr_set_name, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 464 = thr_set_name */
 	{ AS(aio_fsync_args), (sy_call_t *)sys_aio_fsync, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 465 = aio_fsync */
 	{ AS(rtprio_thread_args), (sy_call_t *)sys_rtprio_thread, AUE_RTPRIO, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 466 = rtprio_thread */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 467 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 468 = nosys */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 469 = __getpath_fromfd */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 470 = __getpath_fromaddr */
 	{ AS(sctp_peeloff_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 471 = sctp_peeloff */
 	{ AS(sctp_generic_sendmsg_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 472 = sctp_generic_sendmsg */
 	{ AS(sctp_generic_sendmsg_iov_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 473 = sctp_generic_sendmsg_iov */
 	{ AS(sctp_generic_recvmsg_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_ABSENT },	/* 474 = sctp_generic_recvmsg */
 	{ AS(pread_args), (sy_call_t *)sys_pread, AUE_PREAD, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 475 = pread */
 	{ AS(pwrite_args), (sy_call_t *)sys_pwrite, AUE_PWRITE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 476 = pwrite */
 	{ AS(mmap_args), (sy_call_t *)sys_mmap, AUE_MMAP, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 477 = mmap */
 	{ AS(lseek_args), (sy_call_t *)sys_lseek, AUE_LSEEK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 478 = lseek */
 	{ AS(truncate_args), (sy_call_t *)sys_truncate, AUE_TRUNCATE, NULL, 0, 0, 0, SY_THR_STATIC },	/* 479 = truncate */
 	{ AS(ftruncate_args), (sy_call_t *)sys_ftruncate, AUE_FTRUNCATE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 480 = ftruncate */
 	{ AS(thr_kill2_args), (sy_call_t *)sys_thr_kill2, AUE_KILL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 481 = thr_kill2 */
 	{ AS(shm_open_args), (sy_call_t *)sys_shm_open, AUE_SHMOPEN, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 482 = shm_open */
 	{ AS(shm_unlink_args), (sy_call_t *)sys_shm_unlink, AUE_SHMUNLINK, NULL, 0, 0, 0, SY_THR_STATIC },	/* 483 = shm_unlink */
 	{ AS(cpuset_args), (sy_call_t *)sys_cpuset, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 484 = cpuset */
 	{ AS(cpuset_setid_args), (sy_call_t *)sys_cpuset_setid, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 485 = cpuset_setid */
 	{ AS(cpuset_getid_args), (sy_call_t *)sys_cpuset_getid, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 486 = cpuset_getid */
 	{ AS(cpuset_getaffinity_args), (sy_call_t *)sys_cpuset_getaffinity, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 487 = cpuset_getaffinity */
 	{ AS(cpuset_setaffinity_args), (sy_call_t *)sys_cpuset_setaffinity, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 488 = cpuset_setaffinity */
 	{ AS(faccessat_args), (sy_call_t *)sys_faccessat, AUE_FACCESSAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 489 = faccessat */
 	{ AS(fchmodat_args), (sy_call_t *)sys_fchmodat, AUE_FCHMODAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 490 = fchmodat */
 	{ AS(fchownat_args), (sy_call_t *)sys_fchownat, AUE_FCHOWNAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 491 = fchownat */
 	{ AS(fexecve_args), (sy_call_t *)sys_fexecve, AUE_FEXECVE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 492 = fexecve */
 	{ AS(fstatat_args), (sy_call_t *)sys_fstatat, AUE_FSTATAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 493 = fstatat */
 	{ AS(futimesat_args), (sy_call_t *)sys_futimesat, AUE_FUTIMESAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 494 = futimesat */
 	{ AS(linkat_args), (sy_call_t *)sys_linkat, AUE_LINKAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 495 = linkat */
 	{ AS(mkdirat_args), (sy_call_t *)sys_mkdirat, AUE_MKDIRAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 496 = mkdirat */
 	{ AS(mkfifoat_args), (sy_call_t *)sys_mkfifoat, AUE_MKFIFOAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 497 = mkfifoat */
 	{ AS(mknodat_args), (sy_call_t *)sys_mknodat, AUE_MKNODAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 498 = mknodat */
 	{ AS(openat_args), (sy_call_t *)sys_openat, AUE_OPENAT_RWTC, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 499 = openat */
 	{ AS(readlinkat_args), (sy_call_t *)sys_readlinkat, AUE_READLINKAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 500 = readlinkat */
 	{ AS(renameat_args), (sy_call_t *)sys_renameat, AUE_RENAMEAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 501 = renameat */
 	{ AS(symlinkat_args), (sy_call_t *)sys_symlinkat, AUE_SYMLINKAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 502 = symlinkat */
 	{ AS(unlinkat_args), (sy_call_t *)sys_unlinkat, AUE_UNLINKAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 503 = unlinkat */
 	{ AS(posix_openpt_args), (sy_call_t *)sys_posix_openpt, AUE_POSIX_OPENPT, NULL, 0, 0, 0, SY_THR_STATIC },	/* 504 = posix_openpt */
 	{ AS(gssd_syscall_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 505 = gssd_syscall */
 	{ AS(jail_get_args), (sy_call_t *)sys_jail_get, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 506 = jail_get */
 	{ AS(jail_set_args), (sy_call_t *)sys_jail_set, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 507 = jail_set */
 	{ AS(jail_remove_args), (sy_call_t *)sys_jail_remove, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 508 = jail_remove */
 	{ AS(closefrom_args), (sy_call_t *)sys_closefrom, AUE_CLOSEFROM, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 509 = closefrom */
 	{ AS(__semctl_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 510 = __semctl */
 	{ AS(msgctl_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 511 = msgctl */
 	{ AS(shmctl_args), (sy_call_t *)lkmressys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },	/* 512 = shmctl */
 	{ AS(lpathconf_args), (sy_call_t *)sys_lpathconf, AUE_LPATHCONF, NULL, 0, 0, 0, SY_THR_STATIC },	/* 513 = lpathconf */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 514 = obsolete cap_new */
 	{ AS(__cap_rights_get_args), (sy_call_t *)sys___cap_rights_get, AUE_CAP_RIGHTS_GET, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 515 = __cap_rights_get */
 	{ 0, (sy_call_t *)sys_cap_enter, AUE_CAP_ENTER, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 516 = cap_enter */
 	{ AS(cap_getmode_args), (sy_call_t *)sys_cap_getmode, AUE_CAP_GETMODE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 517 = cap_getmode */
 	{ AS(pdfork_args), (sy_call_t *)sys_pdfork, AUE_PDFORK, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 518 = pdfork */
 	{ AS(pdkill_args), (sy_call_t *)sys_pdkill, AUE_PDKILL, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 519 = pdkill */
 	{ AS(pdgetpid_args), (sy_call_t *)sys_pdgetpid, AUE_PDGETPID, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 520 = pdgetpid */
 	{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },			/* 521 = pdwait4 */
 	{ AS(pselect_args), (sy_call_t *)sys_pselect, AUE_SELECT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 522 = pselect */
 	{ AS(getloginclass_args), (sy_call_t *)sys_getloginclass, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 523 = getloginclass */
 	{ AS(setloginclass_args), (sy_call_t *)sys_setloginclass, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 524 = setloginclass */
 	{ AS(rctl_get_racct_args), (sy_call_t *)sys_rctl_get_racct, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 525 = rctl_get_racct */
 	{ AS(rctl_get_rules_args), (sy_call_t *)sys_rctl_get_rules, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 526 = rctl_get_rules */
 	{ AS(rctl_get_limits_args), (sy_call_t *)sys_rctl_get_limits, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 527 = rctl_get_limits */
 	{ AS(rctl_add_rule_args), (sy_call_t *)sys_rctl_add_rule, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 528 = rctl_add_rule */
 	{ AS(rctl_remove_rule_args), (sy_call_t *)sys_rctl_remove_rule, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 529 = rctl_remove_rule */
 	{ AS(posix_fallocate_args), (sy_call_t *)sys_posix_fallocate, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 530 = posix_fallocate */
 	{ AS(posix_fadvise_args), (sy_call_t *)sys_posix_fadvise, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 531 = posix_fadvise */
 	{ AS(wait6_args), (sy_call_t *)sys_wait6, AUE_WAIT6, NULL, 0, 0, 0, SY_THR_STATIC },	/* 532 = wait6 */
 	{ AS(cap_rights_limit_args), (sy_call_t *)sys_cap_rights_limit, AUE_CAP_RIGHTS_LIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 533 = cap_rights_limit */
 	{ AS(cap_ioctls_limit_args), (sy_call_t *)sys_cap_ioctls_limit, AUE_CAP_IOCTLS_LIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 534 = cap_ioctls_limit */
 	{ AS(cap_ioctls_get_args), (sy_call_t *)sys_cap_ioctls_get, AUE_CAP_IOCTLS_GET, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 535 = cap_ioctls_get */
 	{ AS(cap_fcntls_limit_args), (sy_call_t *)sys_cap_fcntls_limit, AUE_CAP_FCNTLS_LIMIT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 536 = cap_fcntls_limit */
 	{ AS(cap_fcntls_get_args), (sy_call_t *)sys_cap_fcntls_get, AUE_CAP_FCNTLS_GET, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 537 = cap_fcntls_get */
 	{ AS(bindat_args), (sy_call_t *)sys_bindat, AUE_BINDAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 538 = bindat */
 	{ AS(connectat_args), (sy_call_t *)sys_connectat, AUE_CONNECTAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 539 = connectat */
 	{ AS(chflagsat_args), (sy_call_t *)sys_chflagsat, AUE_CHFLAGSAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 540 = chflagsat */
 	{ AS(accept4_args), (sy_call_t *)sys_accept4, AUE_ACCEPT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 541 = accept4 */
 	{ AS(pipe2_args), (sy_call_t *)sys_pipe2, AUE_PIPE, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 542 = pipe2 */
 	{ AS(aio_mlock_args), (sy_call_t *)sys_aio_mlock, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 543 = aio_mlock */
 	{ AS(procctl_args), (sy_call_t *)sys_procctl, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 544 = procctl */
 	{ AS(ppoll_args), (sy_call_t *)sys_ppoll, AUE_POLL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 545 = ppoll */
 	{ AS(futimens_args), (sy_call_t *)sys_futimens, AUE_FUTIMES, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 546 = futimens */
 	{ AS(utimensat_args), (sy_call_t *)sys_utimensat, AUE_FUTIMESAT, NULL, 0, 0, SYF_CAPENABLED, SY_THR_STATIC },	/* 547 = utimensat */
 	{ AS(numa_getaffinity_args), (sy_call_t *)sys_numa_getaffinity, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 548 = numa_getaffinity */
 	{ AS(numa_setaffinity_args), (sy_call_t *)sys_numa_setaffinity, AUE_NULL, NULL, 0, 0, 0, SY_THR_STATIC },	/* 549 = numa_setaffinity */
 };
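Most rows in the table above size their argument block with `AS()`. The argument structures themselves are emitted by makesyscalls.sh using the `PAD_`/`PADL_`/`PADR_` macros, so that every argument fills one register-sized slot. The following is a minimal, standalone C sketch of that layout under stated assumptions, not kernel code:

- `sketch_register_t` is a hypothetical stand-in for the kernel's `register_t`, assumed to be 64 bits here.
- `example_lseek_args` is a hypothetical structure modeled on lseek(2)'s fd/offset/whence arguments.
- `AS()` is assumed to divide the structure size by the register size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's register_t; assumed 64-bit here. */
typedef int64_t sketch_register_t;

/* Bytes needed to widen type t to a full register slot (0 if already wide enough). */
#define PAD_(t) (sizeof(sketch_register_t) <= sizeof(t) ? \
	0 : sizeof(sketch_register_t) - sizeof(t))

/*
 * On little-endian the narrow value occupies the slot's low-addressed bytes,
 * so the padding goes after it; on big-endian the padding goes before it.
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define PADL_(t) PAD_(t)
#define PADR_(t) 0
#else
#define PADL_(t) 0
#define PADR_(t) PAD_(t)
#endif

/* Assumed definition: argument block size counted in register-sized slots. */
#define AS(name) (sizeof(struct name) / sizeof(sketch_register_t))

/*
 * Hypothetical argument block shaped like lseek(2)'s arguments.
 * Zero-length arrays are a GNU C extension accepted by GCC and Clang.
 */
struct example_lseek_args {
	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
	char offset_l_[PADL_(int64_t)]; int64_t offset; char offset_r_[PADR_(int64_t)];
	char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)];
};

/* Three arguments, three register slots. */
static_assert(AS(example_lseek_args) == 3, "one register slot per argument");

/* Index of the register-sized slot that a byte offset falls into. */
static size_t slot_of(size_t byte_offset)
{
	return byte_offset / sizeof(sketch_register_t);
}
```

On a little-endian build the padding follows each `int` (`fd_r_`, `whence_r_`); on big-endian it precedes it. Either way, `AS(example_lseek_args)` is 3 and each argument lands in its own slot, which is what lets the argument block be copied as an array of registers.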
Index: user/alc/PQ_LAUNDRY/sys/kern/makesyscalls.sh
===================================================================
--- user/alc/PQ_LAUNDRY/sys/kern/makesyscalls.sh	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/kern/makesyscalls.sh	(revision 303775)
@@ -1,691 +1,691 @@
 #! /bin/sh -
 #	@(#)makesyscalls.sh	8.1 (Berkeley) 6/10/93
 # $FreeBSD$
 
 set -e
 
 # name of compat options:
 compat=COMPAT_43
 compat4=COMPAT_FREEBSD4
 compat6=COMPAT_FREEBSD6
 compat7=COMPAT_FREEBSD7
 compat10=COMPAT_FREEBSD10
 
 # output files:
 sysnames="syscalls.c"
 sysproto="../sys/sysproto.h"
 sysproto_h=_SYS_SYSPROTO_H_
 syshdr="../sys/syscall.h"
 sysmk="../sys/syscall.mk"
 syssw="init_sysent.c"
 syscallprefix="SYS_"
 switchname="sysent"
 namesname="syscallnames"
 systrace="systrace_args.c"
 
 # tmp files:
 sysaue="sysent.aue.$$"
 sysdcl="sysent.dcl.$$"
 syscompat="sysent.compat.$$"
 syscompatdcl="sysent.compatdcl.$$"
 syscompat4="sysent.compat4.$$"
 syscompat4dcl="sysent.compat4dcl.$$"
 syscompat6="sysent.compat6.$$"
 syscompat6dcl="sysent.compat6dcl.$$"
 syscompat7="sysent.compat7.$$"
 syscompat7dcl="sysent.compat7dcl.$$"
 syscompat10="sysent.compat10.$$"
 syscompat10dcl="sysent.compat10dcl.$$"
 sysent="sysent.switch.$$"
 sysinc="sysinc.switch.$$"
 sysarg="sysarg.switch.$$"
 sysprotoend="sysprotoend.$$"
 systracetmp="systrace.$$"
 systraceret="systraceret.$$"
 
 if [ -r capabilities.conf ]; then
 	capenabled=`cat capabilities.conf | grep -v "^#" | grep -v "^$"`
 	capenabled=`echo $capenabled | sed 's/ /,/g'`
 else
 	capenabled=""
 fi
 
 trap "rm $sysaue $sysdcl $syscompat $syscompatdcl $syscompat4 $syscompat4dcl $syscompat6 $syscompat6dcl $syscompat7 $syscompat7dcl $syscompat10 $syscompat10dcl $sysent $sysinc $sysarg $sysprotoend $systracetmp $systraceret" 0
 
 touch $sysaue $sysdcl $syscompat $syscompatdcl $syscompat4 $syscompat4dcl $syscompat6 $syscompat6dcl $syscompat7 $syscompat7dcl $syscompat10 $syscompat10dcl $sysent $sysinc $sysarg $sysprotoend $systracetmp $systraceret
 
 case $# in
     0)	echo "usage: $0 input-file <config-file>" 1>&2
 	exit 1
 	;;
 esac
 
 if [ -n "$2" ]; then
 	. $2
 fi
 
 sed -e '
 s/\$//g
 :join
 	/\\$/{a\
 
 	N
 	s/\\\n//
 	b join
 	}
 2,${
 	/^#/!s/\([{}()*,]\)/ \1 /g
 }
 ' < $1 | awk "
 	BEGIN {
 		sysaue = \"$sysaue\"
 		sysdcl = \"$sysdcl\"
 		sysproto = \"$sysproto\"
 		sysprotoend = \"$sysprotoend\"
 		sysproto_h = \"$sysproto_h\"
 		syscompat = \"$syscompat\"
 		syscompatdcl = \"$syscompatdcl\"
 		syscompat4 = \"$syscompat4\"
 		syscompat4dcl = \"$syscompat4dcl\"
 		syscompat6 = \"$syscompat6\"
 		syscompat6dcl = \"$syscompat6dcl\"
 		syscompat7 = \"$syscompat7\"
 		syscompat7dcl = \"$syscompat7dcl\"
 		syscompat10 = \"$syscompat10\"
 		syscompat10dcl = \"$syscompat10dcl\"
 		sysent = \"$sysent\"
 		syssw = \"$syssw\"
 		sysinc = \"$sysinc\"
 		sysarg = \"$sysarg\"
 		sysnames = \"$sysnames\"
 		syshdr = \"$syshdr\"
 		sysmk = \"$sysmk\"
 		systrace = \"$systrace\"
 		systracetmp = \"$systracetmp\"
 		systraceret = \"$systraceret\"
 		compat = \"$compat\"
 		compat4 = \"$compat4\"
 		compat6 = \"$compat6\"
 		compat7 = \"$compat7\"
 		compat10 = \"$compat10\"
 		syscallprefix = \"$syscallprefix\"
 		switchname = \"$switchname\"
 		namesname = \"$namesname\"
 		infile = \"$1\"
 		capenabled_string = \"$capenabled\"
 		"'
 
 		split(capenabled_string, capenabled, ",");
 
 		printf "/*\n * System call switch table.\n *\n" > syssw
 		printf " * DO NOT EDIT-- this file is automatically generated.\n" > syssw
 		printf " * $%s$\n", "FreeBSD" > syssw
 
 		printf "/*\n * System call prototypes.\n *\n" > sysarg
 		printf " * DO NOT EDIT-- this file is automatically generated.\n" > sysarg
 		printf " * $%s$\n", "FreeBSD" > sysarg
 
 		printf "\n#ifdef %s\n\n", compat > syscompat
 		printf "\n#ifdef %s\n\n", compat4 > syscompat4
 		printf "\n#ifdef %s\n\n", compat6 > syscompat6
 		printf "\n#ifdef %s\n\n", compat7 > syscompat7
 		printf "\n#ifdef %s\n\n", compat10 > syscompat10
 
 		printf "/*\n * System call names.\n *\n" > sysnames
 		printf " * DO NOT EDIT-- this file is automatically generated.\n" > sysnames
 		printf " * $%s$\n", "FreeBSD" > sysnames
 
 		printf "/*\n * System call numbers.\n *\n" > syshdr
 		printf " * DO NOT EDIT-- this file is automatically generated.\n" > syshdr
 		printf " * $%s$\n", "FreeBSD" > syshdr
 		printf "# FreeBSD system call object files.\n" > sysmk
 		printf "# DO NOT EDIT-- this file is automatically generated.\n" > sysmk
 		printf "# $%s$\n", "FreeBSD" > sysmk
 
 		printf "/*\n * System call argument to DTrace register array conversion.\n *\n" > systrace
 		printf " * DO NOT EDIT-- this file is automatically generated.\n" > systrace
 		printf " * $%s$\n", "FreeBSD" > systrace
 	}
 	NR == 1 {
 		gsub("[$]FreeBSD: ", "", $0)
 		gsub(" [$]", "", $0)
 
 		printf " * created from%s\n */\n\n", $0 > syssw
 
 		printf "\n/* The casts are bogus but will do for now. */\n" > sysent
 		printf "struct sysent %s[] = {\n",switchname > sysent
 
 		printf " * created from%s\n */\n\n", $0 > sysarg
 		printf "#ifndef %s\n", sysproto_h > sysarg
 		printf "#define\t%s\n\n", sysproto_h > sysarg
 		printf "#include <sys/signal.h>\n" > sysarg
 		printf "#include <sys/acl.h>\n" > sysarg
 		printf "#include <sys/cpuset.h>\n" > sysarg
 		printf "#include <sys/_ffcounter.h>\n" > sysarg
 		printf "#include <sys/_semaphore.h>\n" > sysarg
 		printf "#include <sys/ucontext.h>\n" > sysarg
 		printf "#include <sys/wait.h>\n\n" > sysarg
 		printf "#include <bsm/audit_kevents.h>\n\n" > sysarg
 		printf "struct proc;\n\n" > sysarg
 		printf "struct thread;\n\n" > sysarg
 		printf "#define\tPAD_(t)\t(sizeof(register_t) <= sizeof(t) ? \\\n" > sysarg
 		printf "\t\t0 : sizeof(register_t) - sizeof(t))\n\n" > sysarg
 		printf "#if BYTE_ORDER == LITTLE_ENDIAN\n"> sysarg
 		printf "#define\tPADL_(t)\t0\n" > sysarg
 		printf "#define\tPADR_(t)\tPAD_(t)\n" > sysarg
 		printf "#else\n" > sysarg
 		printf "#define\tPADL_(t)\tPAD_(t)\n" > sysarg
 		printf "#define\tPADR_(t)\t0\n" > sysarg
 		printf "#endif\n\n" > sysarg
 
 		printf " * created from%s\n */\n\n", $0 > sysnames
 		printf "const char *%s[] = {\n", namesname > sysnames
 
 		printf " * created from%s\n */\n\n", $0 > syshdr
 
 		printf "# created from%s\nMIASM = ", $0 > sysmk
 
 		printf " * This file is part of the DTrace syscall provider.\n */\n\n" > systrace
 		printf "static void\nsystrace_args(int sysnum, void *params, uint64_t *uarg, int *n_args)\n{\n" > systrace
 		printf "\tint64_t *iarg  = (int64_t *) uarg;\n" > systrace
 		printf "\tswitch (sysnum) {\n" > systrace
 
 		printf "static void\nsystrace_entry_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)\n{\n\tconst char *p = NULL;\n" > systracetmp
 		printf "\tswitch (sysnum) {\n" > systracetmp
 
 		printf "static void\nsystrace_return_setargdesc(int sysnum, int ndx, char *desc, size_t descsz)\n{\n\tconst char *p = NULL;\n" > systraceret
 		printf "\tswitch (sysnum) {\n" > systraceret
 
 		next
 	}
 	NF == 0 || $1 ~ /^;/ {
 		next
 	}
 	$1 ~ /^#[ 	]*include/ {
 		print > sysinc
 		next
 	}
 	$1 ~ /^#[ 	]*if/ {
 		print > sysent
 		print > sysdcl
 		print > sysarg
 		print > syscompat
 		print > syscompat4
 		print > syscompat6
 		print > syscompat7
 		print > syscompat10
 		print > sysnames
 		print > systrace
 		print > systracetmp
 		print > systraceret
 		savesyscall = syscall
 		next
 	}
 	$1 ~ /^#[ 	]*else/ {
 		print > sysent
 		print > sysdcl
 		print > sysarg
 		print > syscompat
 		print > syscompat4
 		print > syscompat6
 		print > syscompat7
 		print > syscompat10
 		print > sysnames
 		print > systrace
 		print > systracetmp
 		print > systraceret
 		syscall = savesyscall
 		next
 	}
 	$1 ~ /^#/ {
 		print > sysent
 		print > sysdcl
 		print > sysarg
 		print > syscompat
 		print > syscompat4
 		print > syscompat6
 		print > syscompat7
 		print > syscompat10
 		print > sysnames
 		print > systrace
 		print > systracetmp
 		print > systraceret
 		next
 	}
 	syscall != $1 {
 		printf "%s: line %d: syscall number out of sync at %d\n",
 		    infile, NR, syscall
 		printf "line is:\n"
 		print
 		exit 1
 	}
 	# Returns true if the type "name" is the first flag in the type field
 	function type(name, flags, n) {
 		n = split($3, flags, /\|/)
 		return (n > 0 && flags[1] == name)
 	}
 	# Returns true if the flag "name" is set in the type field
 	function flag(name, flags, i, n) {
 		n = split($3, flags, /\|/)
 		for (i = 1; i <= n; i++)
 			if (flags[i] == name)
 				return 1
 		return 0
 	}
 	function align_sysent_comment(column) {
 		printf("\t") > sysent
 		column = column + 8 - column % 8
 		while (column < 56) {
 			printf("\t") > sysent
 			column = column + 8
 		}
 	}
 	function parserr(was, wanted) {
 		printf "%s: line %d: unexpected %s (expected %s)\n",
 		    infile, NR, was, wanted
 		exit 1
 	}
 	function parseline() {
 		f=4			# toss number, type, audit event
 		argc= 0;
 		argssize = "0"
 		thr_flag = "SY_THR_STATIC"
 		if (flag("NOTSTATIC")) {
 			thr_flag = "SY_THR_ABSENT"
 		}
 		if ($NF != "}") {
 			funcalias=$(NF-2)
 			argalias=$(NF-1)
 			rettype=$NF
 			end=NF-3
 		} else {
 			funcalias=""
 			argalias=""
 			rettype="int"
 			end=NF
 		}
 		if (flag("NODEF")) {
 			auditev="AUE_NULL"
 			funcname=$4
 			argssize = "AS(" $6 ")"
 			return
 		}
 		if ($f != "{")
 			parserr($f, "{")
 		f++
 		if ($end != "}")
 			parserr($end, "}")
 		end--
 		if ($end != ";")
 			parserr($end, ";")
 		end--
 		if ($end != ")")
 			parserr($end, ")")
 		end--
 
 		syscallret=$f
 		f++
 
 		funcname=$f
 
 		#
 		# We now know the func name, so define a flags field for it.
 		# Do this before any other processing as we may return early
 		# from it.
 		#
 		for (cap in capenabled) {
 			if (funcname == capenabled[cap]) {
 				flags = "SYF_CAPENABLED";
 				break;
 			}
 		}
 
 		if (funcalias == "")
 			funcalias = funcname
 		if (argalias == "") {
 			argalias = funcname "_args"
 			if (flag("COMPAT"))
 				argalias = "o" argalias
 			if (flag("COMPAT4"))
 				argalias = "freebsd4_" argalias
 			if (flag("COMPAT6"))
 				argalias = "freebsd6_" argalias
 			if (flag("COMPAT7"))
 				argalias = "freebsd7_" argalias
 			if (flag("COMPAT10"))
 				argalias = "freebsd10_" argalias
 		}
 		f++
 
 		if ($f != "(")
 			parserr($f, "(")
 		f++
 
 		if (f == end) {
 			if ($f != "void")
 				parserr($f, "argument definition")
 			return
 		}
 
 		while (f <= end) {
 			argc++
 			argtype[argc]=""
 			oldf=""
 			while (f < end && $(f+1) != ",") {
 				if (argtype[argc] != "" && oldf != "*")
 					argtype[argc] = argtype[argc]" ";
 				argtype[argc] = argtype[argc]$f;
 				oldf = $f;
 				f++
 			}
 			if (argtype[argc] == "")
 				parserr($f, "argument definition")
 			argname[argc]=$f;
 			f += 2;			# skip name, and any comma
 		}
 		if (argc != 0)
 			argssize = "AS(" argalias ")"
 	}
 	{	comment = $4
 		if (NF < 7)
 			for (i = 5; i <= NF; i++)
 				comment = comment " " $i
 	}
 
 	#
 	# The AUE_ audit event identifier.
 	#
 	{
 		auditev = $2;
 	}
 
 	#
 	# The flags, if any.
 	#
 	{
 		flags = "0";
 	}
 
 	type("STD") || type("NODEF") || type("NOARGS") || type("NOPROTO") \
 	    || type("NOSTD") {
 		parseline()
 		printf("\t/* %s */\n\tcase %d: {\n", funcname, syscall) > systrace
 		printf("\t/* %s */\n\tcase %d:\n", funcname, syscall) > systracetmp
 		printf("\t/* %s */\n\tcase %d:\n", funcname, syscall) > systraceret
 		if (argc > 0) {
 			printf("\t\tswitch(ndx) {\n") > systracetmp
 			printf("\t\tstruct %s *p = params;\n", argalias) > systrace
 			for (i = 1; i <= argc; i++) {
 				arg = argtype[i]
 				sub("__restrict$", "", arg)
 				printf("\t\tcase %d:\n\t\t\tp = \"%s\";\n\t\t\tbreak;\n", i - 1, arg) > systracetmp
 				if (index(arg, "*") > 0 || arg == "caddr_t")
 					printf("\t\tuarg[%d] = (intptr_t) p->%s; /* %s */\n", \
 					     i - 1, \
 					     argname[i], arg) > systrace
 				else if (arg == "union l_semun")
 					printf("\t\tuarg[%d] = p->%s.buf; /* %s */\n", \
 					     i - 1, \
 					     argname[i], arg) > systrace
 				else if (substr(arg, 1, 1) == "u" || arg == "size_t")
 					printf("\t\tuarg[%d] = p->%s; /* %s */\n", \
 					     i - 1, \
 					     argname[i], arg) > systrace
 				else
 					printf("\t\tiarg[%d] = p->%s; /* %s */\n", \
 					     i - 1, \
 					     argname[i], arg) > systrace
 			}
 			printf("\t\tdefault:\n\t\t\tbreak;\n\t\t};\n") > systracetmp
 
 			printf("\t\tif (ndx == 0 || ndx == 1)\n") > systraceret
 			printf("\t\t\tp = \"%s\";\n", syscallret) > systraceret
 			printf("\t\tbreak;\n") > systraceret
 		}
 		printf("\t\t*n_args = %d;\n\t\tbreak;\n\t}\n", argc) > systrace
 		printf("\t\tbreak;\n") > systracetmp
 		if (argc != 0 && !flag("NOARGS") && !flag("NOPROTO") && \
 		    !flag("NODEF")) {
 			printf("struct %s {\n", argalias) > sysarg
 			for (i = 1; i <= argc; i++)
 				printf("\tchar %s_l_[PADL_(%s)]; " \
 				    "%s %s; char %s_r_[PADR_(%s)];\n",
 				    argname[i], argtype[i],
 				    argtype[i], argname[i],
 				    argname[i], argtype[i]) > sysarg
 			printf("};\n") > sysarg
 		}
 		else if (!flag("NOARGS") && !flag("NOPROTO") && !flag("NODEF"))
 			printf("struct %s {\n\tregister_t dummy;\n};\n",
 			    argalias) > sysarg
 		if (!flag("NOPROTO") && !flag("NODEF")) {
 			if (funcname == "nosys" || funcname == "lkmnosys" ||
 			    funcname == "sysarch" || funcname ~ /^freebsd/ || 
 			    funcname ~ /^linux/ || funcname ~ /^svr4/ || 
 			    funcname ~ /^ibcs2/ || funcname ~ /^xenix/ ||
 			    funcname ~ /^cloudabi/) {
 				printf("%s\t%s(struct thread *, struct %s *)",
 				    rettype, funcname, argalias) > sysdcl
 			} else {
 				printf("%s\tsys_%s(struct thread *, struct %s *)",
 				    rettype, funcname, argalias) > sysdcl
 			} 
 			printf(";\n") > sysdcl
 			printf("#define\t%sAUE_%s\t%s\n", syscallprefix,
 			    funcalias, auditev) > sysaue
 		}
 		printf("\t{ %s, (sy_call_t *)", argssize) > sysent
 		column = 8 + 2 + length(argssize) + 15
 		if (flag("NOSTD")) {
 			printf("lkmressys, AUE_NULL, NULL, 0, 0, %s, SY_THR_ABSENT },", flags) > sysent
 			column = column + length("lkmressys") + length("AUE_NULL") + 3
 		} else {
 			if (funcname == "nosys" || funcname == "sysarch" || 
 			    funcname == "lkmnosys" || funcname ~ /^freebsd/ ||
 			    funcname ~ /^linux/ || funcname ~ /^svr4/ ||
 			    funcname ~ /^ibcs2/ || funcname ~ /^xenix/ ||
 			    funcname ~ /^cloudabi/) {
 				printf("%s, %s, NULL, 0, 0, %s, %s },", funcname, auditev, flags, thr_flag) > sysent
 				column = column + length(funcname) + length(auditev) + length(flags) + 3 
 			} else {
 				printf("sys_%s, %s, NULL, 0, 0, %s, %s },", funcname, auditev, flags, thr_flag) > sysent
 				column = column + length(funcname) + length(auditev) + length(flags) + 3 + 4
 			} 
 		} 
 		align_sysent_comment(column)
 		printf("/* %d = %s */\n", syscall, funcalias) > sysent
 		printf("\t\"%s\",\t\t\t/* %d = %s */\n",
 		    funcalias, syscall, funcalias) > sysnames
 		if (!flag("NODEF")) {
 			printf("#define\t%s%s\t%d\n", syscallprefix,
 		    	    funcalias, syscall) > syshdr
 			printf(" \\\n\t%s.o", funcalias) > sysmk
 		}
 		syscall++
 		next
 	}
 	type("COMPAT") || type("COMPAT4") || type("COMPAT6") || \
 	    type("COMPAT7") || type("COMPAT10") {
 		if (flag("COMPAT")) {
 			ncompat++
 			out = syscompat
 			outdcl = syscompatdcl
 			wrap = "compat"
 			prefix = "o"
 			descr = "old"
 		} else if (flag("COMPAT4")) {
 			ncompat4++
 			out = syscompat4
 			outdcl = syscompat4dcl
 			wrap = "compat4"
 			prefix = "freebsd4_"
 			descr = "freebsd4"
 		} else if (flag("COMPAT6")) {
 			ncompat6++
 			out = syscompat6
 			outdcl = syscompat6dcl
 			wrap = "compat6"
 			prefix = "freebsd6_"
 			descr = "freebsd6"
 		} else if (flag("COMPAT7")) {
 			ncompat7++
 			out = syscompat7
 			outdcl = syscompat7dcl
 			wrap = "compat7"
 			prefix = "freebsd7_"
 			descr = "freebsd7"
 		} else if (flag("COMPAT10")) {
 			ncompat10++
 			out = syscompat10
 			outdcl = syscompat10dcl
 			wrap = "compat10"
 			prefix = "freebsd10_"
 			descr = "freebsd10"
 		}
 		parseline()
 		if (argc != 0 && !flag("NOARGS") && !flag("NOPROTO") && \
 		    !flag("NODEF")) {
 			printf("struct %s {\n", argalias) > out
 			for (i = 1; i <= argc; i++)
 				printf("\tchar %s_l_[PADL_(%s)]; %s %s; " \
 				    "char %s_r_[PADR_(%s)];\n",
 				    argname[i], argtype[i],
 				    argtype[i], argname[i],
 				    argname[i], argtype[i]) > out
 			printf("};\n") > out
 		}
 		else if (!flag("NOARGS") && !flag("NOPROTO") && !flag("NODEF"))
 			printf("struct %s {\n\tregister_t dummy;\n};\n",
 			    argalias) > sysarg
 		if (!flag("NOPROTO") && !flag("NODEF")) {
 			printf("%s\t%s%s(struct thread *, struct %s *);\n",
 			    rettype, prefix, funcname, argalias) > outdcl
 			printf("#define\t%sAUE_%s%s\t%s\n", syscallprefix,
 			    prefix, funcname, auditev) > sysaue
 		}
 		if (flag("NOSTD")) {
 			printf("\t{ %s, (sy_call_t *)%s, %s, NULL, 0, 0, 0, SY_THR_ABSENT },",
 			    "0", "lkmressys", "AUE_NULL") > sysent
 			align_sysent_comment(8 + 2 + length("0") + 15 + \
 			    length("lkmressys") + length("AUE_NULL") + 3)
 		} else {
 			printf("\t{ %s(%s,%s), %s, NULL, 0, 0, %s, %s },",
 			    wrap, argssize, funcname, auditev, flags, thr_flag) > sysent
 			align_sysent_comment(8 + 9 + length(argssize) + 1 + \
 			    length(funcname) + length(auditev) + \
 			    length(flags) + 4)
 		}
 		printf("/* %d = %s %s */\n", syscall, descr, funcalias) > sysent
 		printf("\t\"%s.%s\",\t\t/* %d = %s %s */\n",
 		    wrap, funcalias, syscall, descr, funcalias) > sysnames
-		# XXX-BD: why no COMPAT7?
-		if (flag("COMPAT") || flag("COMPAT4") || flag("COMPAT6") || flag("COMPAT10")) {
+		# Do not provide freebsdN_* symbols in libc for < FreeBSD 7
+		if (flag("COMPAT") || flag("COMPAT4") || flag("COMPAT6")) {
 			printf("\t\t\t\t/* %d is %s %s */\n",
 			    syscall, descr, funcalias) > syshdr
 		} else if (!flag("NODEF")) {
 			printf("#define\t%s%s%s\t%d\n", syscallprefix,
 			    prefix, funcalias, syscall) > syshdr
 			printf(" \\\n\t%s%s.o", prefix, funcalias) > sysmk
 		}
 		syscall++
 		next
 	}
 	type("OBSOL") {
 		printf("\t{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },") > sysent
 		align_sysent_comment(34)
 		printf("/* %d = obsolete %s */\n", syscall, comment) > sysent
 		printf("\t\"obs_%s\",\t\t\t/* %d = obsolete %s */\n",
 		    $4, syscall, comment) > sysnames
 		printf("\t\t\t\t/* %d is obsolete %s */\n",
 		    syscall, comment) > syshdr
 		syscall++
 		next
 	}
 	type("UNIMPL") {
 		printf("\t{ 0, (sy_call_t *)nosys, AUE_NULL, NULL, 0, 0, 0, SY_THR_ABSENT },\t\t\t/* %d = %s */\n",
 		    syscall, comment) > sysent
 		printf("\t\"#%d\",\t\t\t/* %d = %s */\n",
 		    syscall, syscall, comment) > sysnames
 		syscall++
 		next
 	}
 	{
 		printf "%s: line %d: unrecognized keyword %s\n", infile, NR, $3
 		exit 1
 	}
 	END {
 		printf "\n#define AS(name) (sizeof(struct name) / sizeof(register_t))\n" > sysinc
 
 		if (ncompat != 0 || ncompat4 != 0 || ncompat6 != 0 || ncompat7 != 0 || ncompat10 != 0)
 			printf "#include \"opt_compat.h\"\n\n" > syssw
 
 		if (ncompat != 0) {
 			printf "\n#ifdef %s\n", compat > sysinc
 			printf "#define compat(n, name) n, (sy_call_t *)__CONCAT(o,name)\n" > sysinc
 			printf "#else\n" > sysinc
 			printf "#define compat(n, name) 0, (sy_call_t *)nosys\n" > sysinc
 			printf "#endif\n" > sysinc
 		}
 
 		if (ncompat4 != 0) {
 			printf "\n#ifdef %s\n", compat4 > sysinc
 			printf "#define compat4(n, name) n, (sy_call_t *)__CONCAT(freebsd4_,name)\n" > sysinc
 			printf "#else\n" > sysinc
 			printf "#define compat4(n, name) 0, (sy_call_t *)nosys\n" > sysinc
 			printf "#endif\n" > sysinc
 		}
 
 		if (ncompat6 != 0) {
 			printf "\n#ifdef %s\n", compat6 > sysinc
 			printf "#define compat6(n, name) n, (sy_call_t *)__CONCAT(freebsd6_,name)\n" > sysinc
 			printf "#else\n" > sysinc
 			printf "#define compat6(n, name) 0, (sy_call_t *)nosys\n" > sysinc
 			printf "#endif\n" > sysinc
 		}
 
 		if (ncompat7 != 0) {
 			printf "\n#ifdef %s\n", compat7 > sysinc
 			printf "#define compat7(n, name) n, (sy_call_t *)__CONCAT(freebsd7_,name)\n" > sysinc
 			printf "#else\n" > sysinc
 			printf "#define compat7(n, name) 0, (sy_call_t *)nosys\n" > sysinc
 			printf "#endif\n" > sysinc
 		}
 		if (ncompat10 != 0) {
 			printf "\n#ifdef %s\n", compat10 > sysinc
 			printf "#define compat10(n, name) n, (sy_call_t *)__CONCAT(freebsd10_,name)\n" > sysinc
 			printf "#else\n" > sysinc
 			printf "#define compat10(n, name) 0, (sy_call_t *)nosys\n" > sysinc
 			printf "#endif\n" > sysinc
 		}
 		printf("\n#endif /* %s */\n\n", compat) > syscompatdcl
 		printf("\n#endif /* %s */\n\n", compat4) > syscompat4dcl
 		printf("\n#endif /* %s */\n\n", compat6) > syscompat6dcl
 		printf("\n#endif /* %s */\n\n", compat7) > syscompat7dcl
 		printf("\n#endif /* %s */\n\n", compat10) > syscompat10dcl
 
 		printf("\n#undef PAD_\n") > sysprotoend
 		printf("#undef PADL_\n") > sysprotoend
 		printf("#undef PADR_\n") > sysprotoend
 		printf("\n#endif /* !%s */\n", sysproto_h) > sysprotoend
 
 		printf("\n") > sysmk
 		printf("};\n") > sysent
 		printf("};\n") > sysnames
 		printf("#define\t%sMAXSYSCALL\t%d\n", syscallprefix, syscall) \
 		    > syshdr
 		printf "\tdefault:\n\t\t*n_args = 0;\n\t\tbreak;\n\t};\n}\n" > systrace
 		printf "\tdefault:\n\t\tbreak;\n\t};\n\tif (p != NULL)\n\t\tstrlcpy(desc, p, descsz);\n}\n" > systracetmp
 		printf "\tdefault:\n\t\tbreak;\n\t};\n\tif (p != NULL)\n\t\tstrlcpy(desc, p, descsz);\n}\n" > systraceret
 	} '
 
 cat $sysinc $sysent >> $syssw
 cat $sysarg $sysdcl \
 	$syscompat $syscompatdcl \
 	$syscompat4 $syscompat4dcl \
 	$syscompat6 $syscompat6dcl \
 	$syscompat7 $syscompat7dcl \
 	$syscompat10 $syscompat10dcl \
 	$sysaue $sysprotoend > $sysproto
 cat $systracetmp >> $systrace
 cat $systraceret >> $systrace
 
Index: user/alc/PQ_LAUNDRY/sys/kern/syscalls.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/kern/syscalls.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/kern/syscalls.c	(revision 303775)
@@ -1,560 +1,560 @@
 /*
  * System call names.
  *
  * DO NOT EDIT-- this file is automatically generated.
  * $FreeBSD$
- * created from FreeBSD: head/sys/kern/syscalls.master 303700 2016-08-03 06:35:58Z ed 
+ * created from FreeBSD: head/sys/kern/syscalls.master 303729 2016-08-03 18:48:56Z bdrewery 
  */
 
 const char *syscallnames[] = {
 	"syscall",			/* 0 = syscall */
 	"exit",			/* 1 = exit */
 	"fork",			/* 2 = fork */
 	"read",			/* 3 = read */
 	"write",			/* 4 = write */
 	"open",			/* 5 = open */
 	"close",			/* 6 = close */
 	"wait4",			/* 7 = wait4 */
 	"compat.creat",		/* 8 = old creat */
 	"link",			/* 9 = link */
 	"unlink",			/* 10 = unlink */
 	"obs_execv",			/* 11 = obsolete execv */
 	"chdir",			/* 12 = chdir */
 	"fchdir",			/* 13 = fchdir */
 	"mknod",			/* 14 = mknod */
 	"chmod",			/* 15 = chmod */
 	"chown",			/* 16 = chown */
 	"break",			/* 17 = break */
 	"compat4.getfsstat",		/* 18 = freebsd4 getfsstat */
 	"compat.lseek",		/* 19 = old lseek */
 	"getpid",			/* 20 = getpid */
 	"mount",			/* 21 = mount */
 	"unmount",			/* 22 = unmount */
 	"setuid",			/* 23 = setuid */
 	"getuid",			/* 24 = getuid */
 	"geteuid",			/* 25 = geteuid */
 	"ptrace",			/* 26 = ptrace */
 	"recvmsg",			/* 27 = recvmsg */
 	"sendmsg",			/* 28 = sendmsg */
 	"recvfrom",			/* 29 = recvfrom */
 	"accept",			/* 30 = accept */
 	"getpeername",			/* 31 = getpeername */
 	"getsockname",			/* 32 = getsockname */
 	"access",			/* 33 = access */
 	"chflags",			/* 34 = chflags */
 	"fchflags",			/* 35 = fchflags */
 	"sync",			/* 36 = sync */
 	"kill",			/* 37 = kill */
 	"compat.stat",		/* 38 = old stat */
 	"getppid",			/* 39 = getppid */
 	"compat.lstat",		/* 40 = old lstat */
 	"dup",			/* 41 = dup */
 	"compat10.pipe",		/* 42 = freebsd10 pipe */
 	"getegid",			/* 43 = getegid */
 	"profil",			/* 44 = profil */
 	"ktrace",			/* 45 = ktrace */
 	"compat.sigaction",		/* 46 = old sigaction */
 	"getgid",			/* 47 = getgid */
 	"compat.sigprocmask",		/* 48 = old sigprocmask */
 	"getlogin",			/* 49 = getlogin */
 	"setlogin",			/* 50 = setlogin */
 	"acct",			/* 51 = acct */
 	"compat.sigpending",		/* 52 = old sigpending */
 	"sigaltstack",			/* 53 = sigaltstack */
 	"ioctl",			/* 54 = ioctl */
 	"reboot",			/* 55 = reboot */
 	"revoke",			/* 56 = revoke */
 	"symlink",			/* 57 = symlink */
 	"readlink",			/* 58 = readlink */
 	"execve",			/* 59 = execve */
 	"umask",			/* 60 = umask */
 	"chroot",			/* 61 = chroot */
 	"compat.fstat",		/* 62 = old fstat */
 	"compat.getkerninfo",		/* 63 = old getkerninfo */
 	"compat.getpagesize",		/* 64 = old getpagesize */
 	"msync",			/* 65 = msync */
 	"vfork",			/* 66 = vfork */
 	"obs_vread",			/* 67 = obsolete vread */
 	"obs_vwrite",			/* 68 = obsolete vwrite */
 	"sbrk",			/* 69 = sbrk */
 	"sstk",			/* 70 = sstk */
 	"compat.mmap",		/* 71 = old mmap */
 	"vadvise",			/* 72 = vadvise */
 	"munmap",			/* 73 = munmap */
 	"mprotect",			/* 74 = mprotect */
 	"madvise",			/* 75 = madvise */
 	"obs_vhangup",			/* 76 = obsolete vhangup */
 	"obs_vlimit",			/* 77 = obsolete vlimit */
 	"mincore",			/* 78 = mincore */
 	"getgroups",			/* 79 = getgroups */
 	"setgroups",			/* 80 = setgroups */
 	"getpgrp",			/* 81 = getpgrp */
 	"setpgid",			/* 82 = setpgid */
 	"setitimer",			/* 83 = setitimer */
 	"compat.wait",		/* 84 = old wait */
 	"swapon",			/* 85 = swapon */
 	"getitimer",			/* 86 = getitimer */
 	"compat.gethostname",		/* 87 = old gethostname */
 	"compat.sethostname",		/* 88 = old sethostname */
 	"getdtablesize",			/* 89 = getdtablesize */
 	"dup2",			/* 90 = dup2 */
 	"#91",			/* 91 = getdopt */
 	"fcntl",			/* 92 = fcntl */
 	"select",			/* 93 = select */
 	"#94",			/* 94 = setdopt */
 	"fsync",			/* 95 = fsync */
 	"setpriority",			/* 96 = setpriority */
 	"socket",			/* 97 = socket */
 	"connect",			/* 98 = connect */
 	"compat.accept",		/* 99 = old accept */
 	"getpriority",			/* 100 = getpriority */
 	"compat.send",		/* 101 = old send */
 	"compat.recv",		/* 102 = old recv */
 	"compat.sigreturn",		/* 103 = old sigreturn */
 	"bind",			/* 104 = bind */
 	"setsockopt",			/* 105 = setsockopt */
 	"listen",			/* 106 = listen */
 	"obs_vtimes",			/* 107 = obsolete vtimes */
 	"compat.sigvec",		/* 108 = old sigvec */
 	"compat.sigblock",		/* 109 = old sigblock */
 	"compat.sigsetmask",		/* 110 = old sigsetmask */
 	"compat.sigsuspend",		/* 111 = old sigsuspend */
 	"compat.sigstack",		/* 112 = old sigstack */
 	"compat.recvmsg",		/* 113 = old recvmsg */
 	"compat.sendmsg",		/* 114 = old sendmsg */
 	"obs_vtrace",			/* 115 = obsolete vtrace */
 	"gettimeofday",			/* 116 = gettimeofday */
 	"getrusage",			/* 117 = getrusage */
 	"getsockopt",			/* 118 = getsockopt */
 	"#119",			/* 119 = resuba */
 	"readv",			/* 120 = readv */
 	"writev",			/* 121 = writev */
 	"settimeofday",			/* 122 = settimeofday */
 	"fchown",			/* 123 = fchown */
 	"fchmod",			/* 124 = fchmod */
 	"compat.recvfrom",		/* 125 = old recvfrom */
 	"setreuid",			/* 126 = setreuid */
 	"setregid",			/* 127 = setregid */
 	"rename",			/* 128 = rename */
 	"compat.truncate",		/* 129 = old truncate */
 	"compat.ftruncate",		/* 130 = old ftruncate */
 	"flock",			/* 131 = flock */
 	"mkfifo",			/* 132 = mkfifo */
 	"sendto",			/* 133 = sendto */
 	"shutdown",			/* 134 = shutdown */
 	"socketpair",			/* 135 = socketpair */
 	"mkdir",			/* 136 = mkdir */
 	"rmdir",			/* 137 = rmdir */
 	"utimes",			/* 138 = utimes */
 	"obs_4.2",			/* 139 = obsolete 4.2 sigreturn */
 	"adjtime",			/* 140 = adjtime */
 	"compat.getpeername",		/* 141 = old getpeername */
 	"compat.gethostid",		/* 142 = old gethostid */
 	"compat.sethostid",		/* 143 = old sethostid */
 	"compat.getrlimit",		/* 144 = old getrlimit */
 	"compat.setrlimit",		/* 145 = old setrlimit */
 	"compat.killpg",		/* 146 = old killpg */
 	"setsid",			/* 147 = setsid */
 	"quotactl",			/* 148 = quotactl */
 	"compat.quota",		/* 149 = old quota */
 	"compat.getsockname",		/* 150 = old getsockname */
 	"#151",			/* 151 = sem_lock */
 	"#152",			/* 152 = sem_wakeup */
 	"#153",			/* 153 = asyncdaemon */
 	"nlm_syscall",			/* 154 = nlm_syscall */
 	"nfssvc",			/* 155 = nfssvc */
 	"compat.getdirentries",		/* 156 = old getdirentries */
 	"compat4.statfs",		/* 157 = freebsd4 statfs */
 	"compat4.fstatfs",		/* 158 = freebsd4 fstatfs */
 	"#159",			/* 159 = nosys */
 	"lgetfh",			/* 160 = lgetfh */
 	"getfh",			/* 161 = getfh */
 	"compat4.getdomainname",		/* 162 = freebsd4 getdomainname */
 	"compat4.setdomainname",		/* 163 = freebsd4 setdomainname */
 	"compat4.uname",		/* 164 = freebsd4 uname */
 	"sysarch",			/* 165 = sysarch */
 	"rtprio",			/* 166 = rtprio */
 	"#167",			/* 167 = nosys */
 	"#168",			/* 168 = nosys */
 	"semsys",			/* 169 = semsys */
 	"msgsys",			/* 170 = msgsys */
 	"shmsys",			/* 171 = shmsys */
 	"#172",			/* 172 = nosys */
 	"compat6.pread",		/* 173 = freebsd6 pread */
 	"compat6.pwrite",		/* 174 = freebsd6 pwrite */
 	"setfib",			/* 175 = setfib */
 	"ntp_adjtime",			/* 176 = ntp_adjtime */
 	"#177",			/* 177 = sfork */
 	"#178",			/* 178 = getdescriptor */
 	"#179",			/* 179 = setdescriptor */
 	"#180",			/* 180 = nosys */
 	"setgid",			/* 181 = setgid */
 	"setegid",			/* 182 = setegid */
 	"seteuid",			/* 183 = seteuid */
 	"#184",			/* 184 = lfs_bmapv */
 	"#185",			/* 185 = lfs_markv */
 	"#186",			/* 186 = lfs_segclean */
 	"#187",			/* 187 = lfs_segwait */
 	"stat",			/* 188 = stat */
 	"fstat",			/* 189 = fstat */
 	"lstat",			/* 190 = lstat */
 	"pathconf",			/* 191 = pathconf */
 	"fpathconf",			/* 192 = fpathconf */
 	"#193",			/* 193 = nosys */
 	"getrlimit",			/* 194 = getrlimit */
 	"setrlimit",			/* 195 = setrlimit */
 	"getdirentries",			/* 196 = getdirentries */
 	"compat6.mmap",		/* 197 = freebsd6 mmap */
 	"__syscall",			/* 198 = __syscall */
 	"compat6.lseek",		/* 199 = freebsd6 lseek */
 	"compat6.truncate",		/* 200 = freebsd6 truncate */
 	"compat6.ftruncate",		/* 201 = freebsd6 ftruncate */
 	"__sysctl",			/* 202 = __sysctl */
 	"mlock",			/* 203 = mlock */
 	"munlock",			/* 204 = munlock */
 	"undelete",			/* 205 = undelete */
 	"futimes",			/* 206 = futimes */
 	"getpgid",			/* 207 = getpgid */
 	"#208",			/* 208 = newreboot */
 	"poll",			/* 209 = poll */
 	"lkmnosys",			/* 210 = lkmnosys */
 	"lkmnosys",			/* 211 = lkmnosys */
 	"lkmnosys",			/* 212 = lkmnosys */
 	"lkmnosys",			/* 213 = lkmnosys */
 	"lkmnosys",			/* 214 = lkmnosys */
 	"lkmnosys",			/* 215 = lkmnosys */
 	"lkmnosys",			/* 216 = lkmnosys */
 	"lkmnosys",			/* 217 = lkmnosys */
 	"lkmnosys",			/* 218 = lkmnosys */
 	"lkmnosys",			/* 219 = lkmnosys */
 	"compat7.__semctl",		/* 220 = freebsd7 __semctl */
 	"semget",			/* 221 = semget */
 	"semop",			/* 222 = semop */
 	"#223",			/* 223 = semconfig */
 	"compat7.msgctl",		/* 224 = freebsd7 msgctl */
 	"msgget",			/* 225 = msgget */
 	"msgsnd",			/* 226 = msgsnd */
 	"msgrcv",			/* 227 = msgrcv */
 	"shmat",			/* 228 = shmat */
 	"compat7.shmctl",		/* 229 = freebsd7 shmctl */
 	"shmdt",			/* 230 = shmdt */
 	"shmget",			/* 231 = shmget */
 	"clock_gettime",			/* 232 = clock_gettime */
 	"clock_settime",			/* 233 = clock_settime */
 	"clock_getres",			/* 234 = clock_getres */
 	"ktimer_create",			/* 235 = ktimer_create */
 	"ktimer_delete",			/* 236 = ktimer_delete */
 	"ktimer_settime",			/* 237 = ktimer_settime */
 	"ktimer_gettime",			/* 238 = ktimer_gettime */
 	"ktimer_getoverrun",			/* 239 = ktimer_getoverrun */
 	"nanosleep",			/* 240 = nanosleep */
 	"ffclock_getcounter",			/* 241 = ffclock_getcounter */
 	"ffclock_setestimate",			/* 242 = ffclock_setestimate */
 	"ffclock_getestimate",			/* 243 = ffclock_getestimate */
 	"#244",			/* 244 = nosys */
 	"#245",			/* 245 = nosys */
 	"#246",			/* 246 = nosys */
 	"clock_getcpuclockid2",			/* 247 = clock_getcpuclockid2 */
 	"ntp_gettime",			/* 248 = ntp_gettime */
 	"#249",			/* 249 = nosys */
 	"minherit",			/* 250 = minherit */
 	"rfork",			/* 251 = rfork */
 	"openbsd_poll",			/* 252 = openbsd_poll */
 	"issetugid",			/* 253 = issetugid */
 	"lchown",			/* 254 = lchown */
 	"aio_read",			/* 255 = aio_read */
 	"aio_write",			/* 256 = aio_write */
 	"lio_listio",			/* 257 = lio_listio */
 	"#258",			/* 258 = nosys */
 	"#259",			/* 259 = nosys */
 	"#260",			/* 260 = nosys */
 	"#261",			/* 261 = nosys */
 	"#262",			/* 262 = nosys */
 	"#263",			/* 263 = nosys */
 	"#264",			/* 264 = nosys */
 	"#265",			/* 265 = nosys */
 	"#266",			/* 266 = nosys */
 	"#267",			/* 267 = nosys */
 	"#268",			/* 268 = nosys */
 	"#269",			/* 269 = nosys */
 	"#270",			/* 270 = nosys */
 	"#271",			/* 271 = nosys */
 	"getdents",			/* 272 = getdents */
 	"#273",			/* 273 = nosys */
 	"lchmod",			/* 274 = lchmod */
 	"netbsd_lchown",			/* 275 = netbsd_lchown */
 	"lutimes",			/* 276 = lutimes */
 	"netbsd_msync",			/* 277 = netbsd_msync */
 	"nstat",			/* 278 = nstat */
 	"nfstat",			/* 279 = nfstat */
 	"nlstat",			/* 280 = nlstat */
 	"#281",			/* 281 = nosys */
 	"#282",			/* 282 = nosys */
 	"#283",			/* 283 = nosys */
 	"#284",			/* 284 = nosys */
 	"#285",			/* 285 = nosys */
 	"#286",			/* 286 = nosys */
 	"#287",			/* 287 = nosys */
 	"#288",			/* 288 = nosys */
 	"preadv",			/* 289 = preadv */
 	"pwritev",			/* 290 = pwritev */
 	"#291",			/* 291 = nosys */
 	"#292",			/* 292 = nosys */
 	"#293",			/* 293 = nosys */
 	"#294",			/* 294 = nosys */
 	"#295",			/* 295 = nosys */
 	"#296",			/* 296 = nosys */
 	"compat4.fhstatfs",		/* 297 = freebsd4 fhstatfs */
 	"fhopen",			/* 298 = fhopen */
 	"fhstat",			/* 299 = fhstat */
 	"modnext",			/* 300 = modnext */
 	"modstat",			/* 301 = modstat */
 	"modfnext",			/* 302 = modfnext */
 	"modfind",			/* 303 = modfind */
 	"kldload",			/* 304 = kldload */
 	"kldunload",			/* 305 = kldunload */
 	"kldfind",			/* 306 = kldfind */
 	"kldnext",			/* 307 = kldnext */
 	"kldstat",			/* 308 = kldstat */
 	"kldfirstmod",			/* 309 = kldfirstmod */
 	"getsid",			/* 310 = getsid */
 	"setresuid",			/* 311 = setresuid */
 	"setresgid",			/* 312 = setresgid */
 	"obs_signanosleep",			/* 313 = obsolete signanosleep */
 	"aio_return",			/* 314 = aio_return */
 	"aio_suspend",			/* 315 = aio_suspend */
 	"aio_cancel",			/* 316 = aio_cancel */
 	"aio_error",			/* 317 = aio_error */
 	"compat6.aio_read",		/* 318 = freebsd6 aio_read */
 	"compat6.aio_write",		/* 319 = freebsd6 aio_write */
 	"compat6.lio_listio",		/* 320 = freebsd6 lio_listio */
 	"yield",			/* 321 = yield */
 	"obs_thr_sleep",			/* 322 = obsolete thr_sleep */
 	"obs_thr_wakeup",			/* 323 = obsolete thr_wakeup */
 	"mlockall",			/* 324 = mlockall */
 	"munlockall",			/* 325 = munlockall */
 	"__getcwd",			/* 326 = __getcwd */
 	"sched_setparam",			/* 327 = sched_setparam */
 	"sched_getparam",			/* 328 = sched_getparam */
 	"sched_setscheduler",			/* 329 = sched_setscheduler */
 	"sched_getscheduler",			/* 330 = sched_getscheduler */
 	"sched_yield",			/* 331 = sched_yield */
 	"sched_get_priority_max",			/* 332 = sched_get_priority_max */
 	"sched_get_priority_min",			/* 333 = sched_get_priority_min */
 	"sched_rr_get_interval",			/* 334 = sched_rr_get_interval */
 	"utrace",			/* 335 = utrace */
 	"compat4.sendfile",		/* 336 = freebsd4 sendfile */
 	"kldsym",			/* 337 = kldsym */
 	"jail",			/* 338 = jail */
 	"nnpfs_syscall",			/* 339 = nnpfs_syscall */
 	"sigprocmask",			/* 340 = sigprocmask */
 	"sigsuspend",			/* 341 = sigsuspend */
 	"compat4.sigaction",		/* 342 = freebsd4 sigaction */
 	"sigpending",			/* 343 = sigpending */
 	"compat4.sigreturn",		/* 344 = freebsd4 sigreturn */
 	"sigtimedwait",			/* 345 = sigtimedwait */
 	"sigwaitinfo",			/* 346 = sigwaitinfo */
 	"__acl_get_file",			/* 347 = __acl_get_file */
 	"__acl_set_file",			/* 348 = __acl_set_file */
 	"__acl_get_fd",			/* 349 = __acl_get_fd */
 	"__acl_set_fd",			/* 350 = __acl_set_fd */
 	"__acl_delete_file",			/* 351 = __acl_delete_file */
 	"__acl_delete_fd",			/* 352 = __acl_delete_fd */
 	"__acl_aclcheck_file",			/* 353 = __acl_aclcheck_file */
 	"__acl_aclcheck_fd",			/* 354 = __acl_aclcheck_fd */
 	"extattrctl",			/* 355 = extattrctl */
 	"extattr_set_file",			/* 356 = extattr_set_file */
 	"extattr_get_file",			/* 357 = extattr_get_file */
 	"extattr_delete_file",			/* 358 = extattr_delete_file */
 	"aio_waitcomplete",			/* 359 = aio_waitcomplete */
 	"getresuid",			/* 360 = getresuid */
 	"getresgid",			/* 361 = getresgid */
 	"kqueue",			/* 362 = kqueue */
 	"kevent",			/* 363 = kevent */
 	"#364",			/* 364 = __cap_get_proc */
 	"#365",			/* 365 = __cap_set_proc */
 	"#366",			/* 366 = __cap_get_fd */
 	"#367",			/* 367 = __cap_get_file */
 	"#368",			/* 368 = __cap_set_fd */
 	"#369",			/* 369 = __cap_set_file */
 	"#370",			/* 370 = nosys */
 	"extattr_set_fd",			/* 371 = extattr_set_fd */
 	"extattr_get_fd",			/* 372 = extattr_get_fd */
 	"extattr_delete_fd",			/* 373 = extattr_delete_fd */
 	"__setugid",			/* 374 = __setugid */
 	"#375",			/* 375 = nfsclnt */
 	"eaccess",			/* 376 = eaccess */
 	"afs3_syscall",			/* 377 = afs3_syscall */
 	"nmount",			/* 378 = nmount */
 	"#379",			/* 379 = kse_exit */
 	"#380",			/* 380 = kse_wakeup */
 	"#381",			/* 381 = kse_create */
 	"#382",			/* 382 = kse_thr_interrupt */
 	"#383",			/* 383 = kse_release */
 	"__mac_get_proc",			/* 384 = __mac_get_proc */
 	"__mac_set_proc",			/* 385 = __mac_set_proc */
 	"__mac_get_fd",			/* 386 = __mac_get_fd */
 	"__mac_get_file",			/* 387 = __mac_get_file */
 	"__mac_set_fd",			/* 388 = __mac_set_fd */
 	"__mac_set_file",			/* 389 = __mac_set_file */
 	"kenv",			/* 390 = kenv */
 	"lchflags",			/* 391 = lchflags */
 	"uuidgen",			/* 392 = uuidgen */
 	"sendfile",			/* 393 = sendfile */
 	"mac_syscall",			/* 394 = mac_syscall */
 	"getfsstat",			/* 395 = getfsstat */
 	"statfs",			/* 396 = statfs */
 	"fstatfs",			/* 397 = fstatfs */
 	"fhstatfs",			/* 398 = fhstatfs */
 	"#399",			/* 399 = nosys */
 	"ksem_close",			/* 400 = ksem_close */
 	"ksem_post",			/* 401 = ksem_post */
 	"ksem_wait",			/* 402 = ksem_wait */
 	"ksem_trywait",			/* 403 = ksem_trywait */
 	"ksem_init",			/* 404 = ksem_init */
 	"ksem_open",			/* 405 = ksem_open */
 	"ksem_unlink",			/* 406 = ksem_unlink */
 	"ksem_getvalue",			/* 407 = ksem_getvalue */
 	"ksem_destroy",			/* 408 = ksem_destroy */
 	"__mac_get_pid",			/* 409 = __mac_get_pid */
 	"__mac_get_link",			/* 410 = __mac_get_link */
 	"__mac_set_link",			/* 411 = __mac_set_link */
 	"extattr_set_link",			/* 412 = extattr_set_link */
 	"extattr_get_link",			/* 413 = extattr_get_link */
 	"extattr_delete_link",			/* 414 = extattr_delete_link */
 	"__mac_execve",			/* 415 = __mac_execve */
 	"sigaction",			/* 416 = sigaction */
 	"sigreturn",			/* 417 = sigreturn */
 	"#418",			/* 418 = __xstat */
 	"#419",			/* 419 = __xfstat */
 	"#420",			/* 420 = __xlstat */
 	"getcontext",			/* 421 = getcontext */
 	"setcontext",			/* 422 = setcontext */
 	"swapcontext",			/* 423 = swapcontext */
 	"swapoff",			/* 424 = swapoff */
 	"__acl_get_link",			/* 425 = __acl_get_link */
 	"__acl_set_link",			/* 426 = __acl_set_link */
 	"__acl_delete_link",			/* 427 = __acl_delete_link */
 	"__acl_aclcheck_link",			/* 428 = __acl_aclcheck_link */
 	"sigwait",			/* 429 = sigwait */
 	"thr_create",			/* 430 = thr_create */
 	"thr_exit",			/* 431 = thr_exit */
 	"thr_self",			/* 432 = thr_self */
 	"thr_kill",			/* 433 = thr_kill */
 	"#434",			/* 434 = nosys */
 	"#435",			/* 435 = nosys */
 	"jail_attach",			/* 436 = jail_attach */
 	"extattr_list_fd",			/* 437 = extattr_list_fd */
 	"extattr_list_file",			/* 438 = extattr_list_file */
 	"extattr_list_link",			/* 439 = extattr_list_link */
 	"#440",			/* 440 = kse_switchin */
 	"ksem_timedwait",			/* 441 = ksem_timedwait */
 	"thr_suspend",			/* 442 = thr_suspend */
 	"thr_wake",			/* 443 = thr_wake */
 	"kldunloadf",			/* 444 = kldunloadf */
 	"audit",			/* 445 = audit */
 	"auditon",			/* 446 = auditon */
 	"getauid",			/* 447 = getauid */
 	"setauid",			/* 448 = setauid */
 	"getaudit",			/* 449 = getaudit */
 	"setaudit",			/* 450 = setaudit */
 	"getaudit_addr",			/* 451 = getaudit_addr */
 	"setaudit_addr",			/* 452 = setaudit_addr */
 	"auditctl",			/* 453 = auditctl */
 	"_umtx_op",			/* 454 = _umtx_op */
 	"thr_new",			/* 455 = thr_new */
 	"sigqueue",			/* 456 = sigqueue */
 	"kmq_open",			/* 457 = kmq_open */
 	"kmq_setattr",			/* 458 = kmq_setattr */
 	"kmq_timedreceive",			/* 459 = kmq_timedreceive */
 	"kmq_timedsend",			/* 460 = kmq_timedsend */
 	"kmq_notify",			/* 461 = kmq_notify */
 	"kmq_unlink",			/* 462 = kmq_unlink */
 	"abort2",			/* 463 = abort2 */
 	"thr_set_name",			/* 464 = thr_set_name */
 	"aio_fsync",			/* 465 = aio_fsync */
 	"rtprio_thread",			/* 466 = rtprio_thread */
 	"#467",			/* 467 = nosys */
 	"#468",			/* 468 = nosys */
 	"#469",			/* 469 = __getpath_fromfd */
 	"#470",			/* 470 = __getpath_fromaddr */
 	"sctp_peeloff",			/* 471 = sctp_peeloff */
 	"sctp_generic_sendmsg",			/* 472 = sctp_generic_sendmsg */
 	"sctp_generic_sendmsg_iov",			/* 473 = sctp_generic_sendmsg_iov */
 	"sctp_generic_recvmsg",			/* 474 = sctp_generic_recvmsg */
 	"pread",			/* 475 = pread */
 	"pwrite",			/* 476 = pwrite */
 	"mmap",			/* 477 = mmap */
 	"lseek",			/* 478 = lseek */
 	"truncate",			/* 479 = truncate */
 	"ftruncate",			/* 480 = ftruncate */
 	"thr_kill2",			/* 481 = thr_kill2 */
 	"shm_open",			/* 482 = shm_open */
 	"shm_unlink",			/* 483 = shm_unlink */
 	"cpuset",			/* 484 = cpuset */
 	"cpuset_setid",			/* 485 = cpuset_setid */
 	"cpuset_getid",			/* 486 = cpuset_getid */
 	"cpuset_getaffinity",			/* 487 = cpuset_getaffinity */
 	"cpuset_setaffinity",			/* 488 = cpuset_setaffinity */
 	"faccessat",			/* 489 = faccessat */
 	"fchmodat",			/* 490 = fchmodat */
 	"fchownat",			/* 491 = fchownat */
 	"fexecve",			/* 492 = fexecve */
 	"fstatat",			/* 493 = fstatat */
 	"futimesat",			/* 494 = futimesat */
 	"linkat",			/* 495 = linkat */
 	"mkdirat",			/* 496 = mkdirat */
 	"mkfifoat",			/* 497 = mkfifoat */
 	"mknodat",			/* 498 = mknodat */
 	"openat",			/* 499 = openat */
 	"readlinkat",			/* 500 = readlinkat */
 	"renameat",			/* 501 = renameat */
 	"symlinkat",			/* 502 = symlinkat */
 	"unlinkat",			/* 503 = unlinkat */
 	"posix_openpt",			/* 504 = posix_openpt */
 	"gssd_syscall",			/* 505 = gssd_syscall */
 	"jail_get",			/* 506 = jail_get */
 	"jail_set",			/* 507 = jail_set */
 	"jail_remove",			/* 508 = jail_remove */
 	"closefrom",			/* 509 = closefrom */
 	"__semctl",			/* 510 = __semctl */
 	"msgctl",			/* 511 = msgctl */
 	"shmctl",			/* 512 = shmctl */
 	"lpathconf",			/* 513 = lpathconf */
 	"obs_cap_new",			/* 514 = obsolete cap_new */
 	"__cap_rights_get",			/* 515 = __cap_rights_get */
 	"cap_enter",			/* 516 = cap_enter */
 	"cap_getmode",			/* 517 = cap_getmode */
 	"pdfork",			/* 518 = pdfork */
 	"pdkill",			/* 519 = pdkill */
 	"pdgetpid",			/* 520 = pdgetpid */
 	"#521",			/* 521 = pdwait4 */
 	"pselect",			/* 522 = pselect */
 	"getloginclass",			/* 523 = getloginclass */
 	"setloginclass",			/* 524 = setloginclass */
 	"rctl_get_racct",			/* 525 = rctl_get_racct */
 	"rctl_get_rules",			/* 526 = rctl_get_rules */
 	"rctl_get_limits",			/* 527 = rctl_get_limits */
 	"rctl_add_rule",			/* 528 = rctl_add_rule */
 	"rctl_remove_rule",			/* 529 = rctl_remove_rule */
 	"posix_fallocate",			/* 530 = posix_fallocate */
 	"posix_fadvise",			/* 531 = posix_fadvise */
 	"wait6",			/* 532 = wait6 */
 	"cap_rights_limit",			/* 533 = cap_rights_limit */
 	"cap_ioctls_limit",			/* 534 = cap_ioctls_limit */
 	"cap_ioctls_get",			/* 535 = cap_ioctls_get */
 	"cap_fcntls_limit",			/* 536 = cap_fcntls_limit */
 	"cap_fcntls_get",			/* 537 = cap_fcntls_get */
 	"bindat",			/* 538 = bindat */
 	"connectat",			/* 539 = connectat */
 	"chflagsat",			/* 540 = chflagsat */
 	"accept4",			/* 541 = accept4 */
 	"pipe2",			/* 542 = pipe2 */
 	"aio_mlock",			/* 543 = aio_mlock */
 	"procctl",			/* 544 = procctl */
 	"ppoll",			/* 545 = ppoll */
 	"futimens",			/* 546 = futimens */
 	"utimensat",			/* 547 = utimensat */
 	"numa_getaffinity",			/* 548 = numa_getaffinity */
 	"numa_setaffinity",			/* 549 = numa_setaffinity */
 };
Index: user/alc/PQ_LAUNDRY/sys/net/iflib.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/net/iflib.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/net/iflib.c	(revision 303775)
@@ -1,4806 +1,4806 @@
 /*-
  * Copyright (c) 2014-2016, Matthew Macy <mmacy@nextbsd.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions are met:
  *
  *  1. Redistributions of source code must retain the above copyright notice,
  *     this list of conditions and the following disclaimer.
  *
  *  2. Neither the name of Matthew Macy nor the names of its
  *     contributors may be used to endorse or promote products derived from
  *     this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
  * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  * POSSIBILITY OF SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet.h"
 #include "opt_inet6.h"
 #include "opt_acpi.h"
 
 #include <sys/param.h>
 #include <sys/types.h>
 #include <sys/bus.h>
 #include <sys/eventhandler.h>
 #include <sys/sockio.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/module.h>
 #include <sys/kobj.h>
 #include <sys/rman.h>
 #include <sys/sbuf.h>
 #include <sys/smp.h>
 #include <sys/socket.h>
 #include <sys/sysctl.h>
 #include <sys/syslog.h>
 #include <sys/taskqueue.h>
 
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/if_types.h>
 #include <net/if_media.h>
 #include <net/bpf.h>
 #include <net/ethernet.h>
 #include <net/mp_ring.h>
 
 #include <netinet/in.h>
 #include <netinet/in_pcb.h>
 #include <netinet/tcp_lro.h>
 #include <netinet/in_systm.h>
 #include <netinet/if_ether.h>
 #include <netinet/ip.h>
 #include <netinet/ip6.h>
 #include <netinet/tcp.h>
 
 #include <machine/bus.h>
 #include <machine/in_cksum.h>
 
 #include <vm/vm.h>
 #include <vm/pmap.h>
 
 #include <dev/led/led.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 #include <dev/pci/pci_private.h>
 
 #include <net/iflib.h>
 
 #include "ifdi_if.h"
 
 #if defined(__i386__) || defined(__amd64__)
 #include <sys/memdesc.h>
 #include <machine/bus.h>
 #include <machine/md_var.h>
 #include <machine/specialreg.h>
 #include <x86/include/busdma_impl.h>
 #include <x86/iommu/busdma_dmar.h>
 #endif
 
 
 /*
  * Enable accounting of every mbuf as it enters and leaves iflib's software descriptor references.
  */
 #define MEMORY_LOGGING 0
 /*
  * Enable mbuf vectors for compressing long mbuf chains
  */
 
 
 /*
  * NB:
  * - Prefetching in tx cleaning should perhaps be a tunable. The distance ahead
  *   we prefetch needs to be determined by the time spent in m_free vis a vis
  *   the cost of a prefetch. This will of course vary based on the workload:
  *      - NFLX's m_free path is dominated by vm-based M_EXT manipulation which
  *        is quite expensive, thus suggesting very little prefetch.
  *      - small packet forwarding which is just returning a single mbuf to
  *        UMA will typically be very fast vis a vis the cost of a memory
  *        access.
  */
 
 
 /*
  * File organization:
  *  - private structures
  *  - iflib private utility functions
  *  - ifnet functions
  *  - vlan registry and other exported functions
  *  - iflib public core functions
  *
  *
  */
 static MALLOC_DEFINE(M_IFLIB, "iflib", "ifnet library");
 
 struct iflib_txq;
 typedef struct iflib_txq *iflib_txq_t;
 struct iflib_rxq;
 typedef struct iflib_rxq *iflib_rxq_t;
 struct iflib_fl;
 typedef struct iflib_fl *iflib_fl_t;
 
 typedef struct iflib_filter_info {
 	driver_filter_t *ifi_filter;
 	void *ifi_filter_arg;
 	struct grouptask *ifi_task;
 } *iflib_filter_info_t;
 
 struct iflib_ctx {
 	KOBJ_FIELDS;
 	/*
 	 * Pointer to hardware driver's softc
 	 */
 	void *ifc_softc;
 	device_t ifc_dev;
 	if_t ifc_ifp;
 
 	cpuset_t ifc_cpus;
 	if_shared_ctx_t ifc_sctx;
 	struct if_softc_ctx ifc_softc_ctx;
 
 	struct mtx ifc_mtx;
 
 	uint16_t ifc_nhwtxqs;
 	uint16_t ifc_nhwrxqs;
 
 	iflib_txq_t ifc_txqs;
 	iflib_rxq_t ifc_rxqs;
 	uint32_t ifc_if_flags;
 	uint32_t ifc_flags;
 	uint32_t ifc_max_fl_buf_size;
 	int ifc_in_detach;
 
 	int ifc_link_state;
 	int ifc_link_irq;
 	int ifc_pause_frames;
 	int ifc_watchdog_events;
 	struct cdev *ifc_led_dev;
 	struct resource *ifc_msix_mem;
 
 	struct if_irq ifc_legacy_irq;
 	struct grouptask ifc_admin_task;
 	struct grouptask ifc_vflr_task;
 	struct iflib_filter_info ifc_filter_info;
 	struct ifmedia	ifc_media;
 
 	struct sysctl_oid *ifc_sysctl_node;
 	uint16_t ifc_sysctl_ntxqs;
 	uint16_t ifc_sysctl_nrxqs;
 	uint16_t ifc_sysctl_ntxds;
 	uint16_t ifc_sysctl_nrxds;
 	struct if_txrx ifc_txrx;
 #define isc_txd_encap  ifc_txrx.ift_txd_encap
 #define isc_txd_flush  ifc_txrx.ift_txd_flush
 #define isc_txd_credits_update  ifc_txrx.ift_txd_credits_update
 #define isc_rxd_available ifc_txrx.ift_rxd_available
 #define isc_rxd_pkt_get ifc_txrx.ift_rxd_pkt_get
 #define isc_rxd_refill ifc_txrx.ift_rxd_refill
 #define isc_rxd_flush ifc_txrx.ift_rxd_flush
 #define isc_legacy_intr ifc_txrx.ift_legacy_intr
 	eventhandler_tag ifc_vlan_attach_event;
 	eventhandler_tag ifc_vlan_detach_event;
 	uint8_t ifc_mac[ETHER_ADDR_LEN];
 	char ifc_mtx_name[16];
 };
 
 
 void *
 iflib_get_softc(if_ctx_t ctx)
 {
 
 	return (ctx->ifc_softc);
 }
 
 device_t
 iflib_get_dev(if_ctx_t ctx)
 {
 
 	return (ctx->ifc_dev);
 }
 
 if_t
 iflib_get_ifp(if_ctx_t ctx)
 {
 
 	return (ctx->ifc_ifp);
 }
 
 struct ifmedia *
 iflib_get_media(if_ctx_t ctx)
 {
 
 	return (&ctx->ifc_media);
 }
 
 void
 iflib_set_mac(if_ctx_t ctx, uint8_t mac[ETHER_ADDR_LEN])
 {
 
 	bcopy(mac, ctx->ifc_mac, ETHER_ADDR_LEN);
 }
 
 if_softc_ctx_t
 iflib_get_softc_ctx(if_ctx_t ctx)
 {
 
 	return (&ctx->ifc_softc_ctx);
 }
 
 if_shared_ctx_t
 iflib_get_sctx(if_ctx_t ctx)
 {
 
 	return (ctx->ifc_sctx);
 }
 
 #define CACHE_PTR_INCREMENT (CACHE_LINE_SIZE/sizeof(void*))
 
 #define LINK_ACTIVE(ctx) ((ctx)->ifc_link_state == LINK_STATE_UP)
 #define CTX_IS_VF(ctx) ((ctx)->ifc_sctx->isc_flags & IFLIB_IS_VF)
 
 #define RX_SW_DESC_MAP_CREATED	(1 << 0)
 #define TX_SW_DESC_MAP_CREATED	(1 << 1)
 #define RX_SW_DESC_INUSE        (1 << 3)
 #define TX_SW_DESC_MAPPED       (1 << 4)
 
 typedef struct iflib_sw_rx_desc {
 	bus_dmamap_t    ifsd_map;         /* bus_dma map for packet */
 	struct mbuf    *ifsd_m;           /* rx: uninitialized mbuf */
 	caddr_t         ifsd_cl;          /* direct cluster pointer for rx */
 	uint16_t	ifsd_flags;
 } *iflib_rxsd_t;
 
 typedef struct iflib_sw_tx_desc_val {
 	bus_dmamap_t    ifsd_map;         /* bus_dma map for packet */
 	struct mbuf    *ifsd_m;           /* pkthdr mbuf */
 	uint8_t		ifsd_flags;
 } *iflib_txsd_val_t;
 
 typedef struct iflib_sw_tx_desc_array {
 	bus_dmamap_t    *ifsd_map;         /* bus_dma maps for packet */
 	struct mbuf    **ifsd_m;           /* pkthdr mbufs */
 	uint8_t		*ifsd_flags;
 } iflib_txsd_array_t;
 
 
 /* magic number that should be high enough for any hardware */
 #define IFLIB_MAX_TX_SEGS		128
 #define IFLIB_MAX_RX_SEGS		32
 #define IFLIB_RX_COPY_THRESH		128
 #define IFLIB_MAX_RX_REFRESH		32
 #define IFLIB_QUEUE_IDLE		0
 #define IFLIB_QUEUE_HUNG		1
 #define IFLIB_QUEUE_WORKING		2
 
 /* this should really scale with ring size - 16 is a fairly arbitrary value for this */
 #define TX_BATCH_SIZE			16
 
 #define IFLIB_RESTART_BUDGET		8
 
 #define	IFC_LEGACY		0x1
 #define	IFC_QFLUSH		0x2
 #define	IFC_MULTISEG		0x4
 #define	IFC_DMAR		0x8
 
 #define CSUM_OFFLOAD		(CSUM_IP_TSO|CSUM_IP6_TSO|CSUM_IP| \
 				 CSUM_IP_UDP|CSUM_IP_TCP|CSUM_IP_SCTP| \
 				 CSUM_IP6_UDP|CSUM_IP6_TCP|CSUM_IP6_SCTP)
 struct iflib_txq {
 	uint16_t	ift_in_use;
 	uint16_t	ift_cidx;
 	uint16_t	ift_cidx_processed;
 	uint16_t	ift_pidx;
 	uint8_t		ift_gen;
 	uint8_t		ift_db_pending;
 	uint8_t		ift_db_pending_queued;
 	uint8_t		ift_npending;
 	/* implicit pad */
 	uint64_t	ift_processed;
 	uint64_t	ift_cleaned;
 #if MEMORY_LOGGING
 	uint64_t	ift_enqueued;
 	uint64_t	ift_dequeued;
 #endif
 	uint64_t	ift_no_tx_dma_setup;
 	uint64_t	ift_no_desc_avail;
 	uint64_t	ift_mbuf_defrag_failed;
 	uint64_t	ift_mbuf_defrag;
 	uint64_t	ift_map_failed;
 	uint64_t	ift_txd_encap_efbig;
 	uint64_t	ift_pullups;
 
 	struct mtx	ift_mtx;
 	struct mtx	ift_db_mtx;
 
 	/* constant values */
 	if_ctx_t	ift_ctx;
 	struct ifmp_ring        **ift_br;
 	struct grouptask	ift_task;
 	uint16_t	ift_size;
 	uint16_t	ift_id;
 	struct callout	ift_timer;
 	struct callout	ift_db_check;
 
 	iflib_txsd_array_t	ift_sds;
 	uint8_t			ift_nbr;
 	uint8_t			ift_qstatus;
 	uint8_t			ift_active;
 	uint8_t			ift_closed;
 	int			ift_watchdog_time;
 	struct iflib_filter_info ift_filter_info;
 	bus_dma_tag_t		ift_desc_tag;
 	bus_dma_tag_t		ift_tso_desc_tag;
 	iflib_dma_info_t	ift_ifdi;
 #define MTX_NAME_LEN 16
 	char                    ift_mtx_name[MTX_NAME_LEN];
 	char                    ift_db_mtx_name[MTX_NAME_LEN];
 	bus_dma_segment_t	ift_segs[IFLIB_MAX_TX_SEGS]  __aligned(CACHE_LINE_SIZE);
 } __aligned(CACHE_LINE_SIZE);
 
 struct iflib_fl {
 	uint16_t	ifl_cidx;
 	uint16_t	ifl_pidx;
 	uint16_t	ifl_credits;
 	uint8_t		ifl_gen;
 #if MEMORY_LOGGING
 	uint64_t	ifl_m_enqueued;
 	uint64_t	ifl_m_dequeued;
 	uint64_t	ifl_cl_enqueued;
 	uint64_t	ifl_cl_dequeued;
 #endif
 	/* implicit pad */
 
 	/* constant */
 	uint16_t	ifl_size;
 	uint16_t	ifl_buf_size;
 	uint16_t	ifl_cltype;
 	uma_zone_t	ifl_zone;
 	iflib_rxsd_t	ifl_sds;
 	iflib_rxq_t	ifl_rxq;
 	uint8_t		ifl_id;
 	bus_dma_tag_t           ifl_desc_tag;
 	iflib_dma_info_t	ifl_ifdi;
 	uint64_t	ifl_bus_addrs[IFLIB_MAX_RX_REFRESH] __aligned(CACHE_LINE_SIZE);
 	caddr_t		ifl_vm_addrs[IFLIB_MAX_RX_REFRESH];
 }  __aligned(CACHE_LINE_SIZE);
 
 static inline int
 get_inuse(int size, int cidx, int pidx, int gen)
 {
 	int used;
 
 	if (pidx > cidx)
 		used = pidx - cidx;
 	else if (pidx < cidx)
 		used = size - cidx + pidx;
 	else if (gen == 0 && pidx == cidx)
 		used = 0;
 	else if (gen == 1 && pidx == cidx)
 		used = size;
 	else
 		panic("bad state");
 
 	return (used);
 }
 
 #define TXQ_AVAIL(txq) ((txq)->ift_size - get_inuse((txq)->ift_size, (txq)->ift_cidx, (txq)->ift_pidx, (txq)->ift_gen))
 
 #define IDXDIFF(head, tail, wrap) \
 	((head) >= (tail) ? (head) - (tail) : (wrap) - (tail) + (head))
 
 struct iflib_rxq {
 	/* If there is a separate completion queue -
 	 * these are the cq cidx and pidx. Otherwise
 	 * these are unused.
 	 */
 	uint16_t	ifr_size;
 	uint16_t	ifr_cq_cidx;
 	uint16_t	ifr_cq_pidx;
 	uint8_t		ifr_cq_gen;
 
 	if_ctx_t	ifr_ctx;
 	iflib_fl_t	ifr_fl;
 	uint64_t	ifr_rx_irq;
 	uint16_t	ifr_id;
 	uint8_t		ifr_lro_enabled;
 	uint8_t		ifr_nfl;
 	struct lro_ctrl			ifr_lc;
 	struct grouptask        ifr_task;
 	struct iflib_filter_info ifr_filter_info;
 	iflib_dma_info_t		ifr_ifdi;
 	/* dynamically allocate if any drivers need a value substantially larger than this */
 	struct if_rxd_frag	ifr_frags[IFLIB_MAX_RX_SEGS] __aligned(CACHE_LINE_SIZE);
 }  __aligned(CACHE_LINE_SIZE);
 
 /*
  * Only allow a single packet to take up at most 1/nth of the tx ring
  */
 #define MAX_SINGLE_PACKET_FRACTION 12
 #define IF_BAD_DMA (bus_addr_t)-1
 
 static int enable_msix = 1;
 
 #define mtx_held(m)	(((m)->mtx_lock & ~MTX_FLAGMASK) != (uintptr_t)0)
 
 
 
 #define CTX_ACTIVE(ctx) ((if_getdrvflags((ctx)->ifc_ifp) & IFF_DRV_RUNNING))
 
 #define CTX_LOCK_INIT(_sc, _name)  mtx_init(&(_sc)->ifc_mtx, _name, "iflib ctx lock", MTX_DEF)
 
 #define CTX_LOCK(ctx) mtx_lock(&(ctx)->ifc_mtx)
 #define CTX_UNLOCK(ctx) mtx_unlock(&(ctx)->ifc_mtx)
 #define CTX_LOCK_DESTROY(ctx) mtx_destroy(&(ctx)->ifc_mtx)
 
 
 #define TXDB_LOCK_INIT(txq)  mtx_init(&(txq)->ift_db_mtx, (txq)->ift_db_mtx_name, NULL, MTX_DEF)
 #define TXDB_TRYLOCK(txq) mtx_trylock(&(txq)->ift_db_mtx)
 #define TXDB_LOCK(txq) mtx_lock(&(txq)->ift_db_mtx)
 #define TXDB_UNLOCK(txq) mtx_unlock(&(txq)->ift_db_mtx)
 #define TXDB_LOCK_DESTROY(txq) mtx_destroy(&(txq)->ift_db_mtx)
 
 #define CALLOUT_LOCK(txq)	mtx_lock(&(txq)->ift_mtx)
 #define CALLOUT_UNLOCK(txq)	mtx_unlock(&(txq)->ift_mtx)
 
 
 /* Our boot-time initialization hook */
 static int	iflib_module_event_handler(module_t, int, void *);
 
 static moduledata_t iflib_moduledata = {
 	"iflib",
 	iflib_module_event_handler,
 	NULL
 };
 
 DECLARE_MODULE(iflib, iflib_moduledata, SI_SUB_INIT_IF, SI_ORDER_ANY);
 MODULE_VERSION(iflib, 1);
 
 MODULE_DEPEND(iflib, pci, 1, 1, 1);
 MODULE_DEPEND(iflib, ether, 1, 1, 1);
 
 TASKQGROUP_DEFINE(if_io_tqg, mp_ncpus, 1);
 TASKQGROUP_DEFINE(if_config_tqg, 1, 1);
 
 #ifndef IFLIB_DEBUG_COUNTERS
 #ifdef INVARIANTS
 #define IFLIB_DEBUG_COUNTERS 1
 #else
 #define IFLIB_DEBUG_COUNTERS 0
 #endif /* INVARIANTS */
 #endif /* !IFLIB_DEBUG_COUNTERS */
 
 static SYSCTL_NODE(_net, OID_AUTO, iflib, CTLFLAG_RD, 0,
                    "iflib driver parameters");
 
 /*
  * XXX need to ensure that this can't accidentally cause the head to be moved backwards 
  */
 static int iflib_min_tx_latency = 0;
 
 SYSCTL_INT(_net_iflib, OID_AUTO, min_tx_latency, CTLFLAG_RW,
 		   &iflib_min_tx_latency, 0, "minimize transmit latency at the possible expense of throughput");
 
 
 #if IFLIB_DEBUG_COUNTERS
 
 static int iflib_tx_seen;
 static int iflib_tx_sent;
 static int iflib_tx_encap;
 static int iflib_rx_allocs;
 static int iflib_fl_refills;
 static int iflib_fl_refills_large;
 static int iflib_tx_frees;
 
 SYSCTL_INT(_net_iflib, OID_AUTO, tx_seen, CTLFLAG_RD,
 		   &iflib_tx_seen, 0, "# tx mbufs seen");
 SYSCTL_INT(_net_iflib, OID_AUTO, tx_sent, CTLFLAG_RD,
 		   &iflib_tx_sent, 0, "# tx mbufs sent");
 SYSCTL_INT(_net_iflib, OID_AUTO, tx_encap, CTLFLAG_RD,
 		   &iflib_tx_encap, 0, "# tx mbufs encapped");
 SYSCTL_INT(_net_iflib, OID_AUTO, tx_frees, CTLFLAG_RD,
 		   &iflib_tx_frees, 0, "# tx frees");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_allocs, CTLFLAG_RD,
 		   &iflib_rx_allocs, 0, "# rx allocations");
 SYSCTL_INT(_net_iflib, OID_AUTO, fl_refills, CTLFLAG_RD,
 		   &iflib_fl_refills, 0, "# refills");
 SYSCTL_INT(_net_iflib, OID_AUTO, fl_refills_large, CTLFLAG_RD,
 		   &iflib_fl_refills_large, 0, "# large refills");
 
 
 static int iflib_txq_drain_flushing;
 static int iflib_txq_drain_oactive;
 static int iflib_txq_drain_notready;
 static int iflib_txq_drain_encapfail;
 
 SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_flushing, CTLFLAG_RD,
 		   &iflib_txq_drain_flushing, 0, "# drain flushes");
 SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_oactive, CTLFLAG_RD,
 		   &iflib_txq_drain_oactive, 0, "# drain oactives");
 SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_notready, CTLFLAG_RD,
 		   &iflib_txq_drain_notready, 0, "# drain notready");
 SYSCTL_INT(_net_iflib, OID_AUTO, txq_drain_encapfail, CTLFLAG_RD,
 		   &iflib_txq_drain_encapfail, 0, "# drain encap fails");
 
 
 static int iflib_encap_load_mbuf_fail;
 static int iflib_encap_txq_avail_fail;
 static int iflib_encap_txd_encap_fail;
 
 SYSCTL_INT(_net_iflib, OID_AUTO, encap_load_mbuf_fail, CTLFLAG_RD,
 		   &iflib_encap_load_mbuf_fail, 0, "# busdma load failures");
 SYSCTL_INT(_net_iflib, OID_AUTO, encap_txq_avail_fail, CTLFLAG_RD,
 		   &iflib_encap_txq_avail_fail, 0, "# txq avail failures");
 SYSCTL_INT(_net_iflib, OID_AUTO, encap_txd_encap_fail, CTLFLAG_RD,
 		   &iflib_encap_txd_encap_fail, 0, "# driver encap failures");
 
 static int iflib_task_fn_rxs;
 static int iflib_rx_intr_enables;
 static int iflib_fast_intrs;
 static int iflib_intr_link;
 static int iflib_intr_msix; 
 static int iflib_rx_unavail;
 static int iflib_rx_ctx_inactive;
 static int iflib_rx_zero_len;
 static int iflib_rx_if_input;
 static int iflib_rx_mbuf_null;
 static int iflib_rxd_flush;
 
 static int iflib_verbose_debug;
 
 SYSCTL_INT(_net_iflib, OID_AUTO, intr_link, CTLFLAG_RD,
 		   &iflib_intr_link, 0, "# intr link calls");
 SYSCTL_INT(_net_iflib, OID_AUTO, intr_msix, CTLFLAG_RD,
 		   &iflib_intr_msix, 0, "# intr msix calls");
 SYSCTL_INT(_net_iflib, OID_AUTO, task_fn_rx, CTLFLAG_RD,
 		   &iflib_task_fn_rxs, 0, "# task_fn_rx calls");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_intr_enables, CTLFLAG_RD,
 		   &iflib_rx_intr_enables, 0, "# rx intr enables");
 SYSCTL_INT(_net_iflib, OID_AUTO, fast_intrs, CTLFLAG_RD,
 		   &iflib_fast_intrs, 0, "# fast_intr calls");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_unavail, CTLFLAG_RD,
 		   &iflib_rx_unavail, 0, "# times rxeof called with no available data");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_ctx_inactive, CTLFLAG_RD,
 		   &iflib_rx_ctx_inactive, 0, "# times rxeof called with inactive context");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_zero_len, CTLFLAG_RD,
 		   &iflib_rx_zero_len, 0, "# times rxeof saw zero len mbuf");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_if_input, CTLFLAG_RD,
 		   &iflib_rx_if_input, 0, "# times rxeof called if_input");
 SYSCTL_INT(_net_iflib, OID_AUTO, rx_mbuf_null, CTLFLAG_RD,
 		   &iflib_rx_mbuf_null, 0, "# times rxeof got null mbuf");
 SYSCTL_INT(_net_iflib, OID_AUTO, rxd_flush, CTLFLAG_RD,
 		   &iflib_rxd_flush, 0, "# times rxd_flush called");
 SYSCTL_INT(_net_iflib, OID_AUTO, verbose_debug, CTLFLAG_RW,
 		   &iflib_verbose_debug, 0, "enable verbose debugging");
 
 #define DBG_COUNTER_INC(name) atomic_add_int(&(iflib_ ## name), 1)
 
 #else
 #define DBG_COUNTER_INC(name)
 
 #endif
 
 
 
 #define IFLIB_DEBUG 0
 
 static void iflib_tx_structures_free(if_ctx_t ctx);
 static void iflib_rx_structures_free(if_ctx_t ctx);
 static int iflib_queues_alloc(if_ctx_t ctx);
 static int iflib_tx_credits_update(if_ctx_t ctx, iflib_txq_t txq);
 static int iflib_rxd_avail(if_ctx_t ctx, iflib_rxq_t rxq, int cidx);
 static int iflib_qset_structures_setup(if_ctx_t ctx);
 static int iflib_msix_init(if_ctx_t ctx);
 static int iflib_legacy_setup(if_ctx_t ctx, driver_filter_t filter, void *filterarg, int *rid, char *str);
 static void iflib_txq_check_drain(iflib_txq_t txq, int budget);
 static uint32_t iflib_txq_can_drain(struct ifmp_ring *);
 static int iflib_register(if_ctx_t);
 static void iflib_init_locked(if_ctx_t ctx);
 static void iflib_add_device_sysctl_pre(if_ctx_t ctx);
 static void iflib_add_device_sysctl_post(if_ctx_t ctx);
 
 
 #ifdef DEV_NETMAP
 #include <sys/selinfo.h>
 #include <net/netmap.h>
 #include <dev/netmap/netmap_kern.h>
 
 MODULE_DEPEND(iflib, netmap, 1, 1, 1);
 
 /*
  * device-specific sysctl variables:
  *
  * iflib_crcstrip: 0: keep CRC in rx frames, 1: strip it (default).
  *	During regular operations the CRC is stripped, but on some
  *	hardware reception of frames whose length is not a multiple of 64
  *	is slower, so using crcstrip=0 helps in benchmarks.
  *
  * iflib_rx_miss, iflib_rx_miss_bufs:
  *	count packets that might be missed due to lost interrupts.
  */
 SYSCTL_DECL(_dev_netmap);
 /*
  * The xl driver by default strips CRCs and we do not override it.
  */
 
 int iflib_crcstrip = 1;
 SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_crcstrip,
     CTLFLAG_RW, &iflib_crcstrip, 1, "strip CRC on rx frames");
 
 int iflib_rx_miss, iflib_rx_miss_bufs;
 SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_rx_miss,
     CTLFLAG_RW, &iflib_rx_miss, 0, "potentially missed rx intr");
 SYSCTL_INT(_dev_netmap, OID_AUTO, iflib_rx_miss_bufs,
     CTLFLAG_RW, &iflib_rx_miss_bufs, 0, "potentially missed rx intr bufs");
 
 /*
  * Register/unregister. We are already under netmap lock.
  * Only called on the first register or the last unregister.
  */
 static int
 iflib_netmap_register(struct netmap_adapter *na, int onoff)
 {
 	struct ifnet *ifp = na->ifp;
 	if_ctx_t ctx = ifp->if_softc;
 
 	CTX_LOCK(ctx);
 	IFDI_INTR_DISABLE(ctx);
 
 	/* Tell the stack that the interface is no longer active */
 	ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE);
 
 	if (!CTX_IS_VF(ctx))
 		IFDI_CRCSTRIP_SET(ctx, onoff);
 
 	/* enable or disable flags and callbacks in na and ifp */
 	if (onoff) {
 		nm_set_native_flags(na);
 	} else {
 		nm_clear_native_flags(na);
 	}
 	IFDI_INIT(ctx);
 	IFDI_CRCSTRIP_SET(ctx, onoff); // XXX why twice ?
 	CTX_UNLOCK(ctx);
 	return (ifp->if_drv_flags & IFF_DRV_RUNNING ? 0 : 1);
 }
 
 /*
  * Reconcile kernel and user view of the transmit ring.
  *
  * All information is in the kring.
  * Userspace wants to send packets up to the one before kring->rhead,
  * kernel knows kring->nr_hwcur is the first unsent packet.
  *
  * Here we push packets out (as many as possible), and possibly
  * reclaim buffers from previously completed transmission.
  *
  * The caller (netmap) guarantees that there is only one instance
  * running at any time. Any interference with other driver
  * methods should be handled by the individual drivers.
  */
 static int
 iflib_netmap_txsync(struct netmap_kring *kring, int flags)
 {
 	struct netmap_adapter *na = kring->na;
 	struct ifnet *ifp = na->ifp;
 	struct netmap_ring *ring = kring->ring;
 	u_int nm_i;	/* index into the netmap ring */
 	u_int nic_i;	/* index into the NIC ring */
 	u_int n;
 	u_int const lim = kring->nkr_num_slots - 1;
 	u_int const head = kring->rhead;
 	struct if_pkt_info pi;
 
 	/*
 	 * interrupts on every tx packet are expensive so request
 	 * them every half ring, or where NS_REPORT is set
 	 */
 	u_int report_frequency = kring->nkr_num_slots >> 1;
 	/* device-specific */
 	if_ctx_t ctx = ifp->if_softc;
 	iflib_txq_t txq = &ctx->ifc_txqs[kring->ring_id];
 
 	pi.ipi_segs = txq->ift_segs;
 	pi.ipi_qsidx = kring->ring_id;
 	pi.ipi_ndescs = 0;
 
 	bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map,
 					BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 
 	/*
 	 * First part: process new packets to send.
 	 * nm_i is the current index in the netmap ring,
 	 * nic_i is the corresponding index in the NIC ring.
 	 *
 	 * If we have packets to send (nm_i != head)
 	 * iterate over the netmap ring, fetch length and update
 	 * the corresponding slot in the NIC ring. Some drivers also
 	 * need to update the buffer's physical address in the NIC slot
 	 * even if NS_BUF_CHANGED is not set (PNMB computes the addresses).
 	 *
 	 * The netmap_reload_map() call is especially expensive,
 	 * even when (as in this case) the tag is 0, so do it only
 	 * when the buffer has actually changed.
 	 *
 	 * If possible do not set the report/intr bit on all slots,
 	 * but only a few times per ring or when NS_REPORT is set.
 	 *
 	 * Finally, on 10G and faster drivers, it might be useful
 	 * to prefetch the next slot and txr entry.
 	 */
 
 	nm_i = kring->nr_hwcur;
 	if (nm_i != head) {	/* we have new packets to send */
 		nic_i = netmap_idx_k2n(kring, nm_i);
 
 		__builtin_prefetch(&ring->slot[nm_i]);
 		__builtin_prefetch(&txq->ift_sds.ifsd_m[nic_i]);
 		__builtin_prefetch(&txq->ift_sds.ifsd_map[nic_i]);
 
 		for (n = 0; nm_i != head; n++) {
 			struct netmap_slot *slot = &ring->slot[nm_i];
 			u_int len = slot->len;
 			uint64_t paddr;
 			void *addr = PNMB(na, slot, &paddr);
 			int flags = (slot->flags & NS_REPORT ||
 				nic_i == 0 || nic_i == report_frequency) ?
 				IPI_TX_INTR : 0;
 
 			/* device-specific */
 			pi.ipi_pidx = nic_i;
 			pi.ipi_flags = flags;
 
 			/* Fill the slot in the NIC ring. */
 			ctx->isc_txd_encap(ctx->ifc_softc, &pi);
 
 			/* prefetch for next round */
 			__builtin_prefetch(&ring->slot[nm_i + 1]);
 			__builtin_prefetch(&txq->ift_sds.ifsd_m[nic_i + 1]);
 			__builtin_prefetch(&txq->ift_sds.ifsd_map[nic_i + 1]);
 
 			NM_CHECK_ADDR_LEN(na, addr, len);
 
 			if (slot->flags & NS_BUF_CHANGED) {
 				/* buffer has changed, reload map */
 				netmap_reload_map(na, txq->ift_desc_tag, txq->ift_sds.ifsd_map[nic_i], addr);
 			}
 			slot->flags &= ~(NS_REPORT | NS_BUF_CHANGED);
 
 			/* make sure changes to the buffer are synced */
 			bus_dmamap_sync(txq->ift_desc_tag, txq->ift_sds.ifsd_map[nic_i],
 							BUS_DMASYNC_PREWRITE);
 
 			nm_i = nm_next(nm_i, lim);
 			nic_i = nm_next(nic_i, lim);
 		}
 		kring->nr_hwcur = head;
 
 		/* synchronize the NIC ring */
 		bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map,
 						BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 		/* (re)start the tx unit up to slot nic_i (excluded) */
 		ctx->isc_txd_flush(ctx->ifc_softc, txq->ift_id, nic_i);
 	}
 
 	/*
 	 * Second part: reclaim buffers for completed transmissions.
 	 */
 	if (iflib_tx_credits_update(ctx, txq)) {
 		/* some tx completed, increment avail */
 		nic_i = txq->ift_cidx_processed;
 		kring->nr_hwtail = nm_prev(netmap_idx_n2k(kring, nic_i), lim);
 	}
 	return (0);
 }
 
 /*
  * Reconcile kernel and user view of the receive ring.
  * Same as for the txsync, this routine must be efficient.
  * The caller guarantees a single invocation, but races against
  * the rest of the driver should be handled here.
  *
  * On call, kring->rhead is the first packet that userspace wants
  * to keep, and kring->rcur is the wakeup point.
  * The kernel has previously reported packets up to kring->rtail.
  *
  * If (flags & NAF_FORCE_READ) also check for incoming packets irrespective
  * of whether or not we received an interrupt.
  */
 static int
 iflib_netmap_rxsync(struct netmap_kring *kring, int flags)
 {
 	struct netmap_adapter *na = kring->na;
 	struct ifnet *ifp = na->ifp;
 	struct netmap_ring *ring = kring->ring;
 	u_int nm_i;	/* index into the netmap ring */
 	u_int nic_i;	/* index into the NIC ring */
 	u_int i, n;
 	u_int const lim = kring->nkr_num_slots - 1;
 	u_int const head = kring->rhead;
 	int force_update = (flags & NAF_FORCE_READ) || kring->nr_kflags & NKR_PENDINTR;
 	struct if_rxd_info ri;
 	/* device-specific */
 	if_ctx_t ctx = ifp->if_softc;
 	iflib_rxq_t rxq = &ctx->ifc_rxqs[kring->ring_id];
 	iflib_fl_t fl = rxq->ifr_fl;
 	if (head > lim)
 		return netmap_ring_reinit(kring);
 
 	bzero(&ri, sizeof(ri));
 	ri.iri_qsidx = kring->ring_id;
 	ri.iri_ifp = ctx->ifc_ifp;
 	/* XXX check sync modes */
 	for (i = 0, fl = rxq->ifr_fl; i < rxq->ifr_nfl; i++, fl++)
 		bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map,
 				BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 	/*
 	 * First part: import newly received packets.
 	 *
 	 * nm_i is the index of the next free slot in the netmap ring,
 	 * nic_i is the index of the next received packet in the NIC ring,
 	 * and they may differ in case if_init() has been called while
 	 * in netmap mode. For the receive ring we have
 	 *
 	 *	nic_i = rxr->next_check;
 	 *	nm_i = kring->nr_hwtail (previous)
 	 * and
 	 *	nm_i == (nic_i + kring->nkr_hwofs) % ring_size
 	 *
 	 * rxr->next_check is set to 0 on a ring reinit
 	 */
 	if (netmap_no_pendintr || force_update) {
 		int crclen = iflib_crcstrip ? 0 : 4;
 		int error, avail;
 		uint16_t slot_flags = kring->nkr_slot_flags;
 
 		for (fl = rxq->ifr_fl, i = 0; i < rxq->ifr_nfl; i++, fl++) {
 			nic_i = fl->ifl_cidx;
 			nm_i = netmap_idx_n2k(kring, nic_i);
 			avail = ctx->isc_rxd_available(ctx->ifc_softc, kring->ring_id, nic_i);
 			for (n = 0; avail > 0; n++, avail--) {
 				error = ctx->isc_rxd_pkt_get(ctx->ifc_softc, &ri);
 				if (error)
 					ring->slot[nm_i].len = 0;
 				else
 					ring->slot[nm_i].len = ri.iri_len - crclen;
 				ring->slot[nm_i].flags = slot_flags;
 				bus_dmamap_sync(fl->ifl_desc_tag,
 								fl->ifl_sds[nic_i].ifsd_map, BUS_DMASYNC_POSTREAD);
 				nm_i = nm_next(nm_i, lim);
 				nic_i = nm_next(nic_i, lim);
 			}
 			if (n) { /* update the state variables */
 				if (netmap_no_pendintr && !force_update) {
 					/* diagnostics */
 					iflib_rx_miss ++;
 					iflib_rx_miss_bufs += n;
 				}
 				fl->ifl_cidx = nic_i;
 				kring->nr_hwtail = nm_i;
 			}
 			kring->nr_kflags &= ~NKR_PENDINTR;
 		}
 	}
 	/*
 	 * Second part: skip past packets that userspace has released.
 	 * (kring->nr_hwcur to head excluded),
 	 * and make the buffers available for reception.
 	 * As usual nm_i is the index in the netmap ring,
 	 * nic_i is the index in the NIC ring, and
 	 * nm_i == (nic_i + kring->nkr_hwofs) % ring_size
 	 */
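 	/* XXX not sure how this will work with multiple free lists */
 	/* the loops above leave fl past the end of ifr_fl; use the first free list */
 	fl = &rxq->ifr_fl[0];
 	nm_i = kring->nr_hwcur;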
 	if (nm_i != head) {
 		nic_i = netmap_idx_k2n(kring, nm_i);
 		for (n = 0; nm_i != head; n++) {
 			struct netmap_slot *slot = &ring->slot[nm_i];
 			uint64_t paddr;
 			caddr_t vaddr;
 			void *addr = PNMB(na, slot, &paddr);
 
 			if (addr == NETMAP_BUF_BASE(na)) /* bad buf */
 				goto ring_reset;
 
 			vaddr = addr;
 			if (slot->flags & NS_BUF_CHANGED) {
 				/* buffer has changed, reload map */
 				netmap_reload_map(na, fl->ifl_desc_tag, fl->ifl_sds[nic_i].ifsd_map, addr);
 				slot->flags &= ~NS_BUF_CHANGED;
 			}
 			/*
 			 * XXX we should be batching this operation - TODO
 			 */
 			ctx->isc_rxd_refill(ctx->ifc_softc, rxq->ifr_id, fl->ifl_id, nic_i, &paddr, &vaddr, 1);
 			bus_dmamap_sync(fl->ifl_desc_tag, fl->ifl_sds[nic_i].ifsd_map,
 			    BUS_DMASYNC_PREREAD);
 			nm_i = nm_next(nm_i, lim);
 			nic_i = nm_next(nic_i, lim);
 		}
 		kring->nr_hwcur = head;
 
 		bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map,
 		    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 		/*
 		 * IMPORTANT: we must leave one free slot in the ring,
 		 * so move nic_i back by one unit
 		 */
 		nic_i = nm_prev(nic_i, lim);
 		ctx->isc_rxd_flush(ctx->ifc_softc, rxq->ifr_id, fl->ifl_id, nic_i);
 	}
 
 	return 0;
 
 ring_reset:
 	return netmap_ring_reinit(kring);
 }
 
 static int
 iflib_netmap_attach(if_ctx_t ctx)
 {
 	struct netmap_adapter na;
 
 	bzero(&na, sizeof(na));
 
 	na.ifp = ctx->ifc_ifp;
 	na.na_flags = NAF_BDG_MAYSLEEP;
 	MPASS(ctx->ifc_softc_ctx.isc_ntxqsets);
 	MPASS(ctx->ifc_softc_ctx.isc_nrxqsets);
 
 	na.num_tx_desc = ctx->ifc_sctx->isc_ntxd;
 	na.num_rx_desc = ctx->ifc_sctx->isc_nrxd;
 	na.nm_txsync = iflib_netmap_txsync;
 	na.nm_rxsync = iflib_netmap_rxsync;
 	na.nm_register = iflib_netmap_register;
 	na.num_tx_rings = ctx->ifc_softc_ctx.isc_ntxqsets;
 	na.num_rx_rings = ctx->ifc_softc_ctx.isc_nrxqsets;
 	return (netmap_attach(&na));
 }
 
 static void
 iflib_netmap_txq_init(if_ctx_t ctx, iflib_txq_t txq)
 {
 	struct netmap_adapter *na = NA(ctx->ifc_ifp);
 	struct netmap_slot *slot;
 
 	slot = netmap_reset(na, NR_TX, txq->ift_id, 0);
 	if (slot == 0)
 		return;
 
 	for (int i = 0; i < ctx->ifc_sctx->isc_ntxd; i++) {
 
 		/*
 		 * In netmap mode, set the map for the packet buffer.
 		 * NOTE: Some drivers (not this one) also need to set
 		 * the physical buffer address in the NIC ring.
 		 * netmap_idx_n2k() maps a nic index, i, into the corresponding
 		 * netmap slot index, si
 		 */
 		int si = netmap_idx_n2k(&na->tx_rings[txq->ift_id], i);
 		netmap_load_map(na, txq->ift_desc_tag, txq->ift_sds.ifsd_map[i], NMB(na, slot + si));
 	}
 }
 static void
 iflib_netmap_rxq_init(if_ctx_t ctx, iflib_rxq_t rxq)
 {
 	struct netmap_adapter *na = NA(ctx->ifc_ifp);
 	struct netmap_slot *slot;
 	iflib_rxsd_t sd;
 	int nrxd;
 
 	slot = netmap_reset(na, NR_RX, rxq->ifr_id, 0);
 	if (slot == 0)
 		return;
 	sd = rxq->ifr_fl[0].ifl_sds;
 	nrxd = ctx->ifc_sctx->isc_nrxd;
 	for (int i = 0; i < nrxd; i++, sd++) {
 		int sj = netmap_idx_n2k(&na->rx_rings[rxq->ifr_id], i);
 		uint64_t paddr;
 		void *addr;
 		caddr_t vaddr;
 
 		vaddr = addr = PNMB(na, slot + sj, &paddr);
 		netmap_load_map(na, rxq->ifr_fl[0].ifl_desc_tag, sd->ifsd_map, addr);
 		/* Update descriptor and the cached value */
 		ctx->isc_rxd_refill(ctx->ifc_softc, rxq->ifr_id, 0 /* fl_id */, i, &paddr, &vaddr, 1);
 	}
 	/* preserve queue */
 	if (ctx->ifc_ifp->if_capenable & IFCAP_NETMAP) {
 		struct netmap_kring *kring = &na->rx_rings[rxq->ifr_id];
 		int t = na->num_rx_desc - 1 - nm_kr_rxspace(kring);
 		ctx->isc_rxd_flush(ctx->ifc_softc, rxq->ifr_id, 0 /* fl_id */, t);
 	} else
 		ctx->isc_rxd_flush(ctx->ifc_softc, rxq->ifr_id, 0 /* fl_id */, nrxd-1);
 }
 
 #define iflib_netmap_detach(ifp) netmap_detach(ifp)
 
 #else
 #define iflib_netmap_txq_init(ctx, txq)
 #define iflib_netmap_rxq_init(ctx, rxq)
 #define iflib_netmap_detach(ifp)
 
 #define iflib_netmap_attach(ctx) (0)
 #define netmap_rx_irq(ifp, qid, budget) (0)
 
 #endif
 
 #if defined(__i386__) || defined(__amd64__)
 static __inline void
 prefetch(void *x)
 {
 	__asm volatile("prefetcht0 %0" :: "m" (*(unsigned long *)x));
 }
 #else
 #define prefetch(x)
 #endif
 
 static void
 _iflib_dmamap_cb(void *arg, bus_dma_segment_t *segs, int nseg, int err)
 {
 	if (err)
 		return;
 	*(bus_addr_t *) arg = segs[0].ds_addr;
 }
 
 int
 iflib_dma_alloc(if_ctx_t ctx, int size, iflib_dma_info_t dma, int mapflags)
 {
 	int err;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	device_t dev = ctx->ifc_dev;
 
 	KASSERT(sctx->isc_q_align != 0, ("alignment value not initialized"));
 
 	err = bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */
 				sctx->isc_q_align, 0,	/* alignment, bounds */
 				BUS_SPACE_MAXADDR,	/* lowaddr */
 				BUS_SPACE_MAXADDR,	/* highaddr */
 				NULL, NULL,		/* filter, filterarg */
 				size,			/* maxsize */
 				1,			/* nsegments */
 				size,			/* maxsegsize */
 				BUS_DMA_ALLOCNOW,	/* flags */
 				NULL,			/* lockfunc */
 				NULL,			/* lockarg */
 				&dma->idi_tag);
 	if (err) {
 		device_printf(dev,
 		    "%s: bus_dma_tag_create failed: %d\n",
 		    __func__, err);
 		goto fail_0;
 	}
 
 	err = bus_dmamem_alloc(dma->idi_tag, (void**) &dma->idi_vaddr,
 	    BUS_DMA_NOWAIT | BUS_DMA_COHERENT | BUS_DMA_ZERO, &dma->idi_map);
 	if (err) {
 		device_printf(dev,
 		    "%s: bus_dmamem_alloc(%ju) failed: %d\n",
 		    __func__, (uintmax_t)size, err);
 		goto fail_1;
 	}
 
 	dma->idi_paddr = IF_BAD_DMA;
 	err = bus_dmamap_load(dma->idi_tag, dma->idi_map, dma->idi_vaddr,
 	    size, _iflib_dmamap_cb, &dma->idi_paddr, mapflags | BUS_DMA_NOWAIT);
 	if (err || dma->idi_paddr == IF_BAD_DMA) {
 		device_printf(dev,
 		    "%s: bus_dmamap_load failed: %d\n",
 		    __func__, err);
 		goto fail_2;
 	}
 
 	dma->idi_size = size;
 	return (0);
 
 fail_2:
 	bus_dmamem_free(dma->idi_tag, dma->idi_vaddr, dma->idi_map);
 fail_1:
 	bus_dma_tag_destroy(dma->idi_tag);
 fail_0:
 	dma->idi_tag = NULL;
 
 	return (err);
 }
 
 int
 iflib_dma_alloc_multi(if_ctx_t ctx, int *sizes, iflib_dma_info_t *dmalist, int mapflags, int count)
 {
 	int i, err = 0;
 	iflib_dma_info_t *dmaiter;
 
 	dmaiter = dmalist;
 	for (i = 0; i < count; i++, dmaiter++) {
 		if ((err = iflib_dma_alloc(ctx, sizes[i], *dmaiter, mapflags)) != 0)
 			break;
 	}
 	if (err)
 		iflib_dma_free_multi(dmalist, i);
 	return (err);
 }
 
 void
 iflib_dma_free(iflib_dma_info_t dma)
 {
 	if (dma->idi_tag == NULL)
 		return;
 	if (dma->idi_paddr != IF_BAD_DMA) {
 		bus_dmamap_sync(dma->idi_tag, dma->idi_map,
 		    BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 		bus_dmamap_unload(dma->idi_tag, dma->idi_map);
 		dma->idi_paddr = IF_BAD_DMA;
 	}
 	if (dma->idi_vaddr != NULL) {
 		bus_dmamem_free(dma->idi_tag, dma->idi_vaddr, dma->idi_map);
 		dma->idi_vaddr = NULL;
 	}
 	bus_dma_tag_destroy(dma->idi_tag);
 	dma->idi_tag = NULL;
 }
 
 void
 iflib_dma_free_multi(iflib_dma_info_t *dmalist, int count)
 {
 	int i;
 	iflib_dma_info_t *dmaiter = dmalist;
 
 	for (i = 0; i < count; i++, dmaiter++)
 		iflib_dma_free(*dmaiter);
 }
 
 static int
 iflib_fast_intr(void *arg)
 {
 	iflib_filter_info_t info = arg;
 	struct grouptask *gtask = info->ifi_task;
 
 	DBG_COUNTER_INC(fast_intrs);
 	if (info->ifi_filter != NULL && info->ifi_filter(info->ifi_filter_arg) == FILTER_HANDLED)
 		return (FILTER_HANDLED);
 
 	GROUPTASK_ENQUEUE(gtask);
 	return (FILTER_HANDLED);
 }
 
 static int
 _iflib_irq_alloc(if_ctx_t ctx, if_irq_t irq, int rid,
 	driver_filter_t filter, driver_intr_t handler, void *arg,
 				 char *name)
 {
 	int rc;
 	struct resource *res;
 	void *tag;
 	device_t dev = ctx->ifc_dev;
 
 	MPASS(rid < 512);
 	irq->ii_rid = rid;
 	res = bus_alloc_resource_any(dev, SYS_RES_IRQ, &irq->ii_rid,
 				     RF_SHAREABLE | RF_ACTIVE);
 	if (res == NULL) {
 		device_printf(dev,
 		    "failed to allocate IRQ for rid %d, name %s.\n", rid, name);
 		return (ENOMEM);
 	}
 	irq->ii_res = res;
 	KASSERT(filter == NULL || handler == NULL, ("filter and handler can't both be non-NULL"));
 	rc = bus_setup_intr(dev, res, INTR_MPSAFE | INTR_TYPE_NET,
 						filter, handler, arg, &tag);
 	if (rc != 0) {
 		device_printf(dev,
 		    "failed to setup interrupt for rid %d, name %s: %d\n",
 					  rid, name ? name : "unknown", rc);
 		return (rc);
 	} else if (name)
-		bus_describe_intr(dev, res, tag, name);
+		bus_describe_intr(dev, res, tag, "%s", name);
 
 	irq->ii_tag = tag;
 	return (0);
 }
 
 
 /*********************************************************************
  *
  *  Allocate memory for tx_buffer structures. The tx_buffer stores all
  *  the information needed to transmit a packet on the wire. This is
  *  called only once at attach, setup is done every reset.
  *
  **********************************************************************/
 
 static int
 iflib_txsd_alloc(iflib_txq_t txq)
 {
 	if_ctx_t ctx = txq->ift_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	if_softc_ctx_t scctx = &ctx->ifc_softc_ctx;
 	device_t dev = ctx->ifc_dev;
 	int err, nsegments, ntsosegments;
 
 	nsegments = scctx->isc_tx_nsegments;
 	ntsosegments = scctx->isc_tx_tso_segments_max;
 	MPASS(sctx->isc_ntxd > 0);
 	MPASS(nsegments > 0);
 	MPASS(ntsosegments > 0);
 	/*
 	 * Setup DMA descriptor areas.
 	 */
 	if ((err = bus_dma_tag_create(bus_get_dma_tag(dev),
 			       1, 0,			/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       sctx->isc_tx_maxsize,		/* maxsize */
 			       nsegments,	/* nsegments */
 			       sctx->isc_tx_maxsegsize,	/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txq->ift_desc_tag))) {
 		device_printf(dev,"Unable to allocate TX DMA tag: %d\n", err);
 		device_printf(dev,"maxsize: %zd nsegments: %d maxsegsize: %zd\n",
 					  sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
 		goto fail;
 	}
 #ifdef INVARIANTS
 	device_printf(dev,"maxsize: %zd nsegments: %d maxsegsize: %zd\n",
 		      sctx->isc_tx_maxsize, nsegments, sctx->isc_tx_maxsegsize);
 #endif
 	device_printf(dev,"TSO maxsize: %d ntsosegments: %d maxsegsize: %d\n",
 		      scctx->isc_tx_tso_size_max, ntsosegments,
 		      scctx->isc_tx_tso_segsize_max);
 	if ((err = bus_dma_tag_create(bus_get_dma_tag(dev),
 			       1, 0,			/* alignment, bounds */
 			       BUS_SPACE_MAXADDR,	/* lowaddr */
 			       BUS_SPACE_MAXADDR,	/* highaddr */
 			       NULL, NULL,		/* filter, filterarg */
 			       scctx->isc_tx_tso_size_max,		/* maxsize */
 			       ntsosegments,	/* nsegments */
 			       scctx->isc_tx_tso_segsize_max,	/* maxsegsize */
 			       0,			/* flags */
 			       NULL,			/* lockfunc */
 			       NULL,			/* lockfuncarg */
 			       &txq->ift_tso_desc_tag))) {
 		device_printf(dev,"Unable to allocate TX TSO DMA tag: %d\n", err);
 
 		goto fail;
 	}
 #ifdef INVARIANTS
 	device_printf(dev,"TSO maxsize: %d ntsosegments: %d maxsegsize: %d\n",
 		      scctx->isc_tx_tso_size_max, ntsosegments,
 		      scctx->isc_tx_tso_segsize_max);
 #endif
 	if (!(txq->ift_sds.ifsd_flags =
 	    (uint8_t *) malloc(sizeof(uint8_t) *
 	    sctx->isc_ntxd, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer memory\n");
 		err = ENOMEM;
 		goto fail;
 	}
 	if (!(txq->ift_sds.ifsd_m =
 	    (struct mbuf **) malloc(sizeof(struct mbuf *) *
 	    sctx->isc_ntxd, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer memory\n");
 		err = ENOMEM;
 		goto fail;
 	}
 
 	/* Create the descriptor buffer DMA maps */
 #if defined(ACPI_DMAR) || (!(defined(__i386__) && !defined(__amd64__)))
 	if ((ctx->ifc_flags & IFC_DMAR) == 0)
 		return (0);
 
 	if (!(txq->ift_sds.ifsd_map =
 	    (bus_dmamap_t *) malloc(sizeof(bus_dmamap_t) * sctx->isc_ntxd, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate tx_buffer map memory\n");
 		err = ENOMEM;
 		goto fail;
 	}
 
 	for (int i = 0; i < sctx->isc_ntxd; i++) {
 		err = bus_dmamap_create(txq->ift_desc_tag, 0, &txq->ift_sds.ifsd_map[i]);
 		if (err != 0) {
 			device_printf(dev, "Unable to create TX DMA map\n");
 			goto fail;
 		}
 	}
 #endif
 	return (0);
 fail:
 	/* We free all, it handles case where we are in the middle */
 	iflib_tx_structures_free(ctx);
 	return (err);
 }
 
 static void
 iflib_txsd_destroy(if_ctx_t ctx, iflib_txq_t txq, int i)
 {
 	bus_dmamap_t map;
 
 	map = NULL;
 	if (txq->ift_sds.ifsd_map != NULL)
 		map = txq->ift_sds.ifsd_map[i];
 	if (map != NULL) {
 		bus_dmamap_unload(txq->ift_desc_tag, map);
 		bus_dmamap_destroy(txq->ift_desc_tag, map);
 		txq->ift_sds.ifsd_map[i] = NULL;
 	}
 }
 
 static void
 iflib_txq_destroy(iflib_txq_t txq)
 {
 	if_ctx_t ctx = txq->ift_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 
 	for (int i = 0; i < sctx->isc_ntxd; i++)
 		iflib_txsd_destroy(ctx, txq, i);
 	if (txq->ift_sds.ifsd_map != NULL) {
 		free(txq->ift_sds.ifsd_map, M_IFLIB);
 		txq->ift_sds.ifsd_map = NULL;
 	}
 	if (txq->ift_sds.ifsd_m != NULL) {
 		free(txq->ift_sds.ifsd_m, M_IFLIB);
 		txq->ift_sds.ifsd_m = NULL;
 	}
 	if (txq->ift_sds.ifsd_flags != NULL) {
 		free(txq->ift_sds.ifsd_flags, M_IFLIB);
 		txq->ift_sds.ifsd_flags = NULL;
 	}
 	if (txq->ift_desc_tag != NULL) {
 		bus_dma_tag_destroy(txq->ift_desc_tag);
 		txq->ift_desc_tag = NULL;
 	}
 	if (txq->ift_tso_desc_tag != NULL) {
 		bus_dma_tag_destroy(txq->ift_tso_desc_tag);
 		txq->ift_tso_desc_tag = NULL;
 	}
 }
 
 static void
 iflib_txsd_free(if_ctx_t ctx, iflib_txq_t txq, int i)
 {
 	struct mbuf **mp;
 
 	mp = &txq->ift_sds.ifsd_m[i];
 	if (*mp == NULL)
 		return;
 
 	if (txq->ift_sds.ifsd_map != NULL) {
 		bus_dmamap_sync(txq->ift_desc_tag,
 				txq->ift_sds.ifsd_map[i],
 				BUS_DMASYNC_POSTWRITE);
 		bus_dmamap_unload(txq->ift_desc_tag,
 				  txq->ift_sds.ifsd_map[i]);
 	}
 	m_freem(*mp);
 	DBG_COUNTER_INC(tx_frees);
 	*mp = NULL;
 }
 
 static int
 iflib_txq_setup(iflib_txq_t txq)
 {
 	if_ctx_t ctx = txq->ift_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	iflib_dma_info_t di;
 	int i;
 
 	/* Mark the queue idle */
 	txq->ift_qstatus = IFLIB_QUEUE_IDLE;
 
 	/* Reset indices */
 	txq->ift_cidx_processed = txq->ift_pidx = txq->ift_cidx = txq->ift_npending = 0;
 	txq->ift_size = sctx->isc_ntxd;
 
 	for (i = 0, di = txq->ift_ifdi; i < ctx->ifc_nhwtxqs; i++, di++)
 		bzero((void *)di->idi_vaddr, di->idi_size);
 
 	IFDI_TXQ_SETUP(ctx, txq->ift_id);
 	for (i = 0, di = txq->ift_ifdi; i < ctx->ifc_nhwtxqs; i++, di++)
 		bus_dmamap_sync(di->idi_tag, di->idi_map,
 						BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Allocate memory for rx_buffer structures. Since we use one
  *  rx_buffer per received packet, the maximum number of rx_buffers
  *  that we'll need is equal to the number of receive descriptors
  *  that we've allocated.
  *
  **********************************************************************/
 static int
 iflib_rxsd_alloc(iflib_rxq_t rxq)
 {
 	if_ctx_t ctx = rxq->ifr_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	device_t dev = ctx->ifc_dev;
 	iflib_fl_t fl;
 	iflib_rxsd_t	rxsd;
 	int			err;
 
 	MPASS(sctx->isc_nrxd > 0);
 
 	fl = rxq->ifr_fl;
 	for (int i = 0; i < rxq->ifr_nfl; i++, fl++) {
 		fl->ifl_sds = malloc(sizeof(struct iflib_sw_rx_desc) *
 							 sctx->isc_nrxd, M_IFLIB, M_WAITOK | M_ZERO);
 		if (fl->ifl_sds == NULL) {
 			device_printf(dev, "Unable to allocate rx sw desc memory\n");
 			return (ENOMEM);
 		}
 		fl->ifl_size = sctx->isc_nrxd; /* this isn't necessarily the same */
 		err = bus_dma_tag_create(bus_get_dma_tag(dev), /* parent */
 					 1, 0,			/* alignment, bounds */
 					 BUS_SPACE_MAXADDR,	/* lowaddr */
 					 BUS_SPACE_MAXADDR,	/* highaddr */
 					 NULL, NULL,		/* filter, filterarg */
 					 sctx->isc_rx_maxsize,	/* maxsize */
 					 sctx->isc_rx_nsegments,	/* nsegments */
 					 sctx->isc_rx_maxsegsize,	/* maxsegsize */
 					 0,			/* flags */
 					 NULL,			/* lockfunc */
 					 NULL,			/* lockarg */
 					 &fl->ifl_desc_tag);
 		if (err) {
 			device_printf(dev, "%s: bus_dma_tag_create failed %d\n",
 				__func__, err);
 			goto fail;
 		}
 
 		rxsd = fl->ifl_sds;
 		for (int j = 0; j < sctx->isc_nrxd; j++, rxsd++) {
 			err = bus_dmamap_create(fl->ifl_desc_tag, 0, &rxsd->ifsd_map);
 			if (err) {
 				device_printf(dev, "%s: bus_dmamap_create failed: %d\n",
 					__func__, err);
 				goto fail;
 			}
 		}
 	}
 	return (0);
 
 fail:
 	iflib_rx_structures_free(ctx);
 	return (err);
 }
 
 
 /*
  * Internal service routines
  */
 
 struct rxq_refill_cb_arg {
 	int               error;
 	bus_dma_segment_t seg;
 	int               nseg;
 };
 
 static void
 _rxq_refill_cb(void *arg, bus_dma_segment_t *segs, int nseg, int error)
 {
 	struct rxq_refill_cb_arg *cb_arg = arg;
 
 	cb_arg->error = error;
 	cb_arg->seg = segs[0];
 	cb_arg->nseg = nseg;
 }
 
 
 #ifdef ACPI_DMAR
 #define IS_DMAR(ctx) (ctx->ifc_flags & IFC_DMAR)
 #else
 #define IS_DMAR(ctx) (0)
 #endif
 
 /**
  *	_iflib_fl_refill - refill an rxq free-buffer list
  *	@ctx: the iflib context
  *	@fl: the free list to refill
  *	@count: the number of new buffers to allocate
  *
  *	(Re)populate an rxq free-buffer list with up to @count new packet buffers.
  *	The caller must ensure that @count does not exceed the queue's capacity.
  */
 static void
 _iflib_fl_refill(if_ctx_t ctx, iflib_fl_t fl, int count)
 {
 	struct mbuf *m;
 	int pidx = fl->ifl_pidx;
 	iflib_rxsd_t rxsd = &fl->ifl_sds[pidx];
 	caddr_t cl;
 	int n, i = 0;
 	uint64_t bus_addr;
 	int err;
 
 	n  = count;
 	MPASS(n > 0);
 	MPASS(fl->ifl_credits + n <= fl->ifl_size);
 
 	if (pidx < fl->ifl_cidx)
 		MPASS(pidx + n <= fl->ifl_cidx);
 	if (pidx == fl->ifl_cidx && (fl->ifl_credits < fl->ifl_size))
 		MPASS(fl->ifl_gen == 0);
 	if (pidx > fl->ifl_cidx)
 		MPASS(n <= fl->ifl_size - pidx + fl->ifl_cidx);
 
 	DBG_COUNTER_INC(fl_refills);
 	if (n > 8)
 		DBG_COUNTER_INC(fl_refills_large);
 
 	while (n--) {
 		/*
 		 * We allocate an uninitialized mbuf + cluster, mbuf is
 		 * initialized after rx.
 		 *
 		 * If the cluster is still set, the previous packet was small
 		 * enough to be copied out of it, so the cluster is reused.
 		 */
 		if ((cl = rxsd->ifsd_cl) == NULL) {
 			if ((cl = rxsd->ifsd_cl = m_cljget(NULL, M_NOWAIT, fl->ifl_buf_size)) == NULL)
 				break;
 #if MEMORY_LOGGING
 			fl->ifl_cl_enqueued++;
 #endif
 		}
 		if ((m = m_gethdr(M_NOWAIT, MT_NOINIT)) == NULL) {
 			break;
 		}
 #if MEMORY_LOGGING
 		fl->ifl_m_enqueued++;
 #endif
 
 		DBG_COUNTER_INC(rx_allocs);
 #ifdef notyet
 		if ((rxsd->ifsd_flags & RX_SW_DESC_MAP_CREATED) == 0) {
 			int err;
 
 			if ((err = bus_dmamap_create(fl->ifl_ifdi->idi_tag, 0, &rxsd->ifsd_map))) {
 				log(LOG_WARNING, "bus_dmamap_create failed %d\n", err);
 				uma_zfree(fl->ifl_zone, cl);
 				n = 0;
 				goto done;
 			}
 			rxsd->ifsd_flags |= RX_SW_DESC_MAP_CREATED;
 		}
 #endif
 #if defined(__i386__) || defined(__amd64__)
 		if (!IS_DMAR(ctx)) {
 			bus_addr = pmap_kextract((vm_offset_t)cl);
 		} else
 #endif
 		{
 			struct rxq_refill_cb_arg cb_arg;
 			iflib_rxq_t q;
 
 			cb_arg.error = 0;
 			q = fl->ifl_rxq;
 			err = bus_dmamap_load(fl->ifl_desc_tag, rxsd->ifsd_map,
 		         cl, fl->ifl_buf_size, _rxq_refill_cb, &cb_arg, 0);
 
 			if (err != 0 || cb_arg.error) {
 				/*
 				 * !zone_pack ?
 				 */
 				if (fl->ifl_zone == zone_pack)
 					uma_zfree(fl->ifl_zone, cl);
 				m_free(m);
 				n = 0;
 				goto done;
 			}
 			bus_addr = cb_arg.seg.ds_addr;
 		}
 		rxsd->ifsd_flags |= RX_SW_DESC_INUSE;
 
 		MPASS(rxsd->ifsd_m == NULL);
 		rxsd->ifsd_cl = cl;
 		rxsd->ifsd_m = m;
 		fl->ifl_bus_addrs[i] = bus_addr;
 		fl->ifl_vm_addrs[i] = cl;
 		rxsd++;
 		fl->ifl_credits++;
 		i++;
 		MPASS(fl->ifl_credits <= fl->ifl_size);
 		if (++fl->ifl_pidx == fl->ifl_size) {
 			fl->ifl_pidx = 0;
 			fl->ifl_gen = 1;
 			rxsd = fl->ifl_sds;
 		}
 		if (n == 0 || i == IFLIB_MAX_RX_REFRESH) {
 			ctx->isc_rxd_refill(ctx->ifc_softc, fl->ifl_rxq->ifr_id, fl->ifl_id, pidx,
 								 fl->ifl_bus_addrs, fl->ifl_vm_addrs, i);
 			i = 0;
 			pidx = fl->ifl_pidx;
 		}
 	}
 done:
 	DBG_COUNTER_INC(rxd_flush);
 	if (fl->ifl_pidx == 0)
 		pidx = fl->ifl_size - 1;
 	else
 		pidx = fl->ifl_pidx - 1;
 	ctx->isc_rxd_flush(ctx->ifc_softc, fl->ifl_rxq->ifr_id, fl->ifl_id, pidx);
 }
 
 static __inline void
 __iflib_fl_refill_lt(if_ctx_t ctx, iflib_fl_t fl, int max)
 {
 	/* we avoid allowing pidx to catch up with cidx as it confuses ixl */
 	int32_t reclaimable = fl->ifl_size - fl->ifl_credits - 1;
 #ifdef INVARIANTS
 	int32_t delta = fl->ifl_size - get_inuse(fl->ifl_size, fl->ifl_cidx, fl->ifl_pidx, fl->ifl_gen) - 1;
 #endif
 
 	MPASS(fl->ifl_credits <= fl->ifl_size);
 	MPASS(reclaimable == delta);
 
 	if (reclaimable > 0)
 		_iflib_fl_refill(ctx, fl, min(max, reclaimable));
 }
 
 static void
 iflib_fl_bufs_free(iflib_fl_t fl)
 {
 	iflib_dma_info_t idi = fl->ifl_ifdi;
 	uint32_t i;
 
 	for (i = 0; i < fl->ifl_size; i++) {
 		iflib_rxsd_t d = &fl->ifl_sds[i];
 
 		if (d->ifsd_flags & RX_SW_DESC_INUSE) {
 			bus_dmamap_unload(fl->ifl_desc_tag, d->ifsd_map);
 			bus_dmamap_destroy(fl->ifl_desc_tag, d->ifsd_map);
 			if (d->ifsd_m != NULL) {
 				m_init(d->ifsd_m, M_NOWAIT, MT_DATA, 0);
 				uma_zfree(zone_mbuf, d->ifsd_m);
 			}
 			if (d->ifsd_cl != NULL)
 				uma_zfree(fl->ifl_zone, d->ifsd_cl);
 			d->ifsd_flags = 0;
 		} else {
 			MPASS(d->ifsd_cl == NULL);
 			MPASS(d->ifsd_m == NULL);
 		}
 #if MEMORY_LOGGING
 		fl->ifl_m_dequeued++;
 		fl->ifl_cl_dequeued++;
 #endif
 		d->ifsd_cl = NULL;
 		d->ifsd_m = NULL;
 	}
 	/*
 	 * Reset free list values
 	 */
 	fl->ifl_credits = fl->ifl_cidx = fl->ifl_pidx = fl->ifl_gen = 0;
 	bzero(idi->idi_vaddr, idi->idi_size);
 }
 
 /*********************************************************************
  *
  *  Initialize a receive ring and its buffers.
  *
  **********************************************************************/
 static int
 iflib_fl_setup(iflib_fl_t fl)
 {
 	iflib_rxq_t rxq = fl->ifl_rxq;
 	if_ctx_t ctx = rxq->ifr_ctx;
 	if_softc_ctx_t sctx = &ctx->ifc_softc_ctx;
 
 	/*
 	** Free current RX buffer structs and their mbufs
 	*/
 	iflib_fl_bufs_free(fl);
 	/* Now replenish the mbufs */
 	MPASS(fl->ifl_credits == 0);
 	/*
 	 * XXX don't set the max_frame_size to larger
 	 * than the hardware can handle
 	 */
 	if (sctx->isc_max_frame_size <= 2048)
 		fl->ifl_buf_size = MCLBYTES;
 	else if (sctx->isc_max_frame_size <= 4096)
 		fl->ifl_buf_size = MJUMPAGESIZE;
 	else if (sctx->isc_max_frame_size <= 9216)
 		fl->ifl_buf_size = MJUM9BYTES;
 	else
 		fl->ifl_buf_size = MJUM16BYTES;
 	if (fl->ifl_buf_size > ctx->ifc_max_fl_buf_size)
 		ctx->ifc_max_fl_buf_size = fl->ifl_buf_size;
 	fl->ifl_cltype = m_gettype(fl->ifl_buf_size);
 	fl->ifl_zone = m_getzone(fl->ifl_buf_size);
 
 
 	/* avoid pre-allocating zillions of clusters to an idle card
 	 * potentially speeding up attach
 	 */
 	_iflib_fl_refill(ctx, fl, min(128, fl->ifl_size));
 	MPASS(min(128, fl->ifl_size) == fl->ifl_credits);
 	if (min(128, fl->ifl_size) != fl->ifl_credits)
 		return (ENOBUFS);
 	/*
 	 * handle failure
 	 */
 	MPASS(rxq != NULL);
 	MPASS(fl->ifl_ifdi != NULL);
 	bus_dmamap_sync(fl->ifl_ifdi->idi_tag, fl->ifl_ifdi->idi_map,
 	    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 	return (0);
 }
 
 /*********************************************************************
  *
  *  Free receive ring data structures
  *
  **********************************************************************/
 static void
 iflib_rx_sds_free(iflib_rxq_t rxq)
 {
 	iflib_fl_t fl;
 	int i;
 
 	if (rxq->ifr_fl != NULL) {
 		for (i = 0; i < rxq->ifr_nfl; i++) {
 			fl = &rxq->ifr_fl[i];
 			if (fl->ifl_desc_tag != NULL) {
 				bus_dma_tag_destroy(fl->ifl_desc_tag);
 				fl->ifl_desc_tag = NULL;
 			}
 			/* each free list has its own sw descriptor array */
 			if (fl->ifl_sds != NULL) {
 				free(fl->ifl_sds, M_IFLIB);
 				fl->ifl_sds = NULL;
 			}
 		}
 
 		free(rxq->ifr_fl, M_IFLIB);
 		rxq->ifr_fl = NULL;
 		rxq->ifr_cq_gen = rxq->ifr_cq_cidx = rxq->ifr_cq_pidx = 0;
 	}
 }
 
 /*
  * Machine-independent logic
  *
  */
 static void
 iflib_timer(void *arg)
 {
 	iflib_txq_t txq = arg;
 	if_ctx_t ctx = txq->ift_ctx;
 	if_softc_ctx_t scctx = &ctx->ifc_softc_ctx;
 
 	if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING))
 		return;
 	/*
 	** Check on the state of the TX queue(s); this
 	** can be done without the lock because it's read-only
 	** and the HUNG state will be static if set.
 	*/
 	IFDI_TIMER(ctx, txq->ift_id);
 	if ((txq->ift_qstatus == IFLIB_QUEUE_HUNG) &&
 		(ctx->ifc_pause_frames == 0))
 		goto hung;
 
 	if (TXQ_AVAIL(txq) <= 2*scctx->isc_tx_nsegments ||
 	    ifmp_ring_is_stalled(txq->ift_br[0]))
 		GROUPTASK_ENQUEUE(&txq->ift_task);
 
 	ctx->ifc_pause_frames = 0;
 	if (if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)
 		callout_reset_on(&txq->ift_timer, hz/2, iflib_timer, txq, txq->ift_timer.c_cpu);
 	return;
 hung:
 	CTX_LOCK(ctx);
 	if_setdrvflagbits(ctx->ifc_ifp, 0, IFF_DRV_RUNNING);
 	device_printf(ctx->ifc_dev, "TX(%d) desc avail = %d, pidx = %d\n",
 				  txq->ift_id, TXQ_AVAIL(txq), txq->ift_pidx);
 
 	IFDI_WATCHDOG_RESET(ctx);
 	ctx->ifc_watchdog_events++;
 	ctx->ifc_pause_frames = 0;
 
 	iflib_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 static void
 iflib_init_locked(if_ctx_t ctx)
 {
 	if_softc_ctx_t sctx = &ctx->ifc_softc_ctx;
 	if_t ifp = ctx->ifc_ifp;
 	iflib_fl_t fl;
 	iflib_txq_t txq;
 	iflib_rxq_t rxq;
 	int i, j;
 
 
 	if_setdrvflagbits(ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING);
 	IFDI_INTR_DISABLE(ctx);
 
 	/* Set hardware offload abilities */
 	if_clearhwassist(ifp);
 	if (if_getcapenable(ifp) & IFCAP_TXCSUM)
 		if_sethwassistbits(ifp, CSUM_IP | CSUM_TCP | CSUM_UDP, 0);
 	if (if_getcapenable(ifp) & IFCAP_TXCSUM_IPV6)
 		if_sethwassistbits(ifp,  (CSUM_TCP_IPV6 | CSUM_UDP_IPV6), 0);
 	if (if_getcapenable(ifp) & IFCAP_TSO4)
 		if_sethwassistbits(ifp, CSUM_IP_TSO, 0);
 	if (if_getcapenable(ifp) & IFCAP_TSO6)
 		if_sethwassistbits(ifp, CSUM_IP6_TSO, 0);
 
 	for (i = 0, txq = ctx->ifc_txqs; i < sctx->isc_ntxqsets; i++, txq++) {
 		CALLOUT_LOCK(txq);
 		callout_stop(&txq->ift_timer);
 		callout_stop(&txq->ift_db_check);
 		CALLOUT_UNLOCK(txq);
 		iflib_netmap_txq_init(ctx, txq);
 	}
 	for (i = 0, rxq = ctx->ifc_rxqs; i < sctx->isc_nrxqsets; i++, rxq++) {
 		iflib_netmap_rxq_init(ctx, rxq);
 	}
 	IFDI_INIT(ctx);
 	for (i = 0, rxq = ctx->ifc_rxqs; i < sctx->isc_nrxqsets; i++, rxq++) {
 		for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) {
 			if (iflib_fl_setup(fl)) {
 				device_printf(ctx->ifc_dev, "freelist setup failed - check cluster settings\n");
 				goto done;
 			}
 		}
 	}
 done:
 	if_setdrvflagbits(ctx->ifc_ifp, IFF_DRV_RUNNING, IFF_DRV_OACTIVE);
 	IFDI_INTR_ENABLE(ctx);
 	txq = ctx->ifc_txqs;
 	for (i = 0; i < sctx->isc_ntxqsets; i++, txq++)
 		callout_reset_on(&txq->ift_timer, hz/2, iflib_timer, txq,
 			txq->ift_timer.c_cpu);
 }
 
 static int
 iflib_media_change(if_t ifp)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 	int err;
 
 	CTX_LOCK(ctx);
 	if ((err = IFDI_MEDIA_CHANGE(ctx)) == 0)
 		iflib_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 	return (err);
 }
 
 static void
 iflib_media_status(if_t ifp, struct ifmediareq *ifmr)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 
 	CTX_LOCK(ctx);
 	IFDI_UPDATE_ADMIN_STATUS(ctx);
 	IFDI_MEDIA_STATUS(ctx, ifmr);
 	CTX_UNLOCK(ctx);
 }
 
 static void
 iflib_stop(if_ctx_t ctx)
 {
 	iflib_txq_t txq = ctx->ifc_txqs;
 	iflib_rxq_t rxq = ctx->ifc_rxqs;
 	if_softc_ctx_t scctx = &ctx->ifc_softc_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	iflib_dma_info_t di;
 	iflib_fl_t fl;
 	int i, j;
 
 	/* Tell the stack that the interface is no longer active */
 	if_setdrvflagbits(ctx->ifc_ifp, IFF_DRV_OACTIVE, IFF_DRV_RUNNING);
 
 	IFDI_INTR_DISABLE(ctx);
 	msleep(ctx, &ctx->ifc_mtx, PUSER, "iflib_init", hz);
 
 	/* Wait for current tx queue users to exit to disarm watchdog timer. */
 	for (i = 0; i < scctx->isc_ntxqsets; i++, txq++) {
 		/* make sure all transmitters have completed before proceeding XXX */
 
 		/* clean any enqueued buffers */
 		iflib_txq_check_drain(txq, 0);
 		/* Free any existing tx buffers. */
 		for (j = 0; j < sctx->isc_ntxd; j++) {
 			iflib_txsd_free(ctx, txq, j);
 		}
 		txq->ift_processed = txq->ift_cleaned = txq->ift_cidx_processed = 0;
 		txq->ift_in_use = txq->ift_cidx = txq->ift_pidx = txq->ift_no_desc_avail = 0;
 		txq->ift_closed = txq->ift_mbuf_defrag = txq->ift_mbuf_defrag_failed = 0;
 		txq->ift_no_tx_dma_setup = txq->ift_txd_encap_efbig = txq->ift_map_failed = 0;
 		txq->ift_pullups = 0;
 		ifmp_ring_reset_stats(txq->ift_br[0]);
 		for (j = 0, di = txq->ift_ifdi; j < ctx->ifc_nhwtxqs; j++, di++)
 			bzero((void *)di->idi_vaddr, di->idi_size);
 	}
 	for (i = 0; i < scctx->isc_nrxqsets; i++, rxq++) {
 		/* make sure all receivers have completed before proceeding XXX */
 
 		for (j = 0, di = rxq->ifr_ifdi; j < ctx->ifc_nhwrxqs; j++, di++)
 			bzero((void *)di->idi_vaddr, di->idi_size);
 		/* also resets the free lists pidx/cidx */
 		for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++)
 			iflib_fl_bufs_free(fl);
 	}
 	IFDI_STOP(ctx);
 }
 
 static iflib_rxsd_t
 rxd_frag_to_sd(iflib_rxq_t rxq, if_rxd_frag_t irf, int *cltype, int unload)
 {
 	int flid, cidx;
 	iflib_rxsd_t sd;
 	iflib_fl_t fl;
 	iflib_dma_info_t di;
 
 	flid = irf->irf_flid;
 	cidx = irf->irf_idx;
 	fl = &rxq->ifr_fl[flid];
 	fl->ifl_credits--;
 #if MEMORY_LOGGING
 	fl->ifl_m_dequeued++;
 	if (cltype)
 		fl->ifl_cl_dequeued++;
 #endif
 	sd = &fl->ifl_sds[cidx];
 	di = fl->ifl_ifdi;
 	bus_dmamap_sync(di->idi_tag, di->idi_map,
 			BUS_DMASYNC_POSTREAD | BUS_DMASYNC_POSTWRITE);
 
 	/* not valid assert if bxe really does SGE from non-contiguous elements */
 	MPASS(fl->ifl_cidx == cidx);
 	if (unload)
 		bus_dmamap_unload(fl->ifl_desc_tag, sd->ifsd_map);
 
 	if (__predict_false(++fl->ifl_cidx == fl->ifl_size)) {
 		fl->ifl_cidx = 0;
 		fl->ifl_gen = 0;
 	}
 	/* YES ick */
 	if (cltype)
 		*cltype = fl->ifl_cltype;
 	return (sd);
 }
 
 static struct mbuf *
 assemble_segments(iflib_rxq_t rxq, if_rxd_info_t ri)
 {
 	int i, padlen, flags, cltype;
 	struct mbuf *m, *mh, *mt;
 	iflib_rxsd_t sd;
 	caddr_t cl;
 
 	i = 0;
 	do {
 		sd = rxd_frag_to_sd(rxq, &ri->iri_frags[i], &cltype, TRUE);
 
 		MPASS(sd->ifsd_cl != NULL);
 		MPASS(sd->ifsd_m != NULL);
 		m = sd->ifsd_m;
 		if (i == 0) {
 			flags = M_PKTHDR|M_EXT;
 			mh = mt = m;
 			padlen = ri->iri_pad;
 		} else {
 			flags = M_EXT;
 			mt->m_next = m;
 			mt = m;
 			/* assuming padding is only on the first fragment */
 			padlen = 0;
 		}
 		sd->ifsd_m = NULL;
 		cl = sd->ifsd_cl;
 		sd->ifsd_cl = NULL;
 
 		/* Can these two be made one? */
 		m_init(m, M_NOWAIT, MT_DATA, flags);
 		m_cljset(m, cl, cltype);
 		/*
 		 * These must follow m_init and m_cljset
 		 */
 		m->m_data += padlen;
 		ri->iri_len -= padlen;
 		m->m_len = ri->iri_len;
 	} while (++i < ri->iri_nfrags);
 
 	return (mh);
 }
 
 
 
 /*
  * Process one software descriptor
  */
 static struct mbuf *
 iflib_rxd_pkt_get(iflib_rxq_t rxq, if_rxd_info_t ri)
 {
 	struct mbuf *m;
 	iflib_rxsd_t sd;
 
 	/* should I merge this back in now that the two paths are basically duplicated? */
 	if (ri->iri_len <= IFLIB_RX_COPY_THRESH) {
 		sd = rxd_frag_to_sd(rxq, &ri->iri_frags[0], NULL, FALSE);
 		m = sd->ifsd_m;
 		sd->ifsd_m = NULL;
 		m_init(m, M_NOWAIT, MT_DATA, M_PKTHDR);
 		memcpy(m->m_data, sd->ifsd_cl, ri->iri_len);
 		m->m_len = ri->iri_len;
 	} else {
 		m = assemble_segments(rxq, ri);
 	}
 	m->m_pkthdr.len = ri->iri_len;
 	m->m_pkthdr.rcvif = ri->iri_ifp;
 	m->m_flags |= ri->iri_flags;
 	m->m_pkthdr.ether_vtag = ri->iri_vtag;
 	m->m_pkthdr.flowid = ri->iri_flowid;
 	M_HASHTYPE_SET(m, ri->iri_rsstype);
 	m->m_pkthdr.csum_flags = ri->iri_csum_flags;
 	m->m_pkthdr.csum_data = ri->iri_csum_data;
 	return (m);
 }
 
 static bool
 iflib_rxeof(iflib_rxq_t rxq, int budget)
 {
 	if_ctx_t ctx = rxq->ifr_ctx;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	int avail, i;
 	uint16_t *cidxp;
 	struct if_rxd_info ri;
 	int err, budget_left, rx_bytes, rx_pkts;
 	iflib_fl_t fl;
 	struct ifnet *ifp;
 	struct lro_entry *queued;
 	int lro_enabled;
 	/*
 	 * XXX early demux data packets so that if_input processing only handles
 	 * acks in interrupt context
 	 */
 	struct mbuf *m, *mh, *mt;
 
 	if (netmap_rx_irq(ctx->ifc_ifp, rxq->ifr_id, &budget)) {
 		return (FALSE);
 	}
 
 	mh = mt = NULL;
 	MPASS(budget > 0);
 	rx_pkts = rx_bytes = 0;
 	if (sctx->isc_flags & IFLIB_HAS_CQ)
 		cidxp = &rxq->ifr_cq_cidx;
 	else
 		cidxp = &rxq->ifr_fl[0].ifl_cidx;
 	if ((avail = iflib_rxd_avail(ctx, rxq, *cidxp)) == 0) {
 		for (i = 0, fl = &rxq->ifr_fl[0]; i < sctx->isc_nfl; i++, fl++)
 			__iflib_fl_refill_lt(ctx, fl, budget + 8);
 		DBG_COUNTER_INC(rx_unavail);
 		return (false);
 	}
 
 	for (budget_left = budget; (budget_left > 0) && (avail > 0); budget_left--, avail--) {
 		if (__predict_false(!CTX_ACTIVE(ctx))) {
 			DBG_COUNTER_INC(rx_ctx_inactive);
 			break;
 		}
 		/*
 		 * Reset client set fields to their default values
 		 */
 		bzero(&ri, sizeof(ri));
 		ri.iri_qsidx = rxq->ifr_id;
 		ri.iri_cidx = *cidxp;
 		ri.iri_ifp = ctx->ifc_ifp;
 		ri.iri_frags = rxq->ifr_frags;
 		err = ctx->isc_rxd_pkt_get(ctx->ifc_softc, &ri);
 
 		/* errors are not handled here yet; assert that none occur */
 		MPASS(err == 0);
 		if (sctx->isc_flags & IFLIB_HAS_CQ) {
 			/* we know we consumed _one_ CQ entry */
 			if (++rxq->ifr_cq_cidx == sctx->isc_nrxd) {
 				rxq->ifr_cq_cidx = 0;
 				rxq->ifr_cq_gen = 0;
 			}
 			/* was this only a completion queue message? */
 			if (__predict_false(ri.iri_nfrags == 0))
 				continue;
 		}
 		MPASS(ri.iri_nfrags != 0);
 		MPASS(ri.iri_len != 0);
 
 		/* will advance the cidx on the corresponding free lists */
 		m = iflib_rxd_pkt_get(rxq, &ri);
 		if (avail == 0 && budget_left)
 			avail = iflib_rxd_avail(ctx, rxq, *cidxp);
 
 		if (__predict_false(m == NULL)) {
 			DBG_COUNTER_INC(rx_mbuf_null);
 			continue;
 		}
 		/* imm_pkt: -- cxgb */
 		if (mh == NULL)
 			mh = mt = m;
 		else {
 			mt->m_nextpkt = m;
 			mt = m;
 		}
 	}
 	/* make sure that we can refill faster than drain */
 	for (i = 0, fl = &rxq->ifr_fl[0]; i < sctx->isc_nfl; i++, fl++)
 		__iflib_fl_refill_lt(ctx, fl, budget + 8);
 
 	ifp = ctx->ifc_ifp;
 	lro_enabled = (if_getcapenable(ifp) & IFCAP_LRO);
 
 	while (mh != NULL) {
 		m = mh;
 		mh = mh->m_nextpkt;
 		m->m_nextpkt = NULL;
 		rx_bytes += m->m_pkthdr.len;
 		rx_pkts++;
 #if defined(INET6) || defined(INET)
 		if (lro_enabled && tcp_lro_rx(&rxq->ifr_lc, m, 0) == 0)
 			continue;
 #endif
 		DBG_COUNTER_INC(rx_if_input);
 		ifp->if_input(ifp, m);
 	}
 	if_inc_counter(ifp, IFCOUNTER_IBYTES, rx_bytes);
 	if_inc_counter(ifp, IFCOUNTER_IPACKETS, rx_pkts);
 
 	/*
 	 * Flush any outstanding LRO work
 	 */
 	while ((queued = LIST_FIRST(&rxq->ifr_lc.lro_active)) != NULL) {
 		LIST_REMOVE(queued, next);
 #if defined(INET6) || defined(INET)
 		tcp_lro_flush(&rxq->ifr_lc, queued);
 #endif
 	}
 	return (iflib_rxd_avail(ctx, rxq, *cidxp));
 }
 
 #define M_CSUM_FLAGS(m) ((m)->m_pkthdr.csum_flags)
 #define M_HAS_VLANTAG(m) (m->m_flags & M_VLANTAG)
 #define TXQ_MAX_DB_DEFERRED(ctx) (ctx->ifc_sctx->isc_ntxd >> 5)
 #define TXQ_MAX_DB_CONSUMED(ctx) (ctx->ifc_sctx->isc_ntxd >> 4)
 
 static __inline void
 iflib_txd_db_check(if_ctx_t ctx, iflib_txq_t txq, int ring)
 {
 	uint32_t dbval;
 
 	if (ring || txq->ift_db_pending >= TXQ_MAX_DB_DEFERRED(ctx)) {
 
 		/* the lock will only ever be contended in the !min_latency case */
 		if (!TXDB_TRYLOCK(txq))
 			return;
 		dbval = txq->ift_npending ? txq->ift_npending : txq->ift_pidx;
 		ctx->isc_txd_flush(ctx->ifc_softc, txq->ift_id, dbval);
 		txq->ift_db_pending = txq->ift_npending = 0;
 		TXDB_UNLOCK(txq);
 	}
 }
 
 static void
 iflib_txd_deferred_db_check(void * arg)
 {
 	iflib_txq_t txq = arg;
 
 	/* simple non-zero boolean so use bitwise OR */
 	if ((txq->ift_db_pending | txq->ift_npending) &&
 	    txq->ift_db_pending >= txq->ift_db_pending_queued)
 		iflib_txd_db_check(txq->ift_ctx, txq, TRUE);
 	txq->ift_db_pending_queued = 0;
 	if (ifmp_ring_is_stalled(txq->ift_br[0]))
 		iflib_txq_check_drain(txq, 4);
 }
 
 #ifdef PKT_DEBUG
 static void
 print_pkt(if_pkt_info_t pi)
 {
 	printf("pi len:  %d qsidx: %d nsegs: %d ndescs: %d flags: %x pidx: %d\n",
 	       pi->ipi_len, pi->ipi_qsidx, pi->ipi_nsegs, pi->ipi_ndescs, pi->ipi_flags, pi->ipi_pidx);
 	printf("pi new_pidx: %d csum_flags: %lx tso_segsz: %d mflags: %x vtag: %d\n",
 	       pi->ipi_new_pidx, pi->ipi_csum_flags, pi->ipi_tso_segsz, pi->ipi_mflags, pi->ipi_vtag);
 	printf("pi etype: %d ehdrlen: %d ip_hlen: %d ipproto: %d\n",
 	       pi->ipi_etype, pi->ipi_ehdrlen, pi->ipi_ip_hlen, pi->ipi_ipproto);
 }
 #endif
 
 #define IS_TSO4(pi) ((pi)->ipi_csum_flags & CSUM_IP_TSO)
 #define IS_TSO6(pi) ((pi)->ipi_csum_flags & CSUM_IP6_TSO)
 
 static int
 iflib_parse_header(iflib_txq_t txq, if_pkt_info_t pi, struct mbuf **mp)
 {
 	struct ether_vlan_header *eh;
 	struct mbuf *m;
 
 	m = *mp;
 	/*
 	 * Determine where frame payload starts.
 	 * Jump over vlan headers if already present,
 	 * helpful for QinQ too.
 	 */
 	if (__predict_false(m->m_len < sizeof(*eh))) {
 		txq->ift_pullups++;
 		if (__predict_false((m = m_pullup(m, sizeof(*eh))) == NULL))
 			return (ENOMEM);
 	}
 	eh = mtod(m, struct ether_vlan_header *);
 	if (eh->evl_encap_proto == htons(ETHERTYPE_VLAN)) {
 		pi->ipi_etype = ntohs(eh->evl_proto);
 		pi->ipi_ehdrlen = ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN;
 	} else {
 		pi->ipi_etype = ntohs(eh->evl_encap_proto);
 		pi->ipi_ehdrlen = ETHER_HDR_LEN;
 	}
 
 	switch (pi->ipi_etype) {
 #ifdef INET
 	case ETHERTYPE_IP:
 	{
 		struct ip *ip = NULL;
 		struct tcphdr *th = NULL;
 		struct mbuf *n;
 		int minthlen;
 
 		minthlen = min(m->m_pkthdr.len, pi->ipi_ehdrlen + sizeof(*ip) + sizeof(*th));
 		if (__predict_false(m->m_len < minthlen)) {
 			/*
 			 * if this code bloat is causing too much of a hit
 			 * move it to a separate function and mark it noinline
 			 */
 			if (m->m_len == pi->ipi_ehdrlen) {
 				n = m->m_next;
 				MPASS(n);
 				if (n->m_len >= sizeof(*ip))  {
 					ip = (struct ip *)n->m_data;
 					if (n->m_len >= (ip->ip_hl << 2) + sizeof(*th))
 						th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2));
 				} else {
 					txq->ift_pullups++;
 					if (__predict_false((m = m_pullup(m, minthlen)) == NULL))
 						return (ENOMEM);
 					ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
 				}
 			} else {
 				txq->ift_pullups++;
 				if (__predict_false((m = m_pullup(m, minthlen)) == NULL))
 					return (ENOMEM);
 				ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
 				if (m->m_len >= (ip->ip_hl << 2) + sizeof(*th))
 					th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2));
 			}
 		} else {
 			ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
 			if (m->m_len >= (ip->ip_hl << 2) + sizeof(*th))
 				th = (struct tcphdr *)((caddr_t)ip + (ip->ip_hl << 2));
 		}
 		pi->ipi_ip_hlen = ip->ip_hl << 2;
 		pi->ipi_ipproto = ip->ip_p;
 		pi->ipi_flags |= IPI_TX_IPV4;
 
 		if (pi->ipi_csum_flags & CSUM_IP)
 			ip->ip_sum = 0;
 
 		if (pi->ipi_ipproto == IPPROTO_TCP) {
 			if (__predict_false(th == NULL)) {
 				txq->ift_pullups++;
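 				/*
 				 * m_pullup() counts from the start of the frame and may
 				 * move the data, so include the Ethernet header and
 				 * recompute ip from the (possibly new) mbuf.
 				 */
 				if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + pi->ipi_ip_hlen + sizeof(*th))) == NULL))
 					return (ENOMEM);
 				ip = (struct ip *)(m->m_data + pi->ipi_ehdrlen);
 				th = (struct tcphdr *)((caddr_t)ip + pi->ipi_ip_hlen);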
 			}
 			pi->ipi_tcp_hflags = th->th_flags;
 			pi->ipi_tcp_hlen = th->th_off << 2;
 			pi->ipi_tcp_seq = th->th_seq;
 		}
 		if (IS_TSO4(pi)) {
 			if (__predict_false(ip->ip_p != IPPROTO_TCP))
 				return (ENXIO);
 			th->th_sum = in_pseudo(ip->ip_src.s_addr,
 					       ip->ip_dst.s_addr, htons(IPPROTO_TCP));
 			pi->ipi_tso_segsz = m->m_pkthdr.tso_segsz;
 		}
 		break;
 	}
 #endif
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 	{
 		struct ip6_hdr *ip6;
 		struct tcphdr *th;
 		pi->ipi_ip_hlen = sizeof(struct ip6_hdr);
 
 		if (__predict_false(m->m_len < pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) {
 			txq->ift_pullups++;
 			if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + sizeof(struct ip6_hdr))) == NULL))
 				return (ENOMEM);
 		}
 		/* compute header pointers only after m_pullup(), which may move the data */
 		ip6 = (struct ip6_hdr *)(m->m_data + pi->ipi_ehdrlen);
 		th = (struct tcphdr *)((caddr_t)ip6 + pi->ipi_ip_hlen);
 
 		/* XXX-BZ this will go badly in case of ext hdrs. */
 		pi->ipi_ipproto = ip6->ip6_nxt;
 		pi->ipi_flags |= IPI_TX_IPV6;
 
 		if (pi->ipi_ipproto == IPPROTO_TCP) {
 			if (__predict_false(m->m_len < pi->ipi_ehdrlen + sizeof(struct ip6_hdr) + sizeof(struct tcphdr))) {
 				txq->ift_pullups++;
 				if (__predict_false((m = m_pullup(m, pi->ipi_ehdrlen + sizeof(struct ip6_hdr) + sizeof(struct tcphdr))) == NULL))
 					return (ENOMEM);
 				ip6 = (struct ip6_hdr *)(m->m_data + pi->ipi_ehdrlen);
 				th = (struct tcphdr *)((caddr_t)ip6 + pi->ipi_ip_hlen);
 			}
 			pi->ipi_tcp_hflags = th->th_flags;
 			pi->ipi_tcp_hlen = th->th_off << 2;
 		}
 		if (IS_TSO6(pi)) {
 
 			if (__predict_false(ip6->ip6_nxt != IPPROTO_TCP))
 				return (ENXIO);
 			/*
 			 * The corresponding flag is set by the stack in the IPv4
 			 * TSO case, but not in IPv6 (at least in FreeBSD 10.2).
 			 * So, set it here because the rest of the flow requires it.
 			 */
 			pi->ipi_csum_flags |= CSUM_TCP_IPV6;
 			th->th_sum = in6_cksum_pseudo(ip6, 0, IPPROTO_TCP, 0);
 			pi->ipi_tso_segsz = m->m_pkthdr.tso_segsz;
 		}
 		break;
 	}
 #endif
 	default:
 		pi->ipi_csum_flags &= ~CSUM_OFFLOAD;
 		pi->ipi_ip_hlen = 0;
 		break;
 	}
 	*mp = m;
 	return (0);
 }
 
 
 static __noinline struct mbuf *
 collapse_pkthdr(struct mbuf *m0)
 {
 	struct mbuf *m, *m_next, *tmp;
 
 	m = m0;
 	m_next = m->m_next;
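 	while (m_next != NULL && m_next->m_len == 0) {
 		m = m_next;
 		/* advance first: m_next aliases m and must not be read after m_free() */
 		m_next = m_next->m_next;
 		m->m_next = NULL;
 		m_free(m);
 	}
 	m = m0;
 	m->m_next = m_next;
 	if (m_next == NULL)
 		return (m);
 	if ((m_next->m_flags & M_EXT) == 0) {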
 		m = m_defrag(m, M_NOWAIT);
 	} else {
 		tmp = m_next->m_next;
 		memcpy(m_next, m, MPKTHSIZE);
 		m = m_next;
 		m->m_next = tmp;
 	}
 	return (m);
 }
 
 /*
  * If dodgy hardware rejects the scatter-gather chain we've handed it,
  * we'll need to rebuild the mbuf chain before we can call m_defrag().
  */
 static __noinline struct mbuf *
 iflib_rebuild_mbuf(iflib_txq_t txq)
 {
 
 	int ntxd, mhlen, len, i, pidx;
 	struct mbuf *m, *mh, **ifsd_m;
 	if_shared_ctx_t		sctx;
 
 	pidx = txq->ift_pidx;
 	ifsd_m = txq->ift_sds.ifsd_m;
 	sctx = txq->ift_ctx->ifc_sctx;
 	ntxd = sctx->isc_ntxd;
 	mh = m = ifsd_m[pidx];
 	ifsd_m[pidx] = NULL;
 #if MEMORY_LOGGING
 	txq->ift_dequeued++;
 #endif
 	len = m->m_len;
 	mhlen = m->m_pkthdr.len;
 	i = 1;
 
 	while (len < mhlen && (m->m_next == NULL)) {
 		m->m_next = ifsd_m[(pidx + i) & (ntxd-1)];
 		ifsd_m[(pidx + i) & (ntxd -1)] = NULL;
 #if MEMORY_LOGGING
 		txq->ift_dequeued++;
 #endif
 		m = m->m_next;
 		len += m->m_len;
 		i++;
 	}
 	return (mh);
 }
 
 static int
 iflib_busdma_load_mbuf_sg(iflib_txq_t txq, bus_dma_tag_t tag, bus_dmamap_t map,
 			  struct mbuf **m0, bus_dma_segment_t *segs, int *nsegs,
 			  int max_segs, int flags)
 {
 	if_ctx_t ctx;
 	if_shared_ctx_t		sctx;
 	int i, next, pidx, mask, err, maxsegsz, ntxd, count;
 	struct mbuf *m, *tmp, **ifsd_m, **mp;
 
 	m = *m0;
 
 	/*
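 	 * A leading zero-length mbuf should never be passed in; if one is,
 	 * collapse the chain so that the packet header is in a non-empty mbuf.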
 	 */
 	if (__predict_false(m->m_len == 0))
 		*m0 = m = collapse_pkthdr(m);
 
 	ctx = txq->ift_ctx;
 	sctx = ctx->ifc_sctx;
 	ifsd_m = txq->ift_sds.ifsd_m;
 	ntxd = sctx->isc_ntxd;
 	pidx = txq->ift_pidx;
 	if (map != NULL) {
 		uint8_t *ifsd_flags = txq->ift_sds.ifsd_flags;
 
 		err = bus_dmamap_load_mbuf_sg(tag, map,
 					      *m0, segs, nsegs, BUS_DMA_NOWAIT);
 		if (err)
 			return (err);
 		ifsd_flags[pidx] |= TX_SW_DESC_MAPPED;
 		i = 0;
 		next = pidx;
 		mask = (sctx->isc_ntxd-1);
 		m = *m0;
 		do {
 			mp = &ifsd_m[next];
 			*mp = m;
 			m = m->m_next;
 			(*mp)->m_next = NULL;
 			if (__predict_false((*mp)->m_len == 0)) {
 				m_free(*mp);
 				*mp = NULL;
 			} else {
 				/* advance to the next software descriptor slot */
 				i++;
 				next = (pidx + i) & mask;
 			}
 		} while (m != NULL);
 	} else {
 		int buflen, sgsize, max_sgsize;
 		vm_offset_t vaddr;
 		vm_paddr_t curaddr;
 
 		count = i = 0;
 		maxsegsz = sctx->isc_tx_maxsize;
 		m = *m0;
 		do {
 			if (__predict_false(m->m_len <= 0)) {
 				tmp = m;
 				m = m->m_next;
 				tmp->m_next = NULL;
 				m_free(tmp);
 				continue;
 			}
 			buflen = m->m_len;
 			vaddr = (vm_offset_t)m->m_data;
 			/*
 			 * see if we can't be smarter about physically
 			 * contiguous mappings
 			 */
 			next = (pidx + count) & (ntxd-1);
 			MPASS(ifsd_m[next] == NULL);
 #if MEMORY_LOGGING
 			txq->ift_enqueued++;
 #endif
 			ifsd_m[next] = m;
 			while (buflen > 0) {
 				max_sgsize = MIN(buflen, maxsegsz);
 				curaddr = pmap_kextract(vaddr);
 				sgsize = PAGE_SIZE - (curaddr & PAGE_MASK);
 				sgsize = MIN(sgsize, max_sgsize);
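 				/*
 				 * Example, assuming 4 KB pages: if curaddr is 0x12345f00,
 				 * (curaddr & PAGE_MASK) is 0xf00 (3840), so sgsize is at
 				 * most 4096 - 3840 = 256 bytes and, unless buflen or
 				 * maxsegsz is smaller, the next segment starts on a page
 				 * boundary.
 				 */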
 				segs[i].ds_addr = curaddr;
 				segs[i].ds_len = sgsize;
 				vaddr += sgsize;
 				buflen -= sgsize;
 				i++;
 				if (i >= max_segs)
 					goto err;
 			}
 			count++;
 			tmp = m;
 			m = m->m_next;
 			tmp->m_next = NULL;
 		} while (m != NULL);
 		*nsegs = i;
 	}
 	return (0);
 err:
 	*m0 = iflib_rebuild_mbuf(txq);
 	return (EFBIG);
 }
 
 static int
 iflib_encap(iflib_txq_t txq, struct mbuf **m_headp)
 {
 	if_ctx_t		ctx;
 	if_shared_ctx_t		sctx;
 	if_softc_ctx_t		scctx;
 	bus_dma_segment_t	*segs;
 	struct mbuf		*m_head;
 	bus_dmamap_t		map;
 	struct if_pkt_info	pi;
 	int remap = 0;
 	int err, nsegs, ndesc, max_segs, pidx, cidx, next, ntxd;
 	bus_dma_tag_t desc_tag;
 
 	segs = txq->ift_segs;
 	ctx = txq->ift_ctx;
 	sctx = ctx->ifc_sctx;
 	scctx = &ctx->ifc_softc_ctx;
 	ntxd = sctx->isc_ntxd;
 	m_head = *m_headp;
 	map = NULL;
 
 	/*
 	 * If we're doing TSO, the next descriptor to clean may be quite far ahead
 	 */
 	cidx = txq->ift_cidx;
 	pidx = txq->ift_pidx;
 	next = (cidx + CACHE_PTR_INCREMENT) & (ntxd-1);
 
 	/* prefetch the next cache line of mbuf pointers and flags */
 	prefetch(&txq->ift_sds.ifsd_m[next]);
 	if (txq->ift_sds.ifsd_map != NULL) {
 		prefetch(&txq->ift_sds.ifsd_map[next]);
 		map = txq->ift_sds.ifsd_map[pidx];
 		next = (cidx + CACHE_LINE_SIZE) & (ntxd-1);
 		prefetch(&txq->ift_sds.ifsd_flags[next]);
 	}
 
 
 	if (m_head->m_pkthdr.csum_flags & CSUM_TSO) {
 		desc_tag = txq->ift_tso_desc_tag;
 		max_segs = scctx->isc_tx_tso_segments_max;
 	} else {
 		desc_tag = txq->ift_desc_tag;
 		max_segs = scctx->isc_tx_nsegments;
 	}
 	bzero(&pi, sizeof(pi));
 	pi.ipi_len = m_head->m_pkthdr.len;
 	pi.ipi_mflags = (m_head->m_flags & (M_VLANTAG|M_BCAST|M_MCAST));
 	pi.ipi_csum_flags = m_head->m_pkthdr.csum_flags;
 	pi.ipi_vtag = (m_head->m_flags & M_VLANTAG) ? m_head->m_pkthdr.ether_vtag : 0;
 	pi.ipi_pidx = pidx;
 	pi.ipi_qsidx = txq->ift_id;
 
 	/* deliberate bitwise OR to make one condition */
 	if (__predict_true((pi.ipi_csum_flags | pi.ipi_vtag))) {
 		if (__predict_false((err = iflib_parse_header(txq, &pi, m_headp)) != 0))
 			return (err);
 		m_head = *m_headp;
 	}
 
 retry:
 	err = iflib_busdma_load_mbuf_sg(txq, desc_tag, map, m_headp, segs, &nsegs, max_segs, BUS_DMA_NOWAIT);
 defrag:
 	if (__predict_false(err)) {
 		switch (err) {
 		case EFBIG:
 			/* try collapse once and defrag once */
 			if (remap == 0)
 				m_head = m_collapse(*m_headp, M_NOWAIT, max_segs);
 			if (remap == 1)
 				m_head = m_defrag(*m_headp, M_NOWAIT);
 			remap++;
 			if (__predict_false(m_head == NULL))
 				goto defrag_failed;
 			txq->ift_mbuf_defrag++;
 			*m_headp = m_head;
 			goto retry;
 			break;
 		case ENOMEM:
 			txq->ift_no_tx_dma_setup++;
 			break;
 		default:
 			txq->ift_no_tx_dma_setup++;
 			m_freem(*m_headp);
 			DBG_COUNTER_INC(tx_frees);
 			*m_headp = NULL;
 			break;
 		}
 		txq->ift_map_failed++;
 		DBG_COUNTER_INC(encap_load_mbuf_fail);
 		return (err);
 	}
 
 	/*
 	 * XXX assumes a 1 to 1 relationship between segments and
 	 *        descriptors - this does not hold true on all drivers, e.g.
 	 *        cxgb
 	 */
 	if (__predict_false(nsegs + 2 > TXQ_AVAIL(txq))) {
 		txq->ift_no_desc_avail++;
 		if (map != NULL)
 			bus_dmamap_unload(desc_tag, map);
 		DBG_COUNTER_INC(encap_txq_avail_fail);
 		if (txq->ift_task.gt_task.ta_pending == 0)
 			GROUPTASK_ENQUEUE(&txq->ift_task);
 		return (ENOBUFS);
 	}
 	pi.ipi_segs = segs;
 	pi.ipi_nsegs = nsegs;
 
 	MPASS(pidx >= 0 && pidx < sctx->isc_ntxd);
 #ifdef PKT_DEBUG
 	print_pkt(&pi);
 #endif
 	if ((err = ctx->isc_txd_encap(ctx->ifc_softc, &pi)) == 0) {
 		bus_dmamap_sync(txq->ift_ifdi->idi_tag, txq->ift_ifdi->idi_map,
 						BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
 		DBG_COUNTER_INC(tx_encap);
 		MPASS(pi.ipi_new_pidx >= 0 && pi.ipi_new_pidx < sctx->isc_ntxd);
 
 		ndesc = pi.ipi_new_pidx - pi.ipi_pidx;
 		if (pi.ipi_new_pidx < pi.ipi_pidx) {
 			ndesc += sctx->isc_ntxd;
 			txq->ift_gen = 1;
 		}
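 		/*
 		 * Example, assuming isc_ntxd = 1024: if ipi_pidx is 1020 and the
 		 * driver returns ipi_new_pidx 4, ndesc is 4 - 1020 + 1024 = 8
 		 * descriptors and the ring has wrapped.
 		 */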
 		MPASS(pi.ipi_new_pidx != pidx);
 		MPASS(ndesc > 0);
 		txq->ift_in_use += ndesc;
 		/*
 		 * We update the last software descriptor again here because there may
 		 * be a sentinel and/or there may be more mbufs than segments
 		 */
 		txq->ift_pidx = pi.ipi_new_pidx;
 		txq->ift_npending += pi.ipi_ndescs;
 	} else if (__predict_false(err == EFBIG && remap < 2)) {
 		*m_headp = m_head = iflib_rebuild_mbuf(txq);
 		remap = 1;
 		txq->ift_txd_encap_efbig++;
 		goto defrag;
 	} else
 		DBG_COUNTER_INC(encap_txd_encap_fail);
 	return (err);
 
 defrag_failed:
 	txq->ift_mbuf_defrag_failed++;
 	txq->ift_map_failed++;
 	m_freem(*m_headp);
 	DBG_COUNTER_INC(tx_frees);
 	*m_headp = NULL;
 	return (ENOMEM);
 }
 
 /* forward compatibility for cxgb */
 #define FIRST_QSET(ctx) 0
 
 #define NTXQSETS(ctx) ((ctx)->ifc_softc_ctx.isc_ntxqsets)
 #define NRXQSETS(ctx) ((ctx)->ifc_softc_ctx.isc_nrxqsets)
 #define QIDX(ctx, m) ((((m)->m_pkthdr.flowid & ctx->ifc_softc_ctx.isc_rss_table_mask) % NTXQSETS(ctx)) + FIRST_QSET(ctx))
 #define DESC_RECLAIMABLE(q) ((int)((q)->ift_processed - (q)->ift_cleaned - (q)->ift_ctx->ifc_softc_ctx.isc_tx_nsegments))
 #define RECLAIM_THRESH(ctx) ((ctx)->ifc_sctx->isc_tx_reclaim_thresh)
 #define MAX_TX_DESC(ctx) ((ctx)->ifc_softc_ctx.isc_tx_tso_segments_max)
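 /*
  * DESC_RECLAIMABLE() example with illustrative values: if ift_processed is
  * 500, ift_cleaned is 400 and isc_tx_nsegments is 32, it yields
  * 500 - 400 - 32 = 68, i.e. isc_tx_nsegments descriptors are held back.
  */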
 
 
 
 /*
  * If more than TXQ_MIN_OCCUPANCY descriptors are in use, we consider
  * deferring doorbell writes.
  *
  * ORing with 2 ensures that the minimum occupancy is never less than 2
  * without any conditional logic.
  */
 #define TXQ_MIN_OCCUPANCY(ctx) ((ctx->ifc_sctx->isc_ntxd >> 6)| 0x2)
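 /*
  * For example, isc_ntxd = 1024 gives (1024 >> 6) | 2 = 16 | 2 = 18;
  * isc_ntxd = 128 gives 2 | 2 = 2 (the OR adds nothing when bit 1 is
  * already set); and isc_ntxd = 32 gives 0 | 2 = 2.
  */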
 
 static inline int
 iflib_txq_min_occupancy(iflib_txq_t txq)
 {
 	if_ctx_t ctx;
 
 	ctx = txq->ift_ctx;
 	return (get_inuse(txq->ift_size, txq->ift_cidx, txq->ift_pidx, txq->ift_gen) < TXQ_MIN_OCCUPANCY(ctx) + MAX_TX_DESC(ctx));
 }
 
 static void
 iflib_tx_desc_free(iflib_txq_t txq, int n)
 {
 	int hasmap;
 	uint32_t qsize, cidx, mask, gen;
 	struct mbuf *m, **ifsd_m;
 	uint8_t *ifsd_flags;
 	bus_dmamap_t *ifsd_map;
 
 	cidx = txq->ift_cidx;
 	gen = txq->ift_gen;
 	qsize = txq->ift_ctx->ifc_sctx->isc_ntxd;
 	mask = qsize-1;
 	hasmap = txq->ift_sds.ifsd_map != NULL;
 	ifsd_flags = txq->ift_sds.ifsd_flags;
 	ifsd_m = txq->ift_sds.ifsd_m;
 	ifsd_map = txq->ift_sds.ifsd_map;
 
 	while (n--) {
 		prefetch(ifsd_m[(cidx + 3) & mask]);
 		prefetch(ifsd_m[(cidx + 4) & mask]);
 
 		if (ifsd_m[cidx] != NULL) {
 			prefetch(&ifsd_m[(cidx + CACHE_PTR_INCREMENT) & mask]);
 			prefetch(&ifsd_flags[(cidx + CACHE_PTR_INCREMENT) & mask]);
 			if (hasmap && (ifsd_flags[cidx] & TX_SW_DESC_MAPPED)) {
 				/*
 				 * does it matter if it's not the TSO tag? If so we'll
 				 * have to add the type to flags
 				 */
 				bus_dmamap_unload(txq->ift_desc_tag, ifsd_map[cidx]);
 				ifsd_flags[cidx] &= ~TX_SW_DESC_MAPPED;
 			}
 			if ((m = ifsd_m[cidx]) != NULL) {
 				/* XXX we don't support any drivers that batch packets yet */
 				MPASS(m->m_nextpkt == NULL);
 
 				m_freem(m);
 				ifsd_m[cidx] = NULL;
 #if MEMORY_LOGGING
 				txq->ift_dequeued++;
 #endif
 				DBG_COUNTER_INC(tx_frees);
 			}
 		}
 		if (__predict_false(++cidx == qsize)) {
 			cidx = 0;
 			gen = 0;
 		}
 	}
 	txq->ift_cidx = cidx;
 	txq->ift_gen = gen;
 }
 
 static __inline int
 iflib_completed_tx_reclaim(iflib_txq_t txq, int thresh)
 {
 	int reclaim;
 	if_ctx_t ctx = txq->ift_ctx;
 
 	KASSERT(thresh >= 0, ("invalid threshold to reclaim"));
 	MPASS(thresh /*+ MAX_TX_DESC(txq->ift_ctx) */ < txq->ift_size);
 
 	/*
 	 * Need a rate-limiting check so that this isn't called every time
 	 */
 	iflib_tx_credits_update(ctx, txq);
 	reclaim = DESC_RECLAIMABLE(txq);
 
 	if (reclaim <= thresh /* + MAX_TX_DESC(txq->ift_ctx) */) {
 #ifdef INVARIANTS
 		if (iflib_verbose_debug) {
 			printf("%s processed=%ju cleaned=%ju tx_nsegments=%d reclaim=%d thresh=%d\n", __FUNCTION__,
 			       txq->ift_processed, txq->ift_cleaned, txq->ift_ctx->ifc_softc_ctx.isc_tx_nsegments,
 			       reclaim, thresh);
 
 		}
 #endif
 		return (0);
 	}
 	iflib_tx_desc_free(txq, reclaim);
 	txq->ift_cleaned += reclaim;
 	txq->ift_in_use -= reclaim;
 
 	if (txq->ift_active == FALSE)
 		txq->ift_active = TRUE;
 
 	return (reclaim);
 }
 
 static struct mbuf **
 _ring_peek_one(struct ifmp_ring *r, int cidx, int offset)
 {
 
 	return (__DEVOLATILE(struct mbuf **, &r->items[(cidx + offset) & (r->size-1)]));
 }
 
 static void
 iflib_txq_check_drain(iflib_txq_t txq, int budget)
 {
 
 	ifmp_ring_check_drainage(txq->ift_br[0], budget);
 }
 
 static uint32_t
 iflib_txq_can_drain(struct ifmp_ring *r)
 {
 	iflib_txq_t txq = r->cookie;
 	if_ctx_t ctx = txq->ift_ctx;
 
 	return ((TXQ_AVAIL(txq) >= MAX_TX_DESC(ctx)) ||
 		ctx->isc_txd_credits_update(ctx->ifc_softc, txq->ift_id, txq->ift_cidx_processed, false));
 }
 
 static uint32_t
 iflib_txq_drain(struct ifmp_ring *r, uint32_t cidx, uint32_t pidx)
 {
 	iflib_txq_t txq = r->cookie;
 	if_ctx_t ctx = txq->ift_ctx;
 	if_t ifp = ctx->ifc_ifp;
 	struct mbuf **mp, *m;
 	int i, count, consumed, pkt_sent, bytes_sent, mcast_sent, avail, err, in_use_prev, desc_used;
 
 	if (__predict_false(!(if_getdrvflags(ifp) & IFF_DRV_RUNNING) ||
 			    !LINK_ACTIVE(ctx))) {
 		DBG_COUNTER_INC(txq_drain_notready);
 		return (0);
 	}
 
 	avail = IDXDIFF(pidx, cidx, r->size);
 	if (__predict_false(ctx->ifc_flags & IFC_QFLUSH)) {
 		DBG_COUNTER_INC(txq_drain_flushing);
 		for (i = 0; i < avail; i++) {
 			m_freem(r->items[(cidx + i) & (r->size-1)]);
 			r->items[(cidx + i) & (r->size-1)] = NULL;
 		}
 		return (avail);
 	}
 	iflib_completed_tx_reclaim(txq, RECLAIM_THRESH(ctx));
 	if (__predict_false(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_OACTIVE)) {
 		txq->ift_qstatus = IFLIB_QUEUE_IDLE;
 		CALLOUT_LOCK(txq);
 		callout_stop(&txq->ift_timer);
 		callout_stop(&txq->ift_db_check);
 		CALLOUT_UNLOCK(txq);
 		DBG_COUNTER_INC(txq_drain_oactive);
 		return (0);
 	}
 	consumed = mcast_sent = bytes_sent = pkt_sent = 0;
 	count = MIN(avail, TX_BATCH_SIZE);
 
 	for (desc_used = i = 0; i < count && TXQ_AVAIL(txq) > MAX_TX_DESC(ctx) + 2; i++) {
 		mp = _ring_peek_one(r, cidx, i);
 		in_use_prev = txq->ift_in_use;
 		err = iflib_encap(txq, mp);
 		/*
 		 * What other errors should we bail out for?
 		 */
 		if (err == ENOBUFS) {
 			DBG_COUNTER_INC(txq_drain_encapfail);
 			break;
 		}
 		consumed++;
 		if (err)
 			continue;
 
 		pkt_sent++;
 		m = *mp;
 		DBG_COUNTER_INC(tx_sent);
 		bytes_sent += m->m_pkthdr.len;
 		if (m->m_flags & M_MCAST)
 			mcast_sent++;
 
 		txq->ift_db_pending += (txq->ift_in_use - in_use_prev);
 		desc_used += (txq->ift_in_use - in_use_prev);
 		iflib_txd_db_check(ctx, txq, FALSE);
 		ETHER_BPF_MTAP(ifp, m);
 		if (__predict_false(!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)))
 			break;
 
 		if (desc_used > TXQ_MAX_DB_CONSUMED(ctx))
 			break;
 	}
 
 	if ((iflib_min_tx_latency || iflib_txq_min_occupancy(txq)) && txq->ift_db_pending)
 		iflib_txd_db_check(ctx, txq, TRUE);
 	else if ((txq->ift_db_pending || TXQ_AVAIL(txq) < MAX_TX_DESC(ctx)) &&
 		 (callout_pending(&txq->ift_db_check) == 0)) {
 		txq->ift_db_pending_queued = txq->ift_db_pending;
 		callout_reset_on(&txq->ift_db_check, 1, iflib_txd_deferred_db_check,
 				 txq, txq->ift_db_check.c_cpu);
 	}
 	if_inc_counter(ifp, IFCOUNTER_OBYTES, bytes_sent);
 	if_inc_counter(ifp, IFCOUNTER_OPACKETS, pkt_sent);
 	if (mcast_sent)
 		if_inc_counter(ifp, IFCOUNTER_OMCASTS, mcast_sent);
 
 	return (consumed);
 }
 
 static void
 _task_fn_tx(void *context, int pending)
 {
 	iflib_txq_t txq = context;
 	if_ctx_t ctx = txq->ift_ctx;
 
 	if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING))
 		return;
 	ifmp_ring_check_drainage(txq->ift_br[0], TX_BATCH_SIZE);
 }
 
 static void
 _task_fn_rx(void *context, int pending)
 {
 	iflib_rxq_t rxq = context;
 	if_ctx_t ctx = rxq->ifr_ctx;
 	bool more;
 
 	DBG_COUNTER_INC(task_fn_rxs);
 	if (__predict_false(!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)))
 		return;
 
 	if ((more = iflib_rxeof(rxq, 16 /* XXX */)) == false) {
 		if (ctx->ifc_flags & IFC_LEGACY)
 			IFDI_INTR_ENABLE(ctx);
 		else {
 			DBG_COUNTER_INC(rx_intr_enables);
 			IFDI_QUEUE_INTR_ENABLE(ctx, rxq->ifr_id);
 		}
 	}
 	if (__predict_false(!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING)))
 		return;
 	if (more)
 		GROUPTASK_ENQUEUE(&rxq->ifr_task);
 }
 
 static void
 _task_fn_admin(void *context, int pending)
 {
 	if_ctx_t ctx = context;
 	if_softc_ctx_t sctx = &ctx->ifc_softc_ctx;
 	iflib_txq_t txq;
 	int i;
 
 	if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING))
 		return;
 
 	CTX_LOCK(ctx);
 	for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++) {
 		CALLOUT_LOCK(txq);
 		callout_stop(&txq->ift_timer);
 		CALLOUT_UNLOCK(txq);
 	}
 	IFDI_UPDATE_ADMIN_STATUS(ctx);
 	for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++)
 		callout_reset_on(&txq->ift_timer, hz/2, iflib_timer, txq, txq->ift_timer.c_cpu);
 	IFDI_LINK_INTR_ENABLE(ctx);
 	CTX_UNLOCK(ctx);
 
 	if (LINK_ACTIVE(ctx) == 0)
 		return;
 	for (txq = ctx->ifc_txqs, i = 0; i < sctx->isc_ntxqsets; i++, txq++)
 		iflib_txq_check_drain(txq, IFLIB_RESTART_BUDGET);
 }
 
 
 static void
 _task_fn_iov(void *context, int pending)
 {
 	if_ctx_t ctx = context;
 
 	if (!(if_getdrvflags(ctx->ifc_ifp) & IFF_DRV_RUNNING))
 		return;
 
 	CTX_LOCK(ctx);
 	IFDI_VFLR_HANDLE(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 static int
 iflib_sysctl_int_delay(SYSCTL_HANDLER_ARGS)
 {
 	int err;
 	if_int_delay_info_t info;
 	if_ctx_t ctx;
 
 	info = (if_int_delay_info_t)arg1;
 	ctx = info->iidi_ctx;
 	info->iidi_req = req;
 	info->iidi_oidp = oidp;
 	CTX_LOCK(ctx);
 	err = IFDI_SYSCTL_INT_DELAY(ctx, info);
 	CTX_UNLOCK(ctx);
 	return (err);
 }
 
 /*********************************************************************
  *
  *  IFNET FUNCTIONS
  *
  **********************************************************************/
 
 static void
 iflib_if_init_locked(if_ctx_t ctx)
 {
 	iflib_stop(ctx);
 	iflib_init_locked(ctx);
 }
 
 
 static void
 iflib_if_init(void *arg)
 {
 	if_ctx_t ctx = arg;
 
 	CTX_LOCK(ctx);
 	iflib_if_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 static int
 iflib_if_transmit(if_t ifp, struct mbuf *m)
 {
 	if_ctx_t	ctx = if_getsoftc(ifp);
 
 	iflib_txq_t txq;
 	struct mbuf *marr[8], **mp, *next;
 	int err, i, count, qidx;
 
 	if (__predict_false((if_getdrvflags(ifp) & IFF_DRV_RUNNING) == 0 || !LINK_ACTIVE(ctx))) {
 		DBG_COUNTER_INC(tx_frees);
 		m_freem(m);
 		return (0);
 	}
 
 	qidx = 0;
 	if ((NTXQSETS(ctx) > 1) && M_HASHTYPE_GET(m))
 		qidx = QIDX(ctx, m);
 	/*
 	 * XXX calculate buf_ring based on flowid (divvy up bits?)
 	 */
 	txq = &ctx->ifc_txqs[qidx];
 
 #ifdef DRIVER_BACKPRESSURE
 	if (txq->ift_closed) {
 		while (m != NULL) {
 			next = m->m_nextpkt;
 			m->m_nextpkt = NULL;
 			m_freem(m);
 			m = next;
 		}
 		return (ENOBUFS);
 	}
 #endif
 	qidx = count = 0;
 	mp = marr;
 	next = m;
 	do {
 		count++;
 		next = next->m_nextpkt;
 	} while (next != NULL);
 
 	if (count > nitems(marr)) {
 		if ((mp = malloc(count*sizeof(struct mbuf *), M_IFLIB, M_NOWAIT)) == NULL) {
 			/* free every packet on the m_nextpkt list, not just the first */
 			while (m != NULL) {
 				next = m->m_nextpkt;
 				m->m_nextpkt = NULL;
 				m_freem(m);
 				DBG_COUNTER_INC(tx_frees);
 				m = next;
 			}
 			return (ENOBUFS);
 		}
 	}
 	for (next = m, i = 0; next != NULL; i++) {
 		mp[i] = next;
 		next = next->m_nextpkt;
 		mp[i]->m_nextpkt = NULL;
 	}
 	DBG_COUNTER_INC(tx_seen);
 	err = ifmp_ring_enqueue(txq->ift_br[0], (void **)mp, count, TX_BATCH_SIZE);
 
 	if (iflib_txq_can_drain(txq->ift_br[0]))
 		GROUPTASK_ENQUEUE(&txq->ift_task);
 	if (err) {
 		/* support forthcoming later */
 #ifdef DRIVER_BACKPRESSURE
 		txq->ift_closed = TRUE;
 #endif
 		for (i = 0; i < count; i++)
 			m_freem(mp[i]);
 		ifmp_ring_check_drainage(txq->ift_br[0], TX_BATCH_SIZE);
 	}
 	if (count > nitems(marr))
 		free(mp, M_IFLIB);
 
 	return (err);
 }
 
 static void
 iflib_if_qflush(if_t ifp)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 	iflib_txq_t txq = ctx->ifc_txqs;
 	int i;
 
 	CTX_LOCK(ctx);
 	ctx->ifc_flags |= IFC_QFLUSH;
 	CTX_UNLOCK(ctx);
 	for (i = 0; i < NTXQSETS(ctx); i++, txq++)
 		while (!(ifmp_ring_is_idle(txq->ift_br[0]) || ifmp_ring_is_stalled(txq->ift_br[0])))
 			iflib_txq_check_drain(txq, 0);
 	CTX_LOCK(ctx);
 	ctx->ifc_flags &= ~IFC_QFLUSH;
 	CTX_UNLOCK(ctx);
 
 	if_qflush(ifp);
 }
 
 #define IFCAP_REINIT (IFCAP_HWCSUM|IFCAP_TSO4|IFCAP_TSO6|IFCAP_VLAN_HWTAGGING|IFCAP_VLAN_MTU | \
 		      IFCAP_VLAN_HWFILTER | IFCAP_VLAN_HWTSO)
 
 #define IFCAP_FLAGS (IFCAP_RXCSUM | IFCAP_RXCSUM_IPV6 | IFCAP_HWCSUM | IFCAP_LRO | \
 		     IFCAP_TSO4 | IFCAP_TSO6 | IFCAP_VLAN_HWTAGGING |	\
 		     IFCAP_VLAN_MTU | IFCAP_VLAN_HWFILTER | IFCAP_VLAN_HWTSO)
 
 static int
 iflib_if_ioctl(if_t ifp, u_long command, caddr_t data)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 	struct ifreq	*ifr = (struct ifreq *)data;
 #if defined(INET) || defined(INET6)
 	struct ifaddr	*ifa = (struct ifaddr *)data;
 #endif
 	bool		avoid_reset = FALSE;
 	int		err = 0, reinit = 0, bits;
 
 	switch (command) {
 	case SIOCSIFADDR:
 #ifdef INET
 		if (ifa->ifa_addr->sa_family == AF_INET)
 			avoid_reset = TRUE;
 #endif
 #ifdef INET6
 		if (ifa->ifa_addr->sa_family == AF_INET6)
 			avoid_reset = TRUE;
 #endif
 		/*
 		** Calling init results in link renegotiation,
 		** so we avoid doing it when possible.
 		*/
 		if (avoid_reset) {
 			if_setflagbits(ifp, IFF_UP,0);
 			if (!(if_getdrvflags(ifp)& IFF_DRV_RUNNING))
 				reinit = 1;
 #ifdef INET
 			if (!(if_getflags(ifp) & IFF_NOARP))
 				arp_ifinit(ifp, ifa);
 #endif
 		} else
 			err = ether_ioctl(ifp, command, data);
 		break;
 	case SIOCSIFMTU:
 		CTX_LOCK(ctx);
 		if (ifr->ifr_mtu == if_getmtu(ifp)) {
 			CTX_UNLOCK(ctx);
 			break;
 		}
 		bits = if_getdrvflags(ifp);
 		/* stop the driver and free any clusters before proceeding */
 		iflib_stop(ctx);
 
 		if ((err = IFDI_MTU_SET(ctx, ifr->ifr_mtu)) == 0) {
 			if (ifr->ifr_mtu > ctx->ifc_max_fl_buf_size)
 				ctx->ifc_flags |= IFC_MULTISEG;
 			else
 				ctx->ifc_flags &= ~IFC_MULTISEG;
 			err = if_setmtu(ifp, ifr->ifr_mtu);
 		}
 		iflib_init_locked(ctx);
 		if_setdrvflags(ifp, bits);
 		CTX_UNLOCK(ctx);
 		break;
 	case SIOCSIFFLAGS:
 		CTX_LOCK(ctx);
 		if (if_getflags(ifp) & IFF_UP) {
 			if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) {
 				if ((if_getflags(ifp) ^ ctx->ifc_if_flags) &
 				    (IFF_PROMISC | IFF_ALLMULTI)) {
 					err = IFDI_PROMISC_SET(ctx, if_getflags(ifp));
 				}
 			} else
 				reinit = 1;
 		} else if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) {
 			iflib_stop(ctx);
 		}
 		ctx->ifc_if_flags = if_getflags(ifp);
 		CTX_UNLOCK(ctx);
 		break;
 	case SIOCADDMULTI:
 	case SIOCDELMULTI:
 		if (if_getdrvflags(ifp) & IFF_DRV_RUNNING) {
 			CTX_LOCK(ctx);
 			IFDI_INTR_DISABLE(ctx);
 			IFDI_MULTI_SET(ctx);
 			IFDI_INTR_ENABLE(ctx);
 			CTX_UNLOCK(ctx);
 		}
 		break;
 	case SIOCSIFMEDIA:
 		CTX_LOCK(ctx);
 		IFDI_MEDIA_SET(ctx);
 		CTX_UNLOCK(ctx);
 		/* falls thru */
 	case SIOCGIFMEDIA:
 		err = ifmedia_ioctl(ifp, ifr, &ctx->ifc_media, command);
 		break;
 	case SIOCGI2C:
 	{
 		struct ifi2creq i2c;
 
 		err = copyin(ifr->ifr_data, &i2c, sizeof(i2c));
 		if (err != 0)
 			break;
 		if (i2c.dev_addr != 0xA0 && i2c.dev_addr != 0xA2) {
 			err = EINVAL;
 			break;
 		}
 		if (i2c.len > sizeof(i2c.data)) {
 			err = EINVAL;
 			break;
 		}
 
 		if ((err = IFDI_I2C_REQ(ctx, &i2c)) == 0)
 			err = copyout(&i2c, ifr->ifr_data, sizeof(i2c));
 		break;
 	}
 	case SIOCSIFCAP:
 	{
 		int mask, setmask;
 
 		mask = ifr->ifr_reqcap ^ if_getcapenable(ifp);
 		setmask = 0;
 #ifdef TCP_OFFLOAD
 		setmask |= mask & (IFCAP_TOE4|IFCAP_TOE6);
 #endif
 		setmask |= (mask & IFCAP_FLAGS);
 
 		if ((mask & IFCAP_WOL) &&
 		    (if_getcapabilities(ifp) & IFCAP_WOL) != 0)
 			setmask |= (mask & (IFCAP_WOL_MCAST|IFCAP_WOL_MAGIC));
 		if_vlancap(ifp);
 		/*
 		 * want to ensure that traffic has stopped before we change any of the flags
 		 */
 		if (setmask) {
 			CTX_LOCK(ctx);
 			bits = if_getdrvflags(ifp);
 			if (setmask & IFCAP_REINIT)
 				iflib_stop(ctx);
 			if_togglecapenable(ifp, setmask);
 			if (setmask & IFCAP_REINIT)
 				iflib_init_locked(ctx);
 			if_setdrvflags(ifp, bits);
 			CTX_UNLOCK(ctx);
 		}
 		break;
 	}
 	case SIOCGPRIVATE_0:
 	case SIOCSDRVSPEC:
 	case SIOCGDRVSPEC:
 		CTX_LOCK(ctx);
 		err = IFDI_PRIV_IOCTL(ctx, command, data);
 		CTX_UNLOCK(ctx);
 		break;
 	default:
 		err = ether_ioctl(ifp, command, data);
 		break;
 	}
 	if (reinit)
 		iflib_if_init(ctx);
 	return (err);
 }
 
 static uint64_t
 iflib_if_get_counter(if_t ifp, ift_counter cnt)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 
 	return (IFDI_GET_COUNTER(ctx, cnt));
 }
 
 /*********************************************************************
  *
  *  OTHER FUNCTIONS EXPORTED TO THE STACK
  *
  **********************************************************************/
 
 static void
 iflib_vlan_register(void *arg, if_t ifp, uint16_t vtag)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 
 	if ((void *)ctx != arg)
 		return;
 
 	if ((vtag == 0) || (vtag > 4095))
 		return;
 
 	CTX_LOCK(ctx);
 	IFDI_VLAN_REGISTER(ctx, vtag);
 	/* Re-init to load the changes */
 	if (if_getcapenable(ifp) & IFCAP_VLAN_HWFILTER)
 		iflib_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 static void
 iflib_vlan_unregister(void *arg, if_t ifp, uint16_t vtag)
 {
 	if_ctx_t ctx = if_getsoftc(ifp);
 
 	if ((void *)ctx != arg)
 		return;
 
 	if ((vtag == 0) || (vtag > 4095))
 		return;
 
 	CTX_LOCK(ctx);
 	IFDI_VLAN_UNREGISTER(ctx, vtag);
 	/* Re-init to load the changes */
 	if (if_getcapenable(ifp) & IFCAP_VLAN_HWFILTER)
 		iflib_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 static void
 iflib_led_func(void *arg, int onoff)
 {
 	if_ctx_t ctx = arg;
 
 	CTX_LOCK(ctx);
 	IFDI_LED_FUNC(ctx, onoff);
 	CTX_UNLOCK(ctx);
 }
 
 /*********************************************************************
  *
  *  BUS FUNCTION DEFINITIONS
  *
  **********************************************************************/
 
 int
 iflib_device_probe(device_t dev)
 {
 	pci_vendor_info_t *ent;
 
 	uint16_t	pci_vendor_id, pci_device_id;
 	uint16_t	pci_subvendor_id, pci_subdevice_id;
 	uint16_t	pci_rev_id;
 	if_shared_ctx_t sctx;
 
 	if ((sctx = DEVICE_REGISTER(dev)) == NULL || sctx->isc_magic != IFLIB_MAGIC)
 		return (ENOTSUP);
 
 	pci_vendor_id = pci_get_vendor(dev);
 	pci_device_id = pci_get_device(dev);
 	pci_subvendor_id = pci_get_subvendor(dev);
 	pci_subdevice_id = pci_get_subdevice(dev);
 	pci_rev_id = pci_get_revid(dev);
 	if (sctx->isc_parse_devinfo != NULL)
 		sctx->isc_parse_devinfo(&pci_device_id, &pci_subvendor_id, &pci_subdevice_id, &pci_rev_id);
 
 	ent = sctx->isc_vendor_info;
 	while (ent->pvi_vendor_id != 0) {
 		if (pci_vendor_id != ent->pvi_vendor_id) {
 			ent++;
 			continue;
 		}
 		if ((pci_device_id == ent->pvi_device_id) &&
 		    ((pci_subvendor_id == ent->pvi_subvendor_id) ||
 		     (ent->pvi_subvendor_id == 0)) &&
 		    ((pci_subdevice_id == ent->pvi_subdevice_id) ||
 		     (ent->pvi_subdevice_id == 0)) &&
 		    ((pci_rev_id == ent->pvi_rev_id) ||
 		     (ent->pvi_rev_id == 0))) {
 
 			device_set_desc_copy(dev, ent->pvi_name);
 			/* this needs to be changed to zero if the bus probing code
 			 * ever stops re-probing on best match because the sctx
 			 * may have its values overwritten by register calls
 			 * in subsequent probes
 			 */
 			return (BUS_PROBE_DEFAULT);
 		}
 		ent++;
 	}
 	return (ENXIO);
 }
 
 int
 iflib_device_register(device_t dev, void *sc, if_shared_ctx_t sctx, if_ctx_t *ctxp)
 {
 	int err, rid, msix, msix_bar;
 	if_ctx_t ctx;
 	if_t ifp;
 	if_softc_ctx_t scctx;
 
 
 	ctx = malloc(sizeof(* ctx), M_IFLIB, M_WAITOK|M_ZERO);
 
 	if (sc == NULL) {
 		sc = malloc(sctx->isc_driver->size, M_IFLIB, M_WAITOK|M_ZERO);
 		device_set_softc(dev, ctx);
 	}
 
 	ctx->ifc_sctx = sctx;
 	ctx->ifc_dev = dev;
 	ctx->ifc_txrx = *sctx->isc_txrx;
 	ctx->ifc_softc = sc;
 
 	if ((err = iflib_register(ctx)) != 0) {
 		device_printf(dev, "iflib_register failed %d\n", err);
 		return (err);
 	}
 	iflib_add_device_sysctl_pre(ctx);
 	if ((err = IFDI_ATTACH_PRE(ctx)) != 0) {
 		device_printf(dev, "IFDI_ATTACH_PRE failed %d\n", err);
 		return (err);
 	}
 #ifdef ACPI_DMAR
 	if (dmar_get_dma_tag(device_get_parent(dev), dev) != NULL)
 		ctx->ifc_flags |= IFC_DMAR;
 #endif
 
 	scctx = &ctx->ifc_softc_ctx;
 	msix_bar = scctx->isc_msix_bar;
 
 	if (scctx->isc_tx_nsegments > sctx->isc_ntxd / MAX_SINGLE_PACKET_FRACTION)
 		scctx->isc_tx_nsegments = max(1, sctx->isc_ntxd / MAX_SINGLE_PACKET_FRACTION);
 	if (scctx->isc_tx_tso_segments_max > sctx->isc_ntxd / MAX_SINGLE_PACKET_FRACTION)
 		scctx->isc_tx_tso_segments_max = max(1, sctx->isc_ntxd / MAX_SINGLE_PACKET_FRACTION);
 
 	ifp = ctx->ifc_ifp;
 
 	/*
 	 * XXX sanity check that ntxd & nrxd are a power of 2
 	 */
 
 	/*
 	 * Protect the stack against modern hardware
 	 */
 	if (scctx->isc_tx_tso_size_max > FREEBSD_TSO_SIZE_MAX)
 		scctx->isc_tx_tso_size_max = FREEBSD_TSO_SIZE_MAX;
 
 	/* TSO parameters - dig these out of the data sheet - simply correspond to tag setup */
 	ifp->if_hw_tsomaxsegcount = scctx->isc_tx_tso_segments_max;
 	ifp->if_hw_tsomax = scctx->isc_tx_tso_size_max;
 	ifp->if_hw_tsomaxsegsize = scctx->isc_tx_tso_segsize_max;
 	if (scctx->isc_rss_table_size == 0)
 		scctx->isc_rss_table_size = 64;
 	scctx->isc_rss_table_mask = scctx->isc_rss_table_size - 1;
 	/*
 	** Now set up MSI or MSI-X; this should
 	** return the number of supported
 	** vectors (1 for MSI).
 	*/
 	if (sctx->isc_flags & IFLIB_SKIP_MSIX) {
 		msix = scctx->isc_vectors;
 	} else if (scctx->isc_msix_bar != 0)
 		msix = iflib_msix_init(ctx);
 	else {
 		scctx->isc_vectors = 1;
 		scctx->isc_ntxqsets = 1;
 		scctx->isc_nrxqsets = 1;
 		scctx->isc_intr = IFLIB_INTR_LEGACY;
 		msix = 0;
 	}
 	/* Get memory for the station queues */
 	if ((err = iflib_queues_alloc(ctx))) {
 		device_printf(dev, "Unable to allocate queue memory\n");
 		goto fail;
 	}
 
 	if ((err = iflib_qset_structures_setup(ctx))) {
 		device_printf(dev, "qset structure setup failed %d\n", err);
 		goto fail_queues;
 	}
 
 	if (msix > 1 && (err = IFDI_MSIX_INTR_ASSIGN(ctx, msix)) != 0) {
 		device_printf(dev, "IFDI_MSIX_INTR_ASSIGN failed %d\n", err);
 		goto fail_intr_free;
 	}
 	if (msix <= 1) {
 		rid = 0;
 		if (scctx->isc_intr == IFLIB_INTR_MSI) {
 			MPASS(msix == 1);
 			rid = 1;
 		}
 		if ((err = iflib_legacy_setup(ctx, ctx->isc_legacy_intr, ctx, &rid, "irq0")) != 0) {
 			device_printf(dev, "iflib_legacy_setup failed %d\n", err);
 			goto fail_intr_free;
 		}
 	}
 	ether_ifattach(ctx->ifc_ifp, ctx->ifc_mac);
 	if ((err = IFDI_ATTACH_POST(ctx)) != 0) {
 		device_printf(dev, "IFDI_ATTACH_POST failed %d\n", err);
 		goto fail_detach;
 	}
 	if ((err = iflib_netmap_attach(ctx))) {
 		device_printf(ctx->ifc_dev, "netmap attach failed: %d\n", err);
 		goto fail_detach;
 	}
 	*ctxp = ctx;
 
 	iflib_add_device_sysctl_post(ctx);
 	return (0);
 fail_detach:
 	ether_ifdetach(ctx->ifc_ifp);
 fail_intr_free:
 	if (scctx->isc_intr == IFLIB_INTR_MSIX || scctx->isc_intr == IFLIB_INTR_MSI)
 		pci_release_msi(ctx->ifc_dev);
 fail_queues:
 	/* XXX free queues */
 fail:
 	IFDI_DETACH(ctx);
 	return (err);
 }
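 
 /*
  * Illustration only: hypothetical helpers that iflib does not call.
  * The first mirrors the clamp that iflib_device_register() applies to
  * isc_tx_nsegments and isc_tx_tso_segments_max.  The divisor is passed in
  * rather than assuming a value for MAX_SINGLE_PACKET_FRACTION; the macro
  * name suggests a cap on the share of the ring one packet may use.  The
  * second is the power-of-two test that the "XXX sanity check" comment asks
  * for.  Note that isc_rss_table_mask (size - 1) is only a valid mask when
  * isc_rss_table_size is a power of two.
  */
 static inline int
 iflib_example_clamp_nsegs(int nsegs, int ndesc, int fraction)
 {
 	int limit = ndesc / fraction;
 
 	/* Hypothetical: cap at ndesc / fraction, but never below 1. */
 	if (nsegs > limit)
 		nsegs = (limit > 1) ? limit : 1;
 	return (nsegs);
 }
 
 static inline int
 iflib_example_is_pow2(unsigned int n)
 {
 
 	/* A power of two has exactly one bit set. */
 	return (n != 0 && (n & (n - 1)) == 0);
 }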
 
 int
 iflib_device_attach(device_t dev)
 {
 	if_ctx_t ctx;
 	if_shared_ctx_t sctx;
 
 	if ((sctx = DEVICE_REGISTER(dev)) == NULL || sctx->isc_magic != IFLIB_MAGIC)
 		return (ENOTSUP);
 
 	pci_enable_busmaster(dev);
 
 	return (iflib_device_register(dev, NULL, sctx, &ctx));
 }
 
 int
 iflib_device_deregister(if_ctx_t ctx)
 {
 	if_t ifp = ctx->ifc_ifp;
 	iflib_txq_t txq;
 	iflib_rxq_t rxq;
 	device_t dev = ctx->ifc_dev;
 	int i;
 	struct taskqgroup *tqg;
 
 	/* Make sure VLANS are not using driver */
 	if (if_vlantrunkinuse(ifp)) {
 		device_printf(dev,"Vlan in use, detach first\n");
 		return (EBUSY);
 	}
 
 	CTX_LOCK(ctx);
 	ctx->ifc_in_detach = 1;
 	iflib_stop(ctx);
 	CTX_UNLOCK(ctx);
 
 	/* Unregister VLAN events */
 	if (ctx->ifc_vlan_attach_event != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_config, ctx->ifc_vlan_attach_event);
 	if (ctx->ifc_vlan_detach_event != NULL)
 		EVENTHANDLER_DEREGISTER(vlan_unconfig, ctx->ifc_vlan_detach_event);
 
 	iflib_netmap_detach(ifp);
 	ether_ifdetach(ifp);
 	/* ether_ifdetach calls if_qflush - lock must be destroyed afterwards */
 	CTX_LOCK_DESTROY(ctx);
 	if (ctx->ifc_led_dev != NULL)
 		led_destroy(ctx->ifc_led_dev);
 	/* XXX drain any dependent tasks */
 	tqg = qgroup_if_io_tqg;
 	for (txq = ctx->ifc_txqs, i = 0, rxq = ctx->ifc_rxqs; i < NTXQSETS(ctx); i++, txq++) {
 		callout_drain(&txq->ift_timer);
 		callout_drain(&txq->ift_db_check);
 		if (txq->ift_task.gt_uniq != NULL)
 			taskqgroup_detach(tqg, &txq->ift_task);
 	}
 	for (i = 0, rxq = ctx->ifc_rxqs; i < NRXQSETS(ctx); i++, rxq++) {
 		if (rxq->ifr_task.gt_uniq != NULL)
 			taskqgroup_detach(tqg, &rxq->ifr_task);
 	}
 	tqg = qgroup_if_config_tqg;
 	if (ctx->ifc_admin_task.gt_uniq != NULL)
 		taskqgroup_detach(tqg, &ctx->ifc_admin_task);
 	if (ctx->ifc_vflr_task.gt_uniq != NULL)
 		taskqgroup_detach(tqg, &ctx->ifc_vflr_task);
 
 	IFDI_DETACH(ctx);
 	if (ctx->ifc_softc_ctx.isc_intr != IFLIB_INTR_LEGACY) {
 		pci_release_msi(dev);
 	}
 	if (ctx->ifc_softc_ctx.isc_intr != IFLIB_INTR_MSIX) {
 		iflib_irq_free(ctx, &ctx->ifc_legacy_irq);
 	}
 	if (ctx->ifc_msix_mem != NULL) {
 		bus_release_resource(ctx->ifc_dev, SYS_RES_MEMORY,
 			ctx->ifc_softc_ctx.isc_msix_bar, ctx->ifc_msix_mem);
 		ctx->ifc_msix_mem = NULL;
 	}
 
 	bus_generic_detach(dev);
 	if_free(ifp);
 
 	iflib_tx_structures_free(ctx);
 	iflib_rx_structures_free(ctx);
 	return (0);
 }
 
 
 int
 iflib_device_detach(device_t dev)
 {
 	if_ctx_t ctx = device_get_softc(dev);
 
 	return (iflib_device_deregister(ctx));
 }
 
 int
 iflib_device_suspend(device_t dev)
 {
 	if_ctx_t ctx = device_get_softc(dev);
 
 	CTX_LOCK(ctx);
 	IFDI_SUSPEND(ctx);
 	CTX_UNLOCK(ctx);
 
 	return (bus_generic_suspend(dev));
 }
 
 int
 iflib_device_shutdown(device_t dev)
 {
 	if_ctx_t ctx = device_get_softc(dev);
 
 	CTX_LOCK(ctx);
 	IFDI_SHUTDOWN(ctx);
 	CTX_UNLOCK(ctx);
 
 	return (bus_generic_suspend(dev));
 }
 
 
 int
 iflib_device_resume(device_t dev)
 {
 	if_ctx_t ctx = device_get_softc(dev);
 	iflib_txq_t txq = ctx->ifc_txqs;
 
 	CTX_LOCK(ctx);
 	IFDI_RESUME(ctx);
 	iflib_init_locked(ctx);
 	CTX_UNLOCK(ctx);
 	for (int i = 0; i < NTXQSETS(ctx); i++, txq++)
 		iflib_txq_check_drain(txq, IFLIB_RESTART_BUDGET);
 
 	return (bus_generic_resume(dev));
 }
 
 int
 iflib_device_iov_init(device_t dev, uint16_t num_vfs, const nvlist_t *params)
 {
 	int error;
 	if_ctx_t ctx = device_get_softc(dev);
 
 	CTX_LOCK(ctx);
 	error = IFDI_IOV_INIT(ctx, num_vfs, params);
 	CTX_UNLOCK(ctx);
 
 	return (error);
 }
 
 void
 iflib_device_iov_uninit(device_t dev)
 {
 	if_ctx_t ctx = device_get_softc(dev);
 
 	CTX_LOCK(ctx);
 	IFDI_IOV_UNINIT(ctx);
 	CTX_UNLOCK(ctx);
 }
 
 int
 iflib_device_iov_add_vf(device_t dev, uint16_t vfnum, const nvlist_t *params)
 {
 	int error;
 	if_ctx_t ctx = device_get_softc(dev);
 
 	CTX_LOCK(ctx);
 	error = IFDI_IOV_VF_ADD(ctx, vfnum, params);
 	CTX_UNLOCK(ctx);
 
 	return (error);
 }
 
 /*********************************************************************
  *
  *  MODULE FUNCTION DEFINITIONS
  *
  **********************************************************************/
 
 /*
  * Module load hook.  It is intended to start a fast taskqueue thread
  * for each core and a taskqueue for control operations; at present it
  * does nothing and returns 0.
  */
 static int
 iflib_module_init(void)
 {
 	return (0);
 }
 
 static int
 iflib_module_event_handler(module_t mod, int what, void *arg)
 {
 	int err;
 
 	switch (what) {
 	case MOD_LOAD:
 		if ((err = iflib_module_init()) != 0)
 			return (err);
 		break;
 	case MOD_UNLOAD:
 		return (EBUSY);
 	default:
 		return (EOPNOTSUPP);
 	}
 
 	return (0);
 }
 
 /*********************************************************************
  *
  *  PUBLIC FUNCTION DEFINITIONS
  *     ordered as in iflib.h
  *
  **********************************************************************/
 
 
 static void
 _iflib_assert(if_shared_ctx_t sctx)
 {
 	MPASS(sctx->isc_tx_maxsize);
 	MPASS(sctx->isc_tx_maxsegsize);
 
 	MPASS(sctx->isc_rx_maxsize);
 	MPASS(sctx->isc_rx_nsegments);
 	MPASS(sctx->isc_rx_maxsegsize);
 
 
 	MPASS(sctx->isc_txrx->ift_txd_encap);
 	MPASS(sctx->isc_txrx->ift_txd_flush);
 	MPASS(sctx->isc_txrx->ift_txd_credits_update);
 	MPASS(sctx->isc_txrx->ift_rxd_available);
 	MPASS(sctx->isc_txrx->ift_rxd_pkt_get);
 	MPASS(sctx->isc_txrx->ift_rxd_refill);
 	MPASS(sctx->isc_txrx->ift_rxd_flush);
 	MPASS(sctx->isc_nrxd);
 }
 
 static int
 iflib_register(if_ctx_t ctx)
 {
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	driver_t *driver = sctx->isc_driver;
 	device_t dev = ctx->ifc_dev;
 	if_t ifp;
 
 	_iflib_assert(sctx);
 
 	CTX_LOCK_INIT(ctx, device_get_nameunit(ctx->ifc_dev));
 	MPASS(ctx->ifc_flags == 0);
 
 	ifp = ctx->ifc_ifp = if_gethandle(IFT_ETHER);
 	if (ifp == NULL) {
 		device_printf(dev, "cannot allocate ifnet structure\n");
 		return (ENOMEM);
 	}
 
 	/*
 	 * Initialize our context's device specific methods
 	 */
 	kobj_init((kobj_t) ctx, (kobj_class_t) driver);
 	kobj_class_compile((kobj_class_t) driver);
 	driver->refs++;
 
 	if_initname(ifp, device_get_name(dev), device_get_unit(dev));
 	if_setsoftc(ifp, ctx);
 	if_setdev(ifp, dev);
 	if_setinitfn(ifp, iflib_if_init);
 	if_setioctlfn(ifp, iflib_if_ioctl);
 	if_settransmitfn(ifp, iflib_if_transmit);
 	if_setqflushfn(ifp, iflib_if_qflush);
 	if_setgetcounterfn(ifp, iflib_if_get_counter);
 	if_setflags(ifp, IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST);
 
 	if_setcapabilities(ifp, 0);
 	if_setcapenable(ifp, 0);
 
 	ctx->ifc_vlan_attach_event =
 	    EVENTHANDLER_REGISTER(vlan_config, iflib_vlan_register, ctx,
 	    EVENTHANDLER_PRI_FIRST);
 	ctx->ifc_vlan_detach_event =
 	    EVENTHANDLER_REGISTER(vlan_unconfig, iflib_vlan_unregister, ctx,
 	    EVENTHANDLER_PRI_FIRST);
 
 	ifmedia_init(&ctx->ifc_media, IFM_IMASK,
 	    iflib_media_change, iflib_media_status);
 
 	return (0);
 }
 
 
 static int
 iflib_queues_alloc(if_ctx_t ctx)
 {
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	device_t dev = ctx->ifc_dev;
 	int nrxqsets = ctx->ifc_softc_ctx.isc_nrxqsets;
 	int ntxqsets = ctx->ifc_softc_ctx.isc_ntxqsets;
 	iflib_txq_t txq;
 	iflib_rxq_t rxq;
 	iflib_fl_t fl = NULL;
 	int i, j, cpu, err, txconf, rxconf, fl_ifdi_offset;
 	iflib_dma_info_t ifdip;
 	uint32_t *rxqsizes = sctx->isc_rxqsizes;
 	uint32_t *txqsizes = sctx->isc_txqsizes;
 	uint8_t nrxqs = sctx->isc_nrxqs;
 	uint8_t ntxqs = sctx->isc_ntxqs;
 	int nfree_lists = sctx->isc_nfl ? sctx->isc_nfl : 1;
 	caddr_t *vaddrs;
 	uint64_t *paddrs;
 	struct ifmp_ring **brscp;
 	int nbuf_rings = 1; /* XXX determine dynamically */
 
 	KASSERT(ntxqs > 0, ("number of queues must be at least 1"));
 	KASSERT(nrxqs > 0, ("number of queues must be at least 1"));
 
 	brscp = NULL;
 	rxq = NULL;
 
 	/* Allocate the TX ring struct memory */
 	if (!(txq =
 	    (iflib_txq_t) malloc(sizeof(struct iflib_txq) *
 	    ntxqsets, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate TX ring memory\n");
 		err = ENOMEM;
 		goto fail;
 	}
 
 	/* Now allocate the RX */
 	if (!(rxq =
 	    (iflib_rxq_t) malloc(sizeof(struct iflib_rxq) *
 	    nrxqsets, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate RX ring memory\n");
 		err = ENOMEM;
 		goto rx_fail;
 	}
 	/* One set of buf_ring pointers per TX queue set (indexed by txq below) */
 	if (!(brscp = malloc(sizeof(void *) * nbuf_rings * ntxqsets, M_IFLIB, M_NOWAIT | M_ZERO))) {
 		device_printf(dev, "Unable to allocate buf_ring_sc * memory\n");
 		err = ENOMEM;
 		goto rx_fail;
 	}
 
 	/* ctx now owns the arrays; txq and rxq stay as cursors for the loops below */
 	ctx->ifc_txqs = txq;
 	ctx->ifc_rxqs = rxq;
 
 	/*
 	 * XXX handle allocation failure
 	 */
 	for (txconf = i = 0, cpu = CPU_FIRST(); i < ntxqsets; i++, txconf++, txq++, cpu = CPU_NEXT(cpu)) {
 		/* Set up some basics */
 
 		if ((ifdip = malloc(sizeof(struct iflib_dma_info) * ntxqs, M_IFLIB, M_WAITOK|M_ZERO)) == NULL) {
 			device_printf(dev, "failed to allocate iflib_dma_info\n");
 			err = ENOMEM;
 			goto err_tx_desc;
 		}
 		txq->ift_ifdi = ifdip;
 		for (j = 0; j < ntxqs; j++, ifdip++) {
 			if (iflib_dma_alloc(ctx, txqsizes[j], ifdip, BUS_DMA_NOWAIT)) {
 				device_printf(dev, "Unable to allocate Descriptor memory\n");
 				err = ENOMEM;
 				goto err_tx_desc;
 			}
 			bzero((void *)ifdip->idi_vaddr, txqsizes[j]);
 		}
 		txq->ift_ctx = ctx;
 		txq->ift_id = i;
 		/* XXX fix this */
 		txq->ift_timer.c_cpu = cpu;
 		txq->ift_db_check.c_cpu = cpu;
 		txq->ift_nbr = nbuf_rings;
 
 		if (iflib_txsd_alloc(txq)) {
 			device_printf(dev, "Critical Failure setting up TX buffers\n");
 			err = ENOMEM;
 			goto err_tx_desc;
 		}
 
 		/* Initialize the TX lock */
 		snprintf(txq->ift_mtx_name, MTX_NAME_LEN, "%s:tx(%d):callout",
 		    device_get_nameunit(dev), txq->ift_id);
 		mtx_init(&txq->ift_mtx, txq->ift_mtx_name, NULL, MTX_DEF);
 		callout_init_mtx(&txq->ift_timer, &txq->ift_mtx, 0);
 		callout_init_mtx(&txq->ift_db_check, &txq->ift_mtx, 0);
 
 		snprintf(txq->ift_db_mtx_name, MTX_NAME_LEN, "%s:tx(%d):db",
 			 device_get_nameunit(dev), txq->ift_id);
 		TXDB_LOCK_INIT(txq);
 
 		txq->ift_br = brscp + i*nbuf_rings;
 		for (j = 0; j < nbuf_rings; j++) {
 			err = ifmp_ring_alloc(&txq->ift_br[j], 2048, txq, iflib_txq_drain,
 					      iflib_txq_can_drain, M_IFLIB, M_WAITOK);
 			if (err) {
 				/* XXX free any allocated rings */
 				device_printf(dev, "Unable to allocate buf_ring\n");
 				goto err_tx_desc;
 			}
 		}
 	}
 
 	for (rxconf = i = 0; i < nrxqsets; i++, rxconf++, rxq++) {
 		/* Set up some basics */
 
 		if ((ifdip = malloc(sizeof(struct iflib_dma_info) * nrxqs, M_IFLIB, M_WAITOK|M_ZERO)) == NULL) {
 			device_printf(dev, "failed to allocate iflib_dma_info\n");
 			err = ENOMEM;
 			goto err_tx_desc;
 		}
 
 		rxq->ifr_ifdi = ifdip;
 		for (j = 0; j < nrxqs; j++, ifdip++) {
 			if (iflib_dma_alloc(ctx, rxqsizes[j], ifdip, BUS_DMA_NOWAIT)) {
 				device_printf(dev, "Unable to allocate Descriptor memory\n");
 				err = ENOMEM;
 				goto err_tx_desc;
 			}
 			bzero((void *)ifdip->idi_vaddr, rxqsizes[j]);
 		}
 		rxq->ifr_ctx = ctx;
 		rxq->ifr_id = i;
 		if (sctx->isc_flags & IFLIB_HAS_CQ) {
 			fl_ifdi_offset = 1;
 		} else {
 			fl_ifdi_offset = 0;
 		}
 		rxq->ifr_nfl = nfree_lists;
 		if (!(fl =
 			  (iflib_fl_t) malloc(sizeof(struct iflib_fl) * nfree_lists, M_IFLIB, M_NOWAIT | M_ZERO))) {
 			device_printf(dev, "Unable to allocate free list memory\n");
 			err = ENOMEM;
 			goto err_tx_desc;
 		}
 		rxq->ifr_fl = fl;
 		for (j = 0; j < nfree_lists; j++) {
 			rxq->ifr_fl[j].ifl_rxq = rxq;
 			rxq->ifr_fl[j].ifl_id = j;
 			rxq->ifr_fl[j].ifl_ifdi = &rxq->ifr_ifdi[j + fl_ifdi_offset];
 		}
 		/* Allocate receive buffers for the ring */
 		if (iflib_rxsd_alloc(rxq)) {
 			device_printf(dev,
 			    "Critical Failure setting up receive buffers\n");
 			err = ENOMEM;
 			goto err_rx_desc;
 		}
 	}
 
 	/* TXQs */
 	vaddrs = malloc(sizeof(caddr_t)*ntxqsets*ntxqs, M_IFLIB, M_WAITOK);
 	paddrs = malloc(sizeof(uint64_t)*ntxqsets*ntxqs, M_IFLIB, M_WAITOK);
 	for (i = 0; i < ntxqsets; i++) {
 		iflib_dma_info_t di = ctx->ifc_txqs[i].ift_ifdi;
 
 		for (j = 0; j < ntxqs; j++, di++) {
 			vaddrs[i*ntxqs + j] = di->idi_vaddr;
 			paddrs[i*ntxqs + j] = di->idi_paddr;
 		}
 	}
 	if ((err = IFDI_TX_QUEUES_ALLOC(ctx, vaddrs, paddrs, ntxqs, ntxqsets)) != 0) {
 		device_printf(ctx->ifc_dev, "TX queue allocation failed: %d\n", err);
 		iflib_tx_structures_free(ctx);
 		free(vaddrs, M_IFLIB);
 		free(paddrs, M_IFLIB);
 		goto err_rx_desc;
 	}
 	free(vaddrs, M_IFLIB);
 	free(paddrs, M_IFLIB);
 
 	/* RXQs */
 	vaddrs = malloc(sizeof(caddr_t)*nrxqsets*nrxqs, M_IFLIB, M_WAITOK);
 	paddrs = malloc(sizeof(uint64_t)*nrxqsets*nrxqs, M_IFLIB, M_WAITOK);
 	for (i = 0; i < nrxqsets; i++) {
 		iflib_dma_info_t di = ctx->ifc_rxqs[i].ifr_ifdi;
 
 		for (j = 0; j < nrxqs; j++, di++) {
 			vaddrs[i*nrxqs + j] = di->idi_vaddr;
 			paddrs[i*nrxqs + j] = di->idi_paddr;
 		}
 	}
 	if ((err = IFDI_RX_QUEUES_ALLOC(ctx, vaddrs, paddrs, nrxqs, nrxqsets)) != 0) {
 		device_printf(ctx->ifc_dev, "RX queue allocation failed: %d\n", err);
 		iflib_tx_structures_free(ctx);
 		free(vaddrs, M_IFLIB);
 		free(paddrs, M_IFLIB);
 		goto err_rx_desc;
 	}
 	free(vaddrs, M_IFLIB);
 	free(paddrs, M_IFLIB);
 
 	return (0);
 
 /* XXX handle allocation failure changes */
 err_rx_desc:
 err_tx_desc:
 	if (ctx->ifc_rxqs != NULL)
 		free(ctx->ifc_rxqs, M_IFLIB);
 	ctx->ifc_rxqs = NULL;
 	if (ctx->ifc_txqs != NULL)
 		free(ctx->ifc_txqs, M_IFLIB);
 	ctx->ifc_txqs = NULL;
 	/* The txq/rxq cursors pointed into the arrays just freed */
 	txq = NULL;
 	rxq = NULL;
 rx_fail:
 	if (brscp != NULL)
 		free(brscp, M_IFLIB);
 	if (rxq != NULL)
 		free(rxq, M_IFLIB);
 	if (txq != NULL)
 		free(txq, M_IFLIB);
 fail:
 	return (err);
 }
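 
 /*
  * Illustration only: hypothetical helpers that iflib does not call.
  * The vaddrs/paddrs arrays passed to IFDI_TX_QUEUES_ALLOC() and
  * IFDI_RX_QUEUES_ALLOC() above are laid out qset-major, so hardware
  * queue j of queue set i sits at index i * nqs + j.  A sketch of that
  * layout and its inverse, assuming nqs > 0:
  */
 static inline int
 iflib_example_flat_idx(int qset, int q, int nqs)
 {
 
 	return (qset * nqs + q);
 }
 
 static inline void
 iflib_example_unflatten(int idx, int nqs, int *qset, int *q)
 {
 
 	/* Division recovers the queue set, the remainder the queue. */
 	*qset = idx / nqs;
 	*q = idx % nqs;
 }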
 
 static int
 iflib_tx_structures_setup(if_ctx_t ctx)
 {
 	iflib_txq_t txq = ctx->ifc_txqs;
 	int i;
 
 	for (i = 0; i < NTXQSETS(ctx); i++, txq++)
 		iflib_txq_setup(txq);
 
 	return (0);
 }
 
 static void
 iflib_tx_structures_free(if_ctx_t ctx)
 {
 	iflib_txq_t txq = ctx->ifc_txqs;
 	int i, j;
 
 	for (i = 0; i < NTXQSETS(ctx); i++, txq++) {
 		iflib_txq_destroy(txq);
 		for (j = 0; j < ctx->ifc_nhwtxqs; j++)
 			iflib_dma_free(&txq->ift_ifdi[j]);
 	}
 	free(ctx->ifc_txqs, M_IFLIB);
 	ctx->ifc_txqs = NULL;
 	IFDI_QUEUES_FREE(ctx);
 }
 
 /*********************************************************************
  *
  *  Initialize all receive rings.
  *
  **********************************************************************/
 static int
 iflib_rx_structures_setup(if_ctx_t ctx)
 {
 	iflib_rxq_t rxq = ctx->ifc_rxqs;
 	int q;
 #if defined(INET6) || defined(INET)
 	int i, err;
 #endif
 
 	for (q = 0; q < ctx->ifc_softc_ctx.isc_nrxqsets; q++, rxq++) {
 #if defined(INET6) || defined(INET)
 		tcp_lro_free(&rxq->ifr_lc);
 		if ((err = tcp_lro_init(&rxq->ifr_lc)) != 0) {
 			device_printf(ctx->ifc_dev, "LRO Initialization failed!\n");
 			goto fail;
 		}
 		rxq->ifr_lro_enabled = TRUE;
 		rxq->ifr_lc.ifp = ctx->ifc_ifp;
 #endif
 		IFDI_RXQ_SETUP(ctx, rxq->ifr_id);
 	}
 	return (0);
 #if defined(INET6) || defined(INET)
 fail:
 	/*
 	 * Free the RX software descriptors allocated so far.  Only the
 	 * rings that completed are handled here; the failing ring has
 	 * already cleaned up after itself.  Ring 'q' failed, so it is
 	 * the terminus.
 	 */
 	rxq = ctx->ifc_rxqs;
 	for (i = 0; i < q; ++i, rxq++) {
 		iflib_rx_sds_free(rxq);
 		rxq->ifr_cq_gen = rxq->ifr_cq_cidx = rxq->ifr_cq_pidx = 0;
 	}
 	return (err);
 #endif
 }
 
 /*********************************************************************
  *
  *  Free all receive rings.
  *
  **********************************************************************/
 static void
 iflib_rx_structures_free(if_ctx_t ctx)
 {
 	iflib_rxq_t rxq = ctx->ifc_rxqs;
 
 	for (int i = 0; i < ctx->ifc_softc_ctx.isc_nrxqsets; i++, rxq++) {
 		iflib_rx_sds_free(rxq);
 	}
 }
 
 static int
 iflib_qset_structures_setup(if_ctx_t ctx)
 {
 	int err;
 
 	if ((err = iflib_tx_structures_setup(ctx)) != 0)
 		return (err);
 
 	if ((err = iflib_rx_structures_setup(ctx)) != 0) {
 		device_printf(ctx->ifc_dev, "iflib_rx_structures_setup failed: %d\n", err);
 		iflib_tx_structures_free(ctx);
 		iflib_rx_structures_free(ctx);
 	}
 	return (err);
 }
 
 int
 iflib_irq_alloc(if_ctx_t ctx, if_irq_t irq, int rid,
 				driver_filter_t filter, void *filter_arg, driver_intr_t handler, void *arg, char *name)
 {
 
 	return (_iflib_irq_alloc(ctx, irq, rid, filter, handler, arg, name));
 }
 
 static void
 find_nth(if_ctx_t ctx, cpuset_t *cpus, int qid)
 {
 	int i, cpuid;
 
 	CPU_COPY(&ctx->ifc_cpus, cpus);
 	/* clear up to the qid'th bit; CPU_FFS() returns a 1-based index */
 	for (i = 0; i < qid; i++) {
 		cpuid = CPU_FFS(cpus);
 		if (cpuid == 0)
 			break;
 		CPU_CLR(cpuid - 1, cpus);
 	}
 }
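 
 /*
  * Illustration only: a hypothetical helper that iflib does not call.
  * find_nth() skips the first qid CPUs in ifc_cpus.  CPU_FFS(), like
  * ffs(3), returns a 1-based bit index (0 for an empty set), so the
  * 0-based CPU ID is one less.  The same walk over a plain bit mask,
  * returning the 0-based index of the n'th set bit, or -1 when the mask
  * has too few bits set:
  */
 static inline int
 iflib_example_nth_set_bit(unsigned long mask, int n)
 {
 	int bit;
 
 	/* Hypothetical: drop the lowest set bit n times. */
 	for (; n > 0 && mask != 0; n--)
 		mask &= mask - 1;
 	if (mask == 0)
 		return (-1);
 	/* Locate the lowest remaining set bit, counting from 0. */
 	for (bit = 0; (mask & 1UL) == 0; bit++)
 		mask >>= 1;
 	return (bit);
 }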
 
 int
 iflib_irq_alloc_generic(if_ctx_t ctx, if_irq_t irq, int rid,
 						iflib_intr_type_t type, driver_filter_t *filter,
 						void *filter_arg, int qid, char *name)
 {
 	struct grouptask *gtask;
 	struct taskqgroup *tqg;
 	iflib_filter_info_t info;
 	cpuset_t cpus;
 	task_fn_t *fn;
 	int tqrid, err;
 	void *q;
 
 	info = &ctx->ifc_filter_info;
 
 	switch (type) {
 	/* XXX merge tx/rx for netmap? */
 	case IFLIB_INTR_TX:
 		q = &ctx->ifc_txqs[qid];
 		info = &ctx->ifc_txqs[qid].ift_filter_info;
 		gtask = &ctx->ifc_txqs[qid].ift_task;
 		tqg = qgroup_if_io_tqg;
 		tqrid = irq->ii_rid;
 		fn = _task_fn_tx;
 		break;
 	case IFLIB_INTR_RX:
 		q = &ctx->ifc_rxqs[qid];
 		info = &ctx->ifc_rxqs[qid].ifr_filter_info;
 		gtask = &ctx->ifc_rxqs[qid].ifr_task;
 		tqg = qgroup_if_io_tqg;
 		tqrid = irq->ii_rid;
 		fn = _task_fn_rx;
 		break;
 	case IFLIB_INTR_ADMIN:
 		q = ctx;
 		info = &ctx->ifc_filter_info;
 		gtask = &ctx->ifc_admin_task;
 		tqg = qgroup_if_config_tqg;
 		tqrid = -1;
 		fn = _task_fn_admin;
 		break;
 	default:
 		panic("unknown net intr type");
 	}
 	GROUPTASK_INIT(gtask, 0, fn, q);
 
 	info->ifi_filter = filter;
 	info->ifi_filter_arg = filter_arg;
 	info->ifi_task = gtask;
 
 	/* XXX query cpu that rid belongs to */
 
 	err = _iflib_irq_alloc(ctx, irq, rid, iflib_fast_intr, NULL, info,  name);
 	if (err != 0)
 		return (err);
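 	if (tqrid != -1) {
 		int cpu;
 
 		find_nth(ctx, &cpus, qid);
 		/* CPU_FFS() is 1-based and returns 0 for an empty set */
 		cpu = CPU_FFS(&cpus);
 		cpu = (cpu != 0) ? cpu - 1 : 0;
 		taskqgroup_attach_cpu(tqg, gtask, q, cpu, irq->ii_rid, name);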
 	} else
 		taskqgroup_attach(tqg, gtask, q, tqrid, name);
 
 
 	return (0);
 }
 
 void
 iflib_softirq_alloc_generic(if_ctx_t ctx, int rid, iflib_intr_type_t type,  void *arg, int qid, char *name)
 {
 	struct grouptask *gtask;
 	struct taskqgroup *tqg;
 	task_fn_t *fn;
 	void *q;
 
 	switch (type) {
 	case IFLIB_INTR_TX:
 		q = &ctx->ifc_txqs[qid];
 		gtask = &ctx->ifc_txqs[qid].ift_task;
 		tqg = qgroup_if_io_tqg;
 		fn = _task_fn_tx;
 		break;
 	case IFLIB_INTR_RX:
 		q = &ctx->ifc_rxqs[qid];
 		gtask = &ctx->ifc_rxqs[qid].ifr_task;
 		tqg = qgroup_if_io_tqg;
 		fn = _task_fn_rx;
 		break;
 	case IFLIB_INTR_ADMIN:
 		q = ctx;
 		gtask = &ctx->ifc_admin_task;
 		tqg = qgroup_if_config_tqg;
 		rid = -1;
 		fn = _task_fn_admin;
 		break;
 	case IFLIB_INTR_IOV:
 		q = ctx;
 		gtask = &ctx->ifc_vflr_task;
 		tqg = qgroup_if_config_tqg;
 		rid = -1;
 		fn = _task_fn_iov;
 		break;
 	default:
 		panic("unknown net intr type");
 	}
 	GROUPTASK_INIT(gtask, 0, fn, q);
 	taskqgroup_attach(tqg, gtask, q, rid, name);
 }
 
 void
 iflib_irq_free(if_ctx_t ctx, if_irq_t irq)
 {
 	if (irq->ii_tag)
 		bus_teardown_intr(ctx->ifc_dev, irq->ii_res, irq->ii_tag);
 
 	if (irq->ii_res)
 		bus_release_resource(ctx->ifc_dev, SYS_RES_IRQ, irq->ii_rid, irq->ii_res);
 }
 
 static int
 iflib_legacy_setup(if_ctx_t ctx, driver_filter_t filter, void *filter_arg, int *rid, char *name)
 {
 	iflib_txq_t txq = ctx->ifc_txqs;
 	iflib_rxq_t rxq = ctx->ifc_rxqs;
 	if_irq_t irq = &ctx->ifc_legacy_irq;
 	iflib_filter_info_t info;
 	struct grouptask *gtask;
 	struct taskqgroup *tqg;
 	task_fn_t *fn;
 	int tqrid;
 	void *q;
 	int err;
 
 	q = &ctx->ifc_rxqs[0];
 	info = &rxq[0].ifr_filter_info;
 	gtask = &rxq[0].ifr_task;
 	tqg = qgroup_if_io_tqg;
 	tqrid = irq->ii_rid = *rid;
 	fn = _task_fn_rx;
 
 	ctx->ifc_flags |= IFC_LEGACY;
 	info->ifi_filter = filter;
 	info->ifi_filter_arg = filter_arg;
 	info->ifi_task = gtask;
 
 	/* We allocate a single interrupt resource */
 	if ((err = _iflib_irq_alloc(ctx, irq, tqrid, iflib_fast_intr, NULL, info, name)) != 0)
 		return (err);
 	GROUPTASK_INIT(gtask, 0, fn, q);
 	taskqgroup_attach(tqg, gtask, q, tqrid, name);
 
 	GROUPTASK_INIT(&txq->ift_task, 0, _task_fn_tx, txq);
 	taskqgroup_attach(qgroup_if_io_tqg, &txq->ift_task, txq, tqrid, "tx");
 	GROUPTASK_INIT(&ctx->ifc_admin_task, 0, _task_fn_admin, ctx);
 	taskqgroup_attach(qgroup_if_config_tqg, &ctx->ifc_admin_task, ctx, -1, "admin/link");
 
 	return (0);
 }
 
 void
 iflib_led_create(if_ctx_t ctx)
 {
 
 	ctx->ifc_led_dev = led_create(iflib_led_func, ctx,
 								  device_get_nameunit(ctx->ifc_dev));
 }
 
 void
 iflib_tx_intr_deferred(if_ctx_t ctx, int txqid)
 {
 
 	GROUPTASK_ENQUEUE(&ctx->ifc_txqs[txqid].ift_task);
 }
 
 void
 iflib_rx_intr_deferred(if_ctx_t ctx, int rxqid)
 {
 
 	GROUPTASK_ENQUEUE(&ctx->ifc_rxqs[rxqid].ifr_task);
 }
 
 void
 iflib_admin_intr_deferred(if_ctx_t ctx)
 {
 
 	GROUPTASK_ENQUEUE(&ctx->ifc_admin_task);
 }
 
 void
 iflib_iov_intr_deferred(if_ctx_t ctx)
 {
 
 	GROUPTASK_ENQUEUE(&ctx->ifc_vflr_task);
 }
 
 void
 iflib_io_tqg_attach(struct grouptask *gt, void *uniq, int cpu, char *name)
 {
 
 	taskqgroup_attach_cpu(qgroup_if_io_tqg, gt, uniq, cpu, -1, name);
 }
 
 void
 iflib_config_gtask_init(if_ctx_t ctx, struct grouptask *gtask, task_fn_t *fn,
 	char *name)
 {
 
 	GROUPTASK_INIT(gtask, 0, fn, ctx);
 	taskqgroup_attach(qgroup_if_config_tqg, gtask, gtask, -1, name);
 }
 
 void
 iflib_link_state_change(if_ctx_t ctx, int link_state)
 {
 	if_t ifp = ctx->ifc_ifp;
 	iflib_txq_t txq = ctx->ifc_txqs;
 
 #if 0
 	if_setbaudrate(ifp, baudrate);
 #endif
 	/* If link down, disable watchdog */
 	if ((ctx->ifc_link_state == LINK_STATE_UP) && (link_state == LINK_STATE_DOWN)) {
 		for (int i = 0; i < ctx->ifc_softc_ctx.isc_ntxqsets; i++, txq++)
 			txq->ift_qstatus = IFLIB_QUEUE_IDLE;
 	}
 	ctx->ifc_link_state = link_state;
 	if_link_state_change(ifp, link_state);
 }
 
 static int
 iflib_tx_credits_update(if_ctx_t ctx, iflib_txq_t txq)
 {
 	int credits;
 
 	if (ctx->isc_txd_credits_update == NULL)
 		return (0);
 
 	if ((credits = ctx->isc_txd_credits_update(ctx->ifc_softc, txq->ift_id, txq->ift_cidx_processed, true)) == 0)
 		return (0);
 
 	txq->ift_processed += credits;
 	txq->ift_cidx_processed += credits;
 
 	if (txq->ift_cidx_processed >= txq->ift_size)
 		txq->ift_cidx_processed -= txq->ift_size;
 	return (credits);
 }
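 
 /*
  * Illustration only: a hypothetical helper that iflib does not call.
  * The ift_cidx_processed update above advances a consumer index and
  * wraps it with a single subtraction.  That is sufficient only while
  * cidx < size and a single update never reports more credits than the
  * ring holds (credits <= size), which this sketch assumes.
  */
 static inline unsigned int
 iflib_example_ring_advance(unsigned int cidx, unsigned int credits,
     unsigned int size)
 {
 
 	/* Assumes cidx < size and credits <= size, so one wrap suffices. */
 	cidx += credits;
 	if (cidx >= size)
 		cidx -= size;
 	return (cidx);
 }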
 
 static int
 iflib_rxd_avail(if_ctx_t ctx, iflib_rxq_t rxq, int cidx)
 {
 
 	return (ctx->isc_rxd_available(ctx->ifc_softc, rxq->ifr_id, cidx));
 }
 
 void
 iflib_add_int_delay_sysctl(if_ctx_t ctx, const char *name,
 	const char *description, if_int_delay_info_t info,
 	int offset, int value)
 {
 	info->iidi_ctx = ctx;
 	info->iidi_offset = offset;
 	info->iidi_value = value;
 	SYSCTL_ADD_PROC(device_get_sysctl_ctx(ctx->ifc_dev),
 	    SYSCTL_CHILDREN(device_get_sysctl_tree(ctx->ifc_dev)),
 	    OID_AUTO, name, CTLTYPE_INT|CTLFLAG_RW,
 	    info, 0, iflib_sysctl_int_delay, "I", description);
 }
 
 struct mtx *
 iflib_ctx_lock_get(if_ctx_t ctx)
 {
 
 	return (&ctx->ifc_mtx);
 }
 
 static int
 iflib_msix_init(if_ctx_t ctx)
 {
 	device_t dev = ctx->ifc_dev;
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	if_softc_ctx_t scctx = &ctx->ifc_softc_ctx;
 	int vectors, queues, rx_queues, tx_queues, queuemsgs, msgs;
 	int iflib_num_tx_queues, iflib_num_rx_queues;
 	int err, admincnt, bar;
 
 	iflib_num_tx_queues = ctx->ifc_sysctl_ntxqs;
 	iflib_num_rx_queues = ctx->ifc_sysctl_nrxqs;
 	bar = ctx->ifc_softc_ctx.isc_msix_bar;
 	admincnt = sctx->isc_admin_intrcnt;
 	/* Override by tunable */
 	if (enable_msix == 0)
 		goto msi;
 
 	/*
 	** When used in a virtualized environment
 	** PCI BUSMASTER capability may not be set
 	** so explicitly set it here and rewrite
 	** the ENABLE in the MSI-X control register
 	** at this point to cause the host to
 	** successfully initialize us.
 	*/
 	{
 		uint16_t pci_cmd_word;
 		int msix_ctrl, rid;
 
 		rid = 0;
 		pci_cmd_word = pci_read_config(dev, PCIR_COMMAND, 2);
 		pci_cmd_word |= PCIM_CMD_BUSMASTEREN;
 		pci_write_config(dev, PCIR_COMMAND, pci_cmd_word, 2);
 		pci_find_cap(dev, PCIY_MSIX, &rid);
 		rid += PCIR_MSIX_CTRL;
 		msix_ctrl = pci_read_config(dev, rid, 2);
 		msix_ctrl |= PCIM_MSIXCTRL_MSIX_ENABLE;
 		pci_write_config(dev, rid, msix_ctrl, 2);
 	}
 
 	/*
 	 * bar == -1 => "trust me I know what I'm doing"
 	 * https://www.youtube.com/watch?v=nnwWKkNau4I
 	 * Some drivers are for hardware that is so shoddily
 	 * documented that no one knows which bars are which
 	 * so the developer has to map all bars. This hack
 	 * allows shoddy garbage to use msix in this framework.
 	 */
 	if (bar != -1) {
 		ctx->ifc_msix_mem = bus_alloc_resource_any(dev,
 	            SYS_RES_MEMORY, &bar, RF_ACTIVE);
 		if (ctx->ifc_msix_mem == NULL) {
 			/* May not be enabled */
 			device_printf(dev, "Unable to map MSI-X table\n");
 			goto msi;
 		}
 	}
 	/* First try MSI/X */
 	if ((msgs = pci_msix_count(dev)) == 0) { /* system has msix disabled */
 		device_printf(dev, "System has MSI-X disabled\n");
 		/* With bar == -1 no MSI-X table was mapped above */
 		if (ctx->ifc_msix_mem != NULL) {
 			bus_release_resource(dev, SYS_RES_MEMORY,
 			    bar, ctx->ifc_msix_mem);
 			ctx->ifc_msix_mem = NULL;
 		}
 		goto msi;
 	}
 #if IFLIB_DEBUG
 	/* use only 1 qset in debug mode */
 	queuemsgs = min(msgs - admincnt, 1);
 #else
 	queuemsgs = msgs - admincnt;
 #endif
 	if (bus_get_cpus(dev, INTR_CPUS, sizeof(ctx->ifc_cpus), &ctx->ifc_cpus) == 0) {
 #ifdef RSS
 		queues = imin(queuemsgs, rss_getnumbuckets());
 #else
 		queues = queuemsgs;
 #endif
 		queues = imin(CPU_COUNT(&ctx->ifc_cpus), queues);
 		device_printf(dev, "pxm cpus: %d queue msgs: %d admincnt: %d\n",
 					  CPU_COUNT(&ctx->ifc_cpus), queuemsgs, admincnt);
 	} else {
 		device_printf(dev, "Unable to fetch CPU list\n");
 		/* Figure out a reasonable auto config value */
 		queues = min(queuemsgs, mp_ncpus);
 	}
 #ifdef  RSS
 	/* If we're doing RSS, clamp at the number of RSS buckets */
 	if (queues > rss_getnumbuckets())
 		queues = rss_getnumbuckets();
 #endif
 	if (iflib_num_rx_queues > 0 && iflib_num_rx_queues < queues)
 		queues = rx_queues = iflib_num_rx_queues;
 	else
 		rx_queues = queues;
 	if (iflib_num_tx_queues > 0 && iflib_num_tx_queues < queues)
 		tx_queues = iflib_num_tx_queues;
 	else
 		tx_queues = queues;
 
 	device_printf(dev, "using %d rx queues %d tx queues\n", rx_queues, tx_queues);
 
 	vectors = queues + admincnt;
 	if ((err = pci_alloc_msix(dev, &vectors)) == 0) {
 		device_printf(dev,
 					  "Using MSIX interrupts with %d vectors\n", vectors);
 		scctx->isc_vectors = vectors;
 		scctx->isc_nrxqsets = rx_queues;
 		scctx->isc_ntxqsets = tx_queues;
 		scctx->isc_intr = IFLIB_INTR_MSIX;
 		return (vectors);
 	} else {
 		device_printf(dev, "failed to allocate %d msix vectors, err: %d - using MSI\n", vectors, err);
 	}
 msi:
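 	vectors = pci_msi_count(dev);
 	scctx->isc_nrxqsets = 1;
 	scctx->isc_ntxqsets = 1;
 	/*
 	 * Only a single MSI message is used here; a device that advertises
 	 * several messages should still get MSI rather than a legacy
 	 * interrupt, and the caller expects at most one vector.
 	 */
 	if (vectors > 1)
 		vectors = 1;
 	scctx->isc_vectors = vectors;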
 	if (vectors == 1 && pci_alloc_msi(dev, &vectors) == 0) {
 		device_printf(dev,"Using an MSI interrupt\n");
 		scctx->isc_intr = IFLIB_INTR_MSI;
 	} else {
 		device_printf(dev,"Using a Legacy interrupt\n");
 		scctx->isc_intr = IFLIB_INTR_LEGACY;
 	}
 
 	return (vectors);
 }
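 
 /*
  * Illustration only: a hypothetical helper that iflib does not call.
  * It mirrors the queue sizing in iflib_msix_init() for the common case
  * (no RSS, not IFLIB_DEBUG, CPU list available).  Queues are capped by
  * the MSI-X messages left after the admin vectors and by the CPU count.
  * The override_nrxqs/override_ntxqs values can only lower the result,
  * and a smaller RX override also lowers the cap applied to TX.  Returns
  * the number of MSI-X vectors that would be requested.
  */
 static inline int
 iflib_example_msix_sizing(int msgs, int admincnt, int ncpus,
     int rx_override, int tx_override, int *nrxq, int *ntxq)
 {
 	int queues = msgs - admincnt;
 
 	if (ncpus < queues)
 		queues = ncpus;
 	if (rx_override > 0 && rx_override < queues)
 		queues = rx_override;
 	*nrxq = queues;
 	/* TX is bounded by the (possibly reduced) queue count. */
 	*ntxq = (tx_override > 0 && tx_override < queues) ? tx_override : queues;
 	return (queues + admincnt);
 }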
 
 char * ring_states[] = { "IDLE", "BUSY", "STALLED", "ABDICATED" };
 
 static int
 mp_ring_state_handler(SYSCTL_HANDLER_ARGS)
 {
 	int rc;
 	uint16_t *state = ((uint16_t *)oidp->oid_arg1);
 	struct sbuf *sb;
 	char *ring_state = "UNKNOWN";
 
 	/* XXX needed ? */
 	rc = sysctl_wire_old_buffer(req, 0);
 	MPASS(rc == 0);
 	if (rc != 0)
 		return (rc);
 	sb = sbuf_new_for_sysctl(NULL, NULL, 80, req);
 	MPASS(sb != NULL);
 	if (sb == NULL)
 		return (ENOMEM);
 	if (state[3] <= 3)
 		ring_state = ring_states[state[3]];
 
 	sbuf_printf(sb, "pidx_head: %04hu pidx_tail: %04hu cidx: %04hu state: %s",
 		    state[0], state[1], state[2], ring_state);
 	rc = sbuf_finish(sb);
 	sbuf_delete(sb);
 	return (rc);
 }
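 
 /*
  * Illustration only: a hypothetical helper that iflib does not call.
  * The handler above reads the ring state as four 16-bit words
  * (pidx_head, pidx_tail, cidx and a state code) and maps the state code
  * through ring_states[] with a bounds check.  The same lookup, bounded
  * by the array length instead of a hard-coded 3:
  */
 static inline const char *
 iflib_example_ring_state_name(unsigned int code)
 {
 	static const char *const names[] =
 	    { "IDLE", "BUSY", "STALLED", "ABDICATED" };
 
 	/* Out-of-range codes map to "UNKNOWN", as in the handler. */
 	if (code < sizeof(names) / sizeof(names[0]))
 		return (names[code]);
 	return ("UNKNOWN");
 }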
 
 
 
 #define NAME_BUFLEN 32
 static void
 iflib_add_device_sysctl_pre(if_ctx_t ctx)
 {
 	device_t dev = iflib_get_dev(ctx);
 	struct sysctl_oid_list *child, *oid_list;
 	struct sysctl_ctx_list *ctx_list;
 	struct sysctl_oid *node;
 
 	ctx_list = device_get_sysctl_ctx(dev);
 	child = SYSCTL_CHILDREN(device_get_sysctl_tree(dev));
 	ctx->ifc_sysctl_node = node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, "iflib",
 						      CTLFLAG_RD, NULL, "IFLIB fields");
 	oid_list = SYSCTL_CHILDREN(node);
 
 	SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_ntxqs",
 		       CTLFLAG_RWTUN, &ctx->ifc_sysctl_ntxqs, 0,
 			"# of txqs to use, 0 => use default #");
 	SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_nrxqs",
 		       CTLFLAG_RWTUN, &ctx->ifc_sysctl_nrxqs, 0,
 			"# of rxqs to use, 0 => use default #");
 	SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_ntxds",
 		       CTLFLAG_RWTUN, &ctx->ifc_sysctl_ntxds, 0,
 			"# of tx descriptors to use, 0 => use default #");
 	SYSCTL_ADD_U16(ctx_list, oid_list, OID_AUTO, "override_nrxds",
 		       CTLFLAG_RWTUN, &ctx->ifc_sysctl_nrxds, 0,
 			"# of rx descriptors to use, 0 => use default #");
 
 }
 
 static void
 iflib_add_device_sysctl_post(if_ctx_t ctx)
 {
 	if_shared_ctx_t sctx = ctx->ifc_sctx;
 	if_softc_ctx_t scctx = &ctx->ifc_softc_ctx;
 	device_t dev = iflib_get_dev(ctx);
 	struct sysctl_oid_list *child;
 	struct sysctl_ctx_list *ctx_list;
 	iflib_fl_t fl;
 	iflib_txq_t txq;
 	iflib_rxq_t rxq;
 	int i, j;
 	char namebuf[NAME_BUFLEN];
 	char *qfmt;
 	struct sysctl_oid *queue_node, *fl_node, *node;
 	struct sysctl_oid_list *queue_list, *fl_list;
 	ctx_list = device_get_sysctl_ctx(dev);
 
 	node = ctx->ifc_sysctl_node;
 	child = SYSCTL_CHILDREN(node);
 
 	if (scctx->isc_ntxqsets > 100)
 		qfmt = "txq%03d";
 	else if (scctx->isc_ntxqsets > 10)
 		qfmt = "txq%02d";
 	else
 		qfmt = "txq%d";
 	for (i = 0, txq = ctx->ifc_txqs; i < scctx->isc_ntxqsets; i++, txq++) {
 		snprintf(namebuf, NAME_BUFLEN, qfmt, i);
 		queue_node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, namebuf,
 					     CTLFLAG_RD, NULL, "Queue Name");
 		queue_list = SYSCTL_CHILDREN(queue_node);
 #if MEMORY_LOGGING
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_dequeued",
 				CTLFLAG_RD,
 				&txq->ift_dequeued, "total mbufs freed");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_enqueued",
 				CTLFLAG_RD,
 				&txq->ift_enqueued, "total mbufs enqueued");
 #endif
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "mbuf_defrag",
 				   CTLFLAG_RD,
 				   &txq->ift_mbuf_defrag, "# of times m_defrag was called");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "m_pullups",
 				   CTLFLAG_RD,
 				   &txq->ift_pullups, "# of times m_pullup was called");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "mbuf_defrag_failed",
 				   CTLFLAG_RD,
 				   &txq->ift_mbuf_defrag_failed, "# of times m_defrag failed");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "no_desc_avail",
 				   CTLFLAG_RD,
 				   &txq->ift_no_desc_avail, "# of times no descriptors were available");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "tx_map_failed",
 				   CTLFLAG_RD,
 				   &txq->ift_map_failed, "# of times dma map failed");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txd_encap_efbig",
 				   CTLFLAG_RD,
 				   &txq->ift_txd_encap_efbig, "# of times txd_encap returned EFBIG");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "no_tx_dma_setup",
 				   CTLFLAG_RD,
 				   &txq->ift_no_tx_dma_setup, "# of times map failed for other than EFBIG");
 		SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_pidx",
 				   CTLFLAG_RD,
 				   &txq->ift_pidx, 1, "Producer Index");
 		SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_cidx",
 				   CTLFLAG_RD,
 				   &txq->ift_cidx, 1, "Consumer Index");
 		SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_cidx_processed",
 				   CTLFLAG_RD,
 				   &txq->ift_cidx_processed, 1, "Consumer Index seen by credit update");
 		SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "txq_in_use",
 				   CTLFLAG_RD,
 				   &txq->ift_in_use, 1, "descriptors in use");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_processed",
 				   CTLFLAG_RD,
 				   &txq->ift_processed, "descriptors procesed for clean");
 		SYSCTL_ADD_QUAD(ctx_list, queue_list, OID_AUTO, "txq_cleaned",
 				   CTLFLAG_RD,
 				   &txq->ift_cleaned, "total cleaned");
 		SYSCTL_ADD_PROC(ctx_list, queue_list, OID_AUTO, "ring_state",
 				CTLTYPE_STRING | CTLFLAG_RD, __DEVOLATILE(uint64_t *, &txq->ift_br[0]->state),
 				0, mp_ring_state_handler, "A", "soft ring state");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_enqueues",
 				       CTLFLAG_RD, &txq->ift_br[0]->enqueues,
 				       "# of enqueues to the mp_ring for this queue");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_drops",
 				       CTLFLAG_RD, &txq->ift_br[0]->drops,
 				       "# of drops in the mp_ring for this queue");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_starts",
 				       CTLFLAG_RD, &txq->ift_br[0]->starts,
 				       "# of normal consumer starts in the mp_ring for this queue");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_stalls",
 				       CTLFLAG_RD, &txq->ift_br[0]->stalls,
 					       "# of consumer stalls in the mp_ring for this queue");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_restarts",
 			       CTLFLAG_RD, &txq->ift_br[0]->restarts,
 				       "# of consumer restarts in the mp_ring for this queue");
 		SYSCTL_ADD_COUNTER_U64(ctx_list, queue_list, OID_AUTO, "r_abdications",
 				       CTLFLAG_RD, &txq->ift_br[0]->abdications,
 				       "# of consumer abdications in the mp_ring for this queue");
 
 	}
 
 	if (scctx->isc_nrxqsets > 100)
 		qfmt = "rxq%03d";
 	else if (scctx->isc_nrxqsets > 10)
 		qfmt = "rxq%02d";
 	else
 		qfmt = "rxq%d";
 	for (i = 0, rxq = ctx->ifc_rxqs; i < scctx->isc_nrxqsets; i++, rxq++) {
 		snprintf(namebuf, NAME_BUFLEN, qfmt, i);
 		queue_node = SYSCTL_ADD_NODE(ctx_list, child, OID_AUTO, namebuf,
 					     CTLFLAG_RD, NULL, "Queue Name");
 		queue_list = SYSCTL_CHILDREN(queue_node);
 		if (sctx->isc_flags & IFLIB_HAS_CQ) {
 			SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "rxq_cq_pidx",
 				       CTLFLAG_RD,
 				       &rxq->ifr_cq_pidx, 1, "Producer Index");
 			SYSCTL_ADD_U16(ctx_list, queue_list, OID_AUTO, "rxq_cq_cidx",
 				       CTLFLAG_RD,
 				       &rxq->ifr_cq_cidx, 1, "Consumer Index");
 		}
 		for (j = 0, fl = rxq->ifr_fl; j < rxq->ifr_nfl; j++, fl++) {
 			snprintf(namebuf, NAME_BUFLEN, "rxq_fl%d", j);
 			fl_node = SYSCTL_ADD_NODE(ctx_list, queue_list, OID_AUTO, namebuf,
 						     CTLFLAG_RD, NULL, "freelist Name");
 			fl_list = SYSCTL_CHILDREN(fl_node);
 			SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "pidx",
 				       CTLFLAG_RD,
 				       &fl->ifl_pidx, 1, "Producer Index");
 			SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "cidx",
 				       CTLFLAG_RD,
 				       &fl->ifl_cidx, 1, "Consumer Index");
 			SYSCTL_ADD_U16(ctx_list, fl_list, OID_AUTO, "credits",
 				       CTLFLAG_RD,
 				       &fl->ifl_credits, 1, "credits available");
 #if MEMORY_LOGGING
 			SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_m_enqueued",
 					CTLFLAG_RD,
 					&fl->ifl_m_enqueued, "mbufs allocated");
 			SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_m_dequeued",
 					CTLFLAG_RD,
 					&fl->ifl_m_dequeued, "mbufs freed");
 			SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_cl_enqueued",
 					CTLFLAG_RD,
 					&fl->ifl_cl_enqueued, "clusters allocated");
 			SYSCTL_ADD_QUAD(ctx_list, fl_list, OID_AUTO, "fl_cl_dequeued",
 					CTLFLAG_RD,
 					&fl->ifl_cl_dequeued, "clusters freed");
 #endif
 
 		}
 	}
 
 }
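In iflib_add_device_sysctl_post() above, the txq/rxq node names are zero-padded to a width chosen from the queue count. One plausible reason, which the source does not state, is that padding keeps the names in index order when they are sorted as strings. The userland sketch below copies only the width selection for txq nodes. `txq_name_fmt()` and `names_sort_in_index_order()` are hypothetical helpers, not iflib functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUFLEN 32

/* Same width selection as the txq loop in iflib_add_device_sysctl_post(). */
static const char *
txq_name_fmt(int nqsets)
{
	if (nqsets > 100)
		return ("txq%03d");
	else if (nqsets > 10)
		return ("txq%02d");
	return ("txq%d");
}

/* qsort() comparator for an array of C strings. */
static int
cmp_name(const void *a, const void *b)
{
	return (strcmp(*(const char *const *)a, *(const char *const *)b));
}

/*
 * Format nqsets names with fmt, sort them with strcmp() and report
 * whether they are still in queue-index order (1), not (0), or -1 on
 * allocation failure.
 */
static int
names_sort_in_index_order(const char *fmt, int nqsets)
{
	char (*names)[NAME_BUFLEN];
	const char **sorted;
	int i, ok = 1;

	assert(nqsets > 0);
	names = calloc((size_t)nqsets, sizeof(*names));
	sorted = calloc((size_t)nqsets, sizeof(*sorted));
	if (names == NULL || sorted == NULL) {
		free(names);
		free(sorted);
		return (-1);
	}
	for (i = 0; i < nqsets; i++) {
		snprintf(names[i], NAME_BUFLEN, fmt, i);
		sorted[i] = names[i];
	}
	qsort(sorted, (size_t)nqsets, sizeof(*sorted), cmp_name);
	/* Names are unique, so index order survives iff no pointer moved. */
	for (i = 0; i < nqsets; i++)
		if (sorted[i] != names[i])
			ok = 0;
	free(names);
	free(sorted);
	return (ok);
}
```

With 11 queues and the unpadded "txq%d" format, "txq10" sorts between "txq1" and "txq2". The "%02d" width avoids that case.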
Index: user/alc/PQ_LAUNDRY/sys/netinet/tcp_lro.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/netinet/tcp_lro.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/netinet/tcp_lro.c	(revision 303775)
@@ -1,941 +1,969 @@
 /*-
  * Copyright (c) 2007, Myricom Inc.
  * Copyright (c) 2008, Intel Corporation.
  * Copyright (c) 2012 The FreeBSD Foundation
  * Copyright (c) 2016 Mellanox Technologies.
  * All rights reserved.
  *
  * Portions of this software were developed by Bjoern Zeeb
  * under sponsorship from the FreeBSD Foundation.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_inet.h"
 #include "opt_inet6.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/malloc.h>
 #include <sys/mbuf.h>
 #include <sys/socket.h>
 
 #include <net/if.h>
 #include <net/if_var.h>
 #include <net/ethernet.h>
 #include <net/vnet.h>
 
 #include <netinet/in_systm.h>
 #include <netinet/in.h>
 #include <netinet/ip6.h>
 #include <netinet/ip.h>
 #include <netinet/ip_var.h>
 #include <netinet/tcp.h>
 #include <netinet/tcp_lro.h>
 
 #include <netinet6/ip6_var.h>
 
 #include <machine/in_cksum.h>
 
 static MALLOC_DEFINE(M_LRO, "LRO", "LRO control structures");
 
 #define	TCP_LRO_UPDATE_CSUM	1
 #ifndef	TCP_LRO_UPDATE_CSUM
 #define	TCP_LRO_INVALID_CSUM	0x0000
 #endif
 
 static void	tcp_lro_rx_done(struct lro_ctrl *lc);
 static int	tcp_lro_rx2(struct lro_ctrl *lc, struct mbuf *m,
 		    uint32_t csum, int use_hash);
 
 static __inline void
 tcp_lro_active_insert(struct lro_ctrl *lc, struct lro_head *bucket,
     struct lro_entry *le)
 {
 
 	LIST_INSERT_HEAD(&lc->lro_active, le, next);
 	LIST_INSERT_HEAD(bucket, le, hash_next);
 }
 
 static __inline void
 tcp_lro_active_remove(struct lro_entry *le)
 {
 
 	LIST_REMOVE(le, next);		/* active list */
 	LIST_REMOVE(le, hash_next);	/* hash bucket */
 }
 
 int
 tcp_lro_init(struct lro_ctrl *lc)
 {
 	return (tcp_lro_init_args(lc, NULL, TCP_LRO_ENTRIES, 0));
 }
 
 int
 tcp_lro_init_args(struct lro_ctrl *lc, struct ifnet *ifp,
     unsigned lro_entries, unsigned lro_mbufs)
 {
 	struct lro_entry *le;
 	size_t size;
 	unsigned i, elements;
 
 	lc->lro_bad_csum = 0;
 	lc->lro_queued = 0;
 	lc->lro_flushed = 0;
 	lc->lro_cnt = 0;
 	lc->lro_mbuf_count = 0;
 	lc->lro_mbuf_max = lro_mbufs;
 	lc->lro_cnt = lro_entries;
 	lc->lro_ackcnt_lim = TCP_LRO_ACKCNT_MAX;
 	lc->lro_length_lim = TCP_LRO_LENGTH_MAX;
 	lc->ifp = ifp;
 	LIST_INIT(&lc->lro_free);
 	LIST_INIT(&lc->lro_active);
 
 	/* create hash table to accelerate entry lookup */
 	if (lro_entries > lro_mbufs)
 		elements = lro_entries;
 	else
 		elements = lro_mbufs;
 	lc->lro_hash = phashinit_flags(elements, M_LRO, &lc->lro_hashsz,
 	    HASH_NOWAIT);
 	if (lc->lro_hash == NULL) {
 		memset(lc, 0, sizeof(*lc));
 		return (ENOMEM);
 	}
 
 	/* compute size to allocate */
 	size = (lro_mbufs * sizeof(struct lro_mbuf_sort)) +
 	    (lro_entries * sizeof(*le));
 	lc->lro_mbuf_data = (struct lro_mbuf_sort *)
 	    malloc(size, M_LRO, M_NOWAIT | M_ZERO);
 
 	/* check for out of memory */
 	if (lc->lro_mbuf_data == NULL) {
 		memset(lc, 0, sizeof(*lc));
 		return (ENOMEM);
 	}
 	/* compute offset for LRO entries */
 	le = (struct lro_entry *)
 	    (lc->lro_mbuf_data + lro_mbufs);
 
 	/* setup linked list */
 	for (i = 0; i != lro_entries; i++)
 		LIST_INSERT_HEAD(&lc->lro_free, le + i, next);
 
 	return (0);
 }
 
 void
 tcp_lro_free(struct lro_ctrl *lc)
 {
 	struct lro_entry *le;
 	unsigned x;
 
 	/* reset LRO free list */
 	LIST_INIT(&lc->lro_free);
 
 	/* free active mbufs, if any */
 	while ((le = LIST_FIRST(&lc->lro_active)) != NULL) {
 		tcp_lro_active_remove(le);
 		m_freem(le->m_head);
 	}
 
 	/* free hash table */
 	if (lc->lro_hash != NULL) {
 		free(lc->lro_hash, M_LRO);
 		lc->lro_hash = NULL;
 	}
 	lc->lro_hashsz = 0;
 
 	/* free mbuf array, if any */
 	for (x = 0; x != lc->lro_mbuf_count; x++)
 		m_freem(lc->lro_mbuf_data[x].mb);
 	lc->lro_mbuf_count = 0;
 	
 	/* free allocated memory, if any */
 	free(lc->lro_mbuf_data, M_LRO);
 	lc->lro_mbuf_data = NULL;
 }
 
 #ifdef TCP_LRO_UPDATE_CSUM
 static uint16_t
 tcp_lro_csum_th(struct tcphdr *th)
 {
 	uint32_t ch;
 	uint16_t *p, l;
 
 	ch = th->th_sum = 0x0000;
 	l = th->th_off;
 	p = (uint16_t *)th;
 	while (l > 0) {
 		ch += *p;
 		p++;
 		ch += *p;
 		p++;
 		l--;
 	}
 	while (ch > 0xffff)
 		ch = (ch >> 16) + (ch & 0xffff);
 
 	return (ch & 0xffff);
 }
 
 static uint16_t
 tcp_lro_rx_csum_fixup(struct lro_entry *le, void *l3hdr, struct tcphdr *th,
     uint16_t tcp_data_len, uint16_t csum)
 {
 	uint32_t c;
 	uint16_t cs;
 
 	c = csum;
 
 	/* Remove length from checksum. */
 	switch (le->eh_type) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 	{
 		struct ip6_hdr *ip6;
 
 		ip6 = (struct ip6_hdr *)l3hdr;
 		if (le->append_cnt == 0)
 			cs = ip6->ip6_plen;
 		else {
 			uint32_t cx;
 
 			cx = ntohs(ip6->ip6_plen);
 			cs = in6_cksum_pseudo(ip6, cx, ip6->ip6_nxt, 0);
 		}
 		break;
 	}
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 	{
 		struct ip *ip4;
 
 		ip4 = (struct ip *)l3hdr;
 		if (le->append_cnt == 0)
 			cs = ip4->ip_len;
 		else {
 			cs = in_addword(ntohs(ip4->ip_len) - sizeof(*ip4),
 			    IPPROTO_TCP);
 			cs = in_pseudo(ip4->ip_src.s_addr, ip4->ip_dst.s_addr,
 			    htons(cs));
 		}
 		break;
 	}
 #endif
 	default:
 		cs = 0;		/* Keep compiler happy. */
 	}
 
 	cs = ~cs;
 	c += cs;
 
 	/* Remove TCP header csum. */
 	cs = ~tcp_lro_csum_th(th);
 	c += cs;
 	while (c > 0xffff)
 		c = (c >> 16) + (c & 0xffff);
 
 	return (c & 0xffff);
 }
 #endif
 
 static void
 tcp_lro_rx_done(struct lro_ctrl *lc)
 {
 	struct lro_entry *le;
 
 	while ((le = LIST_FIRST(&lc->lro_active)) != NULL) {
 		tcp_lro_active_remove(le);
 		tcp_lro_flush(lc, le);
 	}
 }
 
 void
 tcp_lro_flush_inactive(struct lro_ctrl *lc, const struct timeval *timeout)
 {
 	struct lro_entry *le, *le_tmp;
 	struct timeval tv;
 
 	if (LIST_EMPTY(&lc->lro_active))
 		return;
 
 	getmicrotime(&tv);
 	timevalsub(&tv, timeout);
 	LIST_FOREACH_SAFE(le, &lc->lro_active, next, le_tmp) {
 		if (timevalcmp(&tv, &le->mtime, >=)) {
 			tcp_lro_active_remove(le);
 			tcp_lro_flush(lc, le);
 		}
 	}
 }
 
 void
 tcp_lro_flush(struct lro_ctrl *lc, struct lro_entry *le)
 {
 
 	if (le->append_cnt > 0) {
 		struct tcphdr *th;
 		uint16_t p_len;
 
 		p_len = htons(le->p_len);
 		switch (le->eh_type) {
 #ifdef INET6
 		case ETHERTYPE_IPV6:
 		{
 			struct ip6_hdr *ip6;
 
 			ip6 = le->le_ip6;
 			ip6->ip6_plen = p_len;
 			th = (struct tcphdr *)(ip6 + 1);
 			le->m_head->m_pkthdr.csum_flags = CSUM_DATA_VALID |
 			    CSUM_PSEUDO_HDR;
 			le->p_len += ETHER_HDR_LEN + sizeof(*ip6);
 			break;
 		}
 #endif
 #ifdef INET
 		case ETHERTYPE_IP:
 		{
 			struct ip *ip4;
 #ifdef TCP_LRO_UPDATE_CSUM
 			uint32_t cl;
 			uint16_t c;
 #endif
 
 			ip4 = le->le_ip4;
 #ifdef TCP_LRO_UPDATE_CSUM
 			/* Fix IP header checksum for new length. */
 			c = ~ip4->ip_sum;
 			cl = c;
 			c = ~ip4->ip_len;
 			cl += c + p_len;
 			while (cl > 0xffff)
 				cl = (cl >> 16) + (cl & 0xffff);
 			c = cl;
 			ip4->ip_sum = ~c;
 #else
 			ip4->ip_sum = TCP_LRO_INVALID_CSUM;
 #endif
 			ip4->ip_len = p_len;
 			th = (struct tcphdr *)(ip4 + 1);
 			le->m_head->m_pkthdr.csum_flags = CSUM_DATA_VALID |
 			    CSUM_PSEUDO_HDR | CSUM_IP_CHECKED | CSUM_IP_VALID;
 			le->p_len += ETHER_HDR_LEN;
 			break;
 		}
 #endif
 		default:
 			th = NULL;	/* Keep compiler happy. */
 		}
 		le->m_head->m_pkthdr.csum_data = 0xffff;
 		le->m_head->m_pkthdr.len = le->p_len;
 
 		/* Incorporate the latest ACK into the TCP header. */
 		th->th_ack = le->ack_seq;
 		th->th_win = le->window;
 		/* Incorporate latest timestamp into the TCP header. */
 		if (le->timestamp != 0) {
 			uint32_t *ts_ptr;
 
 			ts_ptr = (uint32_t *)(th + 1);
 			ts_ptr[1] = htonl(le->tsval);
 			ts_ptr[2] = le->tsecr;
 		}
 #ifdef TCP_LRO_UPDATE_CSUM
 		/* Update the TCP header checksum. */
 		le->ulp_csum += p_len;
 		le->ulp_csum += tcp_lro_csum_th(th);
 		while (le->ulp_csum > 0xffff)
 			le->ulp_csum = (le->ulp_csum >> 16) +
 			    (le->ulp_csum & 0xffff);
 		th->th_sum = (le->ulp_csum & 0xffff);
 		th->th_sum = ~th->th_sum;
 #else
 		th->th_sum = TCP_LRO_INVALID_CSUM;
 #endif
 	}
 
 	(*lc->ifp->if_input)(lc->ifp, le->m_head);
 	lc->lro_queued += le->append_cnt + 1;
 	lc->lro_flushed++;
 	bzero(le, sizeof(*le));
 	LIST_INSERT_HEAD(&lc->lro_free, le, next);
 }
 
 #ifdef HAVE_INLINE_FLSLL
 #define	tcp_lro_msb_64(x) (1ULL << (flsll(x) - 1))
 #else
 static inline uint64_t
 tcp_lro_msb_64(uint64_t x)
 {
 	x |= (x >> 1);
 	x |= (x >> 2);
 	x |= (x >> 4);
 	x |= (x >> 8);
 	x |= (x >> 16);
 	x |= (x >> 32);
 	return (x & ~(x >> 1));
 }
 #endif
 
 /*
  * The tcp_lro_sort() routine is comparable to qsort(), except it has
  * a worst case complexity limit of O(MIN(N,64)*N), where N is the
  * number of elements to sort and 64 is the number of sequence bits
  * available. The algorithm is bit-slicing the 64-bit sequence number,
  * sorting one bit at a time from the most significant bit until the
  * least significant one, skipping the constant bits. This is
  * typically called a radix sort.
  */
 static void
 tcp_lro_sort(struct lro_mbuf_sort *parray, uint32_t size)
 {
 	struct lro_mbuf_sort temp;
 	uint64_t ones;
 	uint64_t zeros;
 	uint32_t x;
 	uint32_t y;
 
 repeat:
 	/* for small arrays insertion sort is faster */
 	if (size <= 12) {
 		for (x = 1; x < size; x++) {
 			temp = parray[x];
 			for (y = x; y > 0 && temp.seq < parray[y - 1].seq; y--)
 				parray[y] = parray[y - 1];
 			parray[y] = temp;
 		}
 		return;
 	}
 
 	/* compute sequence bits which are constant */
 	ones = 0;
 	zeros = 0;
 	for (x = 0; x != size; x++) {
 		ones |= parray[x].seq;
 		zeros |= ~parray[x].seq;
 	}
 
 	/* compute bits which are not constant into "ones" */
 	ones &= zeros;
 	if (ones == 0)
 		return;
 
 	/* pick the most significant bit which is not constant */
 	ones = tcp_lro_msb_64(ones);
 
 	/*
 	 * Move entries having cleared sequence bits to the beginning
 	 * of the array:
 	 */
 	for (x = y = 0; y != size; y++) {
 		/* skip set bits */
 		if (parray[y].seq & ones)
 			continue;
 		/* swap entries */
 		temp = parray[x];
 		parray[x] = parray[y];
 		parray[y] = temp;
 		x++;
 	}
 
 	KASSERT(x != 0 && x != size, ("Memory is corrupted\n"));
 
 	/* sort zeros */
 	tcp_lro_sort(parray, x);
 
 	/* sort ones */
 	parray += x;
 	size -= x;
 	goto repeat;
 }
 
 void
 tcp_lro_flush_all(struct lro_ctrl *lc)
 {
 	uint64_t seq;
 	uint64_t nseq;
 	unsigned x;
 
 	/* check if no mbufs to flush */
 	if (lc->lro_mbuf_count == 0)
 		goto done;
 
 	/* sort all mbufs according to stream */
 	tcp_lro_sort(lc->lro_mbuf_data, lc->lro_mbuf_count);
 
 	/* input data into LRO engine, stream by stream */
 	seq = 0;
 	for (x = 0; x != lc->lro_mbuf_count; x++) {
 		struct mbuf *mb;
 
 		/* get mbuf */
 		mb = lc->lro_mbuf_data[x].mb;
 
 		/* get sequence number, masking away the packet index */
 		nseq = lc->lro_mbuf_data[x].seq & (-1ULL << 24);
 
 		/* check for new stream */
 		if (seq != nseq) {
 			seq = nseq;
 
 			/* flush active streams */
 			tcp_lro_rx_done(lc);
 		}
 
 		/* add packet to LRO engine */
 		if (tcp_lro_rx2(lc, mb, 0, 0) != 0) {
 			/* input packet to network layer */
 			(*lc->ifp->if_input)(lc->ifp, mb);
 			lc->lro_queued++;
 			lc->lro_flushed++;
 		}
 	}
 done:
 	/* flush active streams */
 	tcp_lro_rx_done(lc);
 
 	lc->lro_mbuf_count = 0;
 }
 
 #ifdef INET6
 static int
 tcp_lro_rx_ipv6(struct lro_ctrl *lc, struct mbuf *m, struct ip6_hdr *ip6,
     struct tcphdr **th)
 {
 
 	/* XXX-BZ we should check the flow-label. */
 
 	/* XXX-BZ We do not yet support ext. hdrs. */
 	if (ip6->ip6_nxt != IPPROTO_TCP)
 		return (TCP_LRO_NOT_SUPPORTED);
 
 	/* Find the TCP header. */
 	*th = (struct tcphdr *)(ip6 + 1);
 
 	return (0);
 }
 #endif
 
 #ifdef INET
 static int
 tcp_lro_rx_ipv4(struct lro_ctrl *lc, struct mbuf *m, struct ip *ip4,
     struct tcphdr **th)
 {
 	int csum_flags;
 	uint16_t csum;
 
 	if (ip4->ip_p != IPPROTO_TCP)
 		return (TCP_LRO_NOT_SUPPORTED);
 
 	/* Ensure there are no options. */
 	if ((ip4->ip_hl << 2) != sizeof (*ip4))
 		return (TCP_LRO_CANNOT);
 
 	/* .. and the packet is not fragmented. */
 	if (ip4->ip_off & htons(IP_MF|IP_OFFMASK))
 		return (TCP_LRO_CANNOT);
 
 	/* Legacy IP has a header checksum that needs to be correct. */
 	csum_flags = m->m_pkthdr.csum_flags;
 	if (csum_flags & CSUM_IP_CHECKED) {
 		if (__predict_false((csum_flags & CSUM_IP_VALID) == 0)) {
 			lc->lro_bad_csum++;
 			return (TCP_LRO_CANNOT);
 		}
 	} else {
 		csum = in_cksum_hdr(ip4);
 		if (__predict_false((csum) != 0)) {
 			lc->lro_bad_csum++;
 			return (TCP_LRO_CANNOT);
 		}
 	}
 
 	/* Find the TCP header (we assured there are no IP options). */
 	*th = (struct tcphdr *)(ip4 + 1);
 
 	return (0);
 }
 #endif
 
 static int
 tcp_lro_rx2(struct lro_ctrl *lc, struct mbuf *m, uint32_t csum, int use_hash)
 {
 	struct lro_entry *le;
 	struct ether_header *eh;
 #ifdef INET6
 	struct ip6_hdr *ip6 = NULL;	/* Keep compiler happy. */
 #endif
 #ifdef INET
 	struct ip *ip4 = NULL;		/* Keep compiler happy. */
 #endif
 	struct tcphdr *th;
 	void *l3hdr = NULL;		/* Keep compiler happy. */
 	uint32_t *ts_ptr;
 	tcp_seq seq;
 	int error, ip_len, l;
 	uint16_t eh_type, tcp_data_len;
 	struct lro_head *bucket;
+	int force_flush = 0;
 
 	/* We expect a contiguous header [eh, ip, tcp]. */
 
 	eh = mtod(m, struct ether_header *);
 	eh_type = ntohs(eh->ether_type);
 	switch (eh_type) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 	{
 		CURVNET_SET(lc->ifp->if_vnet);
 		if (V_ip6_forwarding != 0) {
 			/* XXX-BZ stats but changing lro_ctrl is a problem. */
 			CURVNET_RESTORE();
 			return (TCP_LRO_CANNOT);
 		}
 		CURVNET_RESTORE();
 		l3hdr = ip6 = (struct ip6_hdr *)(eh + 1);
 		error = tcp_lro_rx_ipv6(lc, m, ip6, &th);
 		if (error != 0)
 			return (error);
 		tcp_data_len = ntohs(ip6->ip6_plen);
 		ip_len = sizeof(*ip6) + tcp_data_len;
 		break;
 	}
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 	{
 		CURVNET_SET(lc->ifp->if_vnet);
 		if (V_ipforwarding != 0) {
 			/* XXX-BZ stats but changing lro_ctrl is a problem. */
 			CURVNET_RESTORE();
 			return (TCP_LRO_CANNOT);
 		}
 		CURVNET_RESTORE();
 		l3hdr = ip4 = (struct ip *)(eh + 1);
 		error = tcp_lro_rx_ipv4(lc, m, ip4, &th);
 		if (error != 0)
 			return (error);
 		ip_len = ntohs(ip4->ip_len);
 		tcp_data_len = ip_len - sizeof(*ip4);
 		break;
 	}
 #endif
 	/* XXX-BZ what happens in case of VLAN(s)? */
 	default:
 		return (TCP_LRO_NOT_SUPPORTED);
 	}
 
 	/*
 	 * If the frame is padded beyond the end of the IP packet, then we must
 	 * trim the extra bytes off.
 	 */
 	l = m->m_pkthdr.len - (ETHER_HDR_LEN + ip_len);
 	if (l != 0) {
 		if (l < 0)
 			/* Truncated packet. */
 			return (TCP_LRO_CANNOT);
 
 		m_adj(m, -l);
 	}
 
 	/*
 	 * Check TCP header constraints.
 	 */
 	/* Ensure no bits set besides ACK or PSH. */
-	if ((th->th_flags & ~(TH_ACK | TH_PUSH)) != 0)
-		return (TCP_LRO_CANNOT);
+	if ((th->th_flags & ~(TH_ACK | TH_PUSH)) != 0) {
+		if (th->th_flags & TH_SYN)
+			return (TCP_LRO_CANNOT);
+		/*
+		 * Make sure that previously seen segments/ACKs are delivered
+		 * before this segment, e.g. a FIN.
+		 */
+		force_flush = 1;
+	}
 
 	/* XXX-BZ We lose a ACK|PUSH flag concatenating multiple segments. */
 	/* XXX-BZ Ideally we'd flush on PUSH? */
 
 	/*
 	 * Check for timestamps.
 	 * Since the only option we handle are timestamps, we only have to
 	 * handle the simple case of aligned timestamps.
 	 */
 	l = (th->th_off << 2);
 	tcp_data_len -= l;
 	l -= sizeof(*th);
 	ts_ptr = (uint32_t *)(th + 1);
 	if (l != 0 && (__predict_false(l != TCPOLEN_TSTAMP_APPA) ||
 	    (*ts_ptr != ntohl(TCPOPT_NOP<<24|TCPOPT_NOP<<16|
-	    TCPOPT_TIMESTAMP<<8|TCPOLEN_TIMESTAMP))))
-		return (TCP_LRO_CANNOT);
+	    TCPOPT_TIMESTAMP<<8|TCPOLEN_TIMESTAMP)))) {
+		/*
+		 * Make sure that previously seen segments/ACKs are delivered
+		 * before this segment.
+		 */
+		force_flush = 1;
+	}
 
 	/* If the driver did not pass in the checksum, set it now. */
 	if (csum == 0x0000)
 		csum = th->th_sum;
 
 	seq = ntohl(th->th_seq);
 
 	if (!use_hash) {
 		bucket = &lc->lro_hash[0];
 	} else if (M_HASHTYPE_ISHASH(m)) {
 		bucket = &lc->lro_hash[m->m_pkthdr.flowid % lc->lro_hashsz];
 	} else {
 		uint32_t hash;
 
 		switch (eh_type) {
 #ifdef INET
 		case ETHERTYPE_IP:
 			hash = ip4->ip_src.s_addr + ip4->ip_dst.s_addr;
 			break;
 #endif
 #ifdef INET6
 		case ETHERTYPE_IPV6:
 			hash = ip6->ip6_src.s6_addr32[0] +
 			    ip6->ip6_dst.s6_addr32[0];
 			hash += ip6->ip6_src.s6_addr32[1] +
 			    ip6->ip6_dst.s6_addr32[1];
 			hash += ip6->ip6_src.s6_addr32[2] +
 			    ip6->ip6_dst.s6_addr32[2];
 			hash += ip6->ip6_src.s6_addr32[3] +
 			    ip6->ip6_dst.s6_addr32[3];
 			break;
 #endif
 		default:
 			hash = 0;
 			break;
 		}
 		hash += th->th_sport + th->th_dport;
 		bucket = &lc->lro_hash[hash % lc->lro_hashsz];
 	}
 
 	/* Try to find a matching previous segment. */
 	LIST_FOREACH(le, bucket, hash_next) {
 		if (le->eh_type != eh_type)
 			continue;
 		if (le->source_port != th->th_sport ||
 		    le->dest_port != th->th_dport)
 			continue;
 		switch (eh_type) {
 #ifdef INET6
 		case ETHERTYPE_IPV6:
 			if (bcmp(&le->source_ip6, &ip6->ip6_src,
 			    sizeof(struct in6_addr)) != 0 ||
 			    bcmp(&le->dest_ip6, &ip6->ip6_dst,
 			    sizeof(struct in6_addr)) != 0)
 				continue;
 			break;
 #endif
 #ifdef INET
 		case ETHERTYPE_IP:
 			if (le->source_ip4 != ip4->ip_src.s_addr ||
 			    le->dest_ip4 != ip4->ip_dst.s_addr)
 				continue;
 			break;
 #endif
 		}
 
+		if (force_flush) {
+			/* Non-timestamp TCP options, or a FIN, RST, etc. */
+			tcp_lro_active_remove(le);
+			tcp_lro_flush(lc, le);
+			return (TCP_LRO_CANNOT);
+		}
+
 		/* Flush now if appending will result in overflow. */
 		if (le->p_len > (lc->lro_length_lim - tcp_data_len)) {
 			tcp_lro_active_remove(le);
 			tcp_lro_flush(lc, le);
 			break;
 		}
 
 		/* Try to append the new segment. */
 		if (__predict_false(seq != le->next_seq ||
 		    (tcp_data_len == 0 && le->ack_seq == th->th_ack))) {
 			/* Out of order packet or duplicate ACK. */
 			tcp_lro_active_remove(le);
 			tcp_lro_flush(lc, le);
 			return (TCP_LRO_CANNOT);
 		}
 
 		if (l != 0) {
 			uint32_t tsval = ntohl(*(ts_ptr + 1));
 			/* Make sure timestamp values are increasing. */
 			/* XXX-BZ flip and use TSTMP_GEQ macro for this? */
 			if (__predict_false(le->tsval > tsval ||
 			    *(ts_ptr + 2) == 0))
 				return (TCP_LRO_CANNOT);
 			le->tsval = tsval;
 			le->tsecr = *(ts_ptr + 2);
 		}
 
 		le->next_seq += tcp_data_len;
 		le->ack_seq = th->th_ack;
 		le->window = th->th_win;
 		le->append_cnt++;
 
 #ifdef TCP_LRO_UPDATE_CSUM
 		le->ulp_csum += tcp_lro_rx_csum_fixup(le, l3hdr, th,
 		    tcp_data_len, ~csum);
 #endif
 
 		if (tcp_data_len == 0) {
 			m_freem(m);
 			/*
 			 * Flush this LRO entry, if this ACK should not
 			 * be further delayed.
 			 */
 			if (le->append_cnt >= lc->lro_ackcnt_lim) {
 				tcp_lro_active_remove(le);
 				tcp_lro_flush(lc, le);
 			}
 			return (0);
 		}
 
 		le->p_len += tcp_data_len;
 
 		/*
 		 * Adjust the mbuf so that m_data points to the first byte of
 		 * the ULP payload.  Adjust the mbuf to avoid complications and
 		 * append new segment to existing mbuf chain.
 		 */
 		m_adj(m, m->m_pkthdr.len - tcp_data_len);
 		m_demote_pkthdr(m);
 
 		le->m_tail->m_next = m;
 		le->m_tail = m_last(m);
 
 		/*
 		 * If a possible next full length packet would cause an
 		 * overflow, pro-actively flush now.
 		 */
 		if (le->p_len > (lc->lro_length_lim - lc->ifp->if_mtu)) {
 			tcp_lro_active_remove(le);
 			tcp_lro_flush(lc, le);
 		} else
 			getmicrotime(&le->mtime);
 
 		return (0);
+	}
+
+	if (force_flush) {
+		/*
+		 * Nothing to flush, but this segment cannot be further
+		 * aggregated/delayed.
+		 */
+		return (TCP_LRO_CANNOT);
 	}
 
 	/* Try to find an empty slot. */
 	if (LIST_EMPTY(&lc->lro_free))
 		return (TCP_LRO_NO_ENTRIES);
 
 	/* Start a new segment chain. */
 	le = LIST_FIRST(&lc->lro_free);
 	LIST_REMOVE(le, next);
 	tcp_lro_active_insert(lc, bucket, le);
 	getmicrotime(&le->mtime);
 
 	/* Start filling in details. */
 	switch (eh_type) {
 #ifdef INET6
 	case ETHERTYPE_IPV6:
 		le->le_ip6 = ip6;
 		le->source_ip6 = ip6->ip6_src;
 		le->dest_ip6 = ip6->ip6_dst;
 		le->eh_type = eh_type;
 		le->p_len = m->m_pkthdr.len - ETHER_HDR_LEN - sizeof(*ip6);
 		break;
 #endif
 #ifdef INET
 	case ETHERTYPE_IP:
 		le->le_ip4 = ip4;
 		le->source_ip4 = ip4->ip_src.s_addr;
 		le->dest_ip4 = ip4->ip_dst.s_addr;
 		le->eh_type = eh_type;
 		le->p_len = m->m_pkthdr.len - ETHER_HDR_LEN;
 		break;
 #endif
 	}
 	le->source_port = th->th_sport;
 	le->dest_port = th->th_dport;
 
 	le->next_seq = seq + tcp_data_len;
 	le->ack_seq = th->th_ack;
 	le->window = th->th_win;
 	if (l != 0) {
 		le->timestamp = 1;
 		le->tsval = ntohl(*(ts_ptr + 1));
 		le->tsecr = *(ts_ptr + 2);
 	}
 
 #ifdef TCP_LRO_UPDATE_CSUM
 	/*
 	 * Do not touch the csum of the first packet.  However save the
 	 * "adjusted" checksum of just the source and destination addresses,
 	 * the next header and the TCP payload.  The length and TCP header
 	 * parts may change, so we remove those from the saved checksum and
 	 * re-add with final values on tcp_lro_flush() if needed.
 	 */
 	KASSERT(le->ulp_csum == 0, ("%s: le=%p le->ulp_csum=0x%04x\n",
 	    __func__, le, le->ulp_csum));
 
 	le->ulp_csum = tcp_lro_rx_csum_fixup(le, l3hdr, th, tcp_data_len,
 	    ~csum);
 	th->th_sum = csum;	/* Restore checksum on first packet. */
 #endif
 
 	le->m_head = m;
 	le->m_tail = m_last(m);
 
 	return (0);
 }
 
 int
 tcp_lro_rx(struct lro_ctrl *lc, struct mbuf *m, uint32_t csum)
 {
 
 	return tcp_lro_rx2(lc, m, csum, 1);
 }
 
 void
 tcp_lro_queue_mbuf(struct lro_ctrl *lc, struct mbuf *mb)
 {
 	/* sanity checks */
 	if (__predict_false(lc->ifp == NULL || lc->lro_mbuf_data == NULL ||
 	    lc->lro_mbuf_max == 0)) {
 		/* packet drop */
 		m_freem(mb);
 		return;
 	}
 
 	/* check if packet is not LRO capable */
 	if (__predict_false(mb->m_pkthdr.csum_flags == 0 ||
 	    (lc->ifp->if_capenable & IFCAP_LRO) == 0)) {
 		lc->lro_flushed++;
 		lc->lro_queued++;
 
 		/* input packet to network layer */
 		(*lc->ifp->if_input) (lc->ifp, mb);
 		return;
 	}
 
 	/* check if array is full */
 	if (__predict_false(lc->lro_mbuf_count == lc->lro_mbuf_max))
 		tcp_lro_flush_all(lc);
 
 	/* create sequence number */
 	lc->lro_mbuf_data[lc->lro_mbuf_count].seq =
 	    (((uint64_t)M_HASHTYPE_GET(mb)) << 56) |
 	    (((uint64_t)mb->m_pkthdr.flowid) << 24) |
 	    ((uint64_t)lc->lro_mbuf_count);
 
 	/* enter mbuf */
 	lc->lro_mbuf_data[lc->lro_mbuf_count++].mb = mb;
 }
 
 /* end */
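The comment above tcp_lro_sort() describes an MSB-first binary radix sort. It runs over the 64-bit keys that tcp_lro_queue_mbuf() builds from the hash type, the flow id and the arrival index. The userland sketch below reproduces that logic so it can be exercised outside the kernel. `struct sort_entry`, `make_key()`, `stream_of()` and `radix_sort_seq()` are hypothetical names, an integer `id` stands in for the mbuf pointer, and the bit-smearing fallback is used instead of flsll(). The low 24 bits of each key hold the arrival index, so keys are unique. Sorting therefore groups packets by flow and keeps arrival order within each flow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct lro_mbuf_sort; "id" replaces the mbuf. */
struct sort_entry {
	uint64_t seq;
	int id;
};

/* Key layout from tcp_lro_queue_mbuf(): hash type, flow id, arrival index. */
static uint64_t
make_key(uint8_t hashtype, uint32_t flowid, uint32_t index)
{
	assert(index < (1U << 24));	/* index must fit in the low 24 bits */
	return (((uint64_t)hashtype << 56) | ((uint64_t)flowid << 24) |
	    (uint64_t)index);
}

/* Stream part of a key, as masked in tcp_lro_flush_all(). */
static uint64_t
stream_of(uint64_t seq)
{
	return (seq & (-1ULL << 24));
}

/* Keep only the most significant set bit (fallback used without flsll()). */
static uint64_t
msb_64(uint64_t x)
{
	x |= (x >> 1);
	x |= (x >> 2);
	x |= (x >> 4);
	x |= (x >> 8);
	x |= (x >> 16);
	x |= (x >> 32);
	return (x & ~(x >> 1));
}

/* MSB-first binary radix sort on .seq, mirroring tcp_lro_sort(). */
static void
radix_sort_seq(struct sort_entry *parray, uint32_t size)
{
	struct sort_entry temp;
	uint64_t ones, zeros;
	uint32_t x, y;

repeat:
	/* Small partitions: insertion sort is cheaper. */
	if (size <= 12) {
		for (x = 1; x < size; x++) {
			temp = parray[x];
			for (y = x; y > 0 && temp.seq < parray[y - 1].seq; y--)
				parray[y] = parray[y - 1];
			parray[y] = temp;
		}
		return;
	}

	/* A bit varies iff it is set in some key and clear in another. */
	ones = 0;
	zeros = 0;
	for (x = 0; x != size; x++) {
		ones |= parray[x].seq;
		zeros |= ~parray[x].seq;
	}
	ones &= zeros;
	if (ones == 0)
		return;			/* all keys are equal */
	ones = msb_64(ones);		/* highest varying bit */

	/* Move keys with that bit clear to the front. */
	for (x = y = 0; y != size; y++) {
		if (parray[y].seq & ones)
			continue;
		temp = parray[x];
		parray[x] = parray[y];
		parray[y] = temp;
		x++;
	}

	radix_sort_seq(parray, x);	/* recurse on the "zeros" part */
	parray += x;			/* iterate on the "ones" part */
	size -= x;
	goto repeat;
}
```

The partition step is not stable by itself. The unique index bits mean there are never ties, so the final order is still deterministic.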
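After rewriting ip_len, tcp_lro_flush() patches the IPv4 header checksum instead of recomputing it. It adds the complement of the old checksum, the complement of the old length and the new length. It then folds the carries as tcp_lro_csum_th() does and complements the result. That is the incremental update HC' = ~(~HC + ~m + m') from RFC 1624. The sketch below checks that identity against a full RFC 1071 recomputation. The helper names are hypothetical. The header words are kept in host order for readability; the ones' complement sum is byte-order independent (RFC 1071), so the same arithmetic applies to network-order words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t
csum_fold(uint32_t sum)
{
	while (sum > 0xffff)
		sum = (sum >> 16) + (sum & 0xffff);
	return ((uint16_t)sum);
}

/* Full RFC 1071 checksum over nwords 16-bit words. */
static uint16_t
csum_full(const uint16_t *w, size_t nwords)
{
	uint32_t sum = 0;
	size_t i;

	/* 65536 words of 0xffff still fit in 32 bits without overflow. */
	assert(nwords <= 0x10000);
	for (i = 0; i < nwords; i++)
		sum += w[i];
	return ((uint16_t)~csum_fold(sum));
}

/*
 * RFC 1624 incremental update HC' = ~(~HC + ~m + m'), the same steps
 * tcp_lro_flush() applies to ip_sum when ip_len changes.
 */
static uint16_t
csum_update(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum;

	sum = (uint16_t)~old_csum;
	sum += (uint16_t)~old_word;
	sum += new_word;
	return ((uint16_t)~csum_fold(sum));
}
```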
Index: user/alc/PQ_LAUNDRY/sys/sys/buf.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/buf.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/buf.h	(revision 303775)
@@ -1,552 +1,546 @@
 /*-
  * Copyright (c) 1982, 1986, 1989, 1993
  *	The Regents of the University of California.  All rights reserved.
  * (c) UNIX System Laboratories, Inc.
  * All or some portions of this file are derived from material licensed
  * to the University of California by American Telephone and Telegraph
  * Co. or Unix System Laboratories, Inc. and are reproduced herein with
  * the permission of UNIX System Laboratories, Inc.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 4. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  *	@(#)buf.h	8.9 (Berkeley) 3/30/95
  * $FreeBSD$
  */
 
 #ifndef _SYS_BUF_H_
 #define	_SYS_BUF_H_
 
 #include <sys/bufobj.h>
 #include <sys/queue.h>
 #include <sys/lock.h>
 #include <sys/lockmgr.h>
 
 struct bio;
 struct buf;
 struct bufobj;
 struct mount;
 struct vnode;
 struct uio;
 
 /*
  * To avoid including <ufs/ffs/softdep.h> 
  */   
 LIST_HEAD(workhead, worklist);
 /*
  * These are currently used only by the soft dependency code, hence
  * are stored once in a global variable. If other subsystems wanted
  * to use these hooks, a pointer to a set of bio_ops could be added
  * to each buffer.
  */
 extern struct bio_ops {
 	void	(*io_start)(struct buf *);
 	void	(*io_complete)(struct buf *);
 	void	(*io_deallocate)(struct buf *);
 	int	(*io_countdeps)(struct buf *, int);
 } bioops;
 
 struct vm_object;
 
 typedef unsigned char b_xflags_t;
 
 /*
  * The buffer header describes an I/O operation in the kernel.
  *
  * NOTES:
  *	b_bufsize, b_bcount.  b_bufsize is the allocation size of the
  *	buffer, either DEV_BSIZE or PAGE_SIZE aligned.  b_bcount is the
  *	originally requested buffer size and can serve as a bounds check
  *	against EOF.  For most, but not all uses, b_bcount == b_bufsize.
  *
  *	b_dirtyoff, b_dirtyend.  Buffers support piecemeal, unaligned
  *	ranges of dirty data that need to be written to backing store.
  *	The range is typically clipped at b_bcount ( not b_bufsize ).
  *
  *	b_resid.  Number of bytes remaining in I/O.  After an I/O operation
  *	completes, b_resid is usually 0 indicating 100% success.
  *
  *	All fields are protected by the buffer lock except those marked:
  *		V - Protected by owning bufobj lock
  *		Q - Protected by the buf queue lock
  *		D - Protected by an dependency implementation specific lock
  */
 struct buf {
 	struct bufobj	*b_bufobj;
 	long		b_bcount;
 	void		*b_caller1;
 	caddr_t		b_data;
 	int		b_error;
 	uint16_t	b_iocmd;	/* BIO_* bio_cmd from bio.h */
 	uint16_t	b_ioflags;	/* BIO_* bio_flags from bio.h */
 	off_t		b_iooffset;
 	long		b_resid;
 	void	(*b_iodone)(struct buf *);
 	daddr_t b_blkno;		/* Underlying physical block number. */
 	off_t	b_offset;		/* Offset into file. */
 	TAILQ_ENTRY(buf) b_bobufs;	/* (V) Buffer's associated vnode. */
 	uint32_t	b_vflags;	/* (V) BV_* flags */
 	unsigned short b_qindex;	/* (Q) buffer queue index */
 	uint32_t	b_flags;	/* B_* flags. */
 	b_xflags_t b_xflags;		/* extra flags */
 	struct lock b_lock;		/* Buffer lock */
 	long	b_bufsize;		/* Allocated buffer size. */
 	int	b_runningbufspace;	/* when I/O is running, pipelining */
 	int	b_kvasize;		/* size of kva for buffer */
 	int	b_dirtyoff;		/* Offset in buffer of dirty region. */
 	int	b_dirtyend;		/* Offset of end of dirty region. */
 	caddr_t	b_kvabase;		/* base kva for buffer */
 	daddr_t b_lblkno;		/* Logical block number. */
 	struct	vnode *b_vp;		/* Device vnode. */
 	struct	ucred *b_rcred;		/* Read credentials reference. */
 	struct	ucred *b_wcred;		/* Write credentials reference. */
 	union {
 		TAILQ_ENTRY(buf) b_freelist; /* (Q) */
 		struct {
 			void	(*b_pgiodone)(void *, vm_page_t *, int, int);
 			int	b_pgbefore;
 			int	b_pgafter;
 		};
 	};
 	union	cluster_info {
 		TAILQ_HEAD(cluster_list_head, buf) cluster_head;
 		TAILQ_ENTRY(buf) cluster_entry;
 	} b_cluster;
 	struct	vm_page *b_pages[btoc(MAXPHYS)];
 	int		b_npages;
 	struct	workhead b_dep;		/* (D) List of filesystem dependencies. */
 	void	*b_fsprivate1;
 	void	*b_fsprivate2;
 	void	*b_fsprivate3;
 	int	b_pin_count;
 };
 
 #define b_object	b_bufobj->bo_object
 
 /*
  * These flags are kept in b_flags.
  *
  * Notes:
  *
  *	B_ASYNC		VOP calls on bp's are usually async whether or not
  *			B_ASYNC is set, but some subsystems, such as NFS, like 
  *			to know what is best for the caller so they can
  *			optimize the I/O.
  *
  *	B_PAGING	Indicates that bp is being used by the paging system or
  *			some paging system and that the bp is not linked into
  *			the b_vp's clean/dirty linked lists or ref counts.
  *			Buffer vp reassignments are illegal in this case.
  *
  *	B_CACHE		This may only be set if the buffer is entirely valid.
  *			The situation where B_DELWRI is set and B_CACHE is
  *			clear MUST be committed to disk by getblk() so 
  *			B_DELWRI can also be cleared.  See the comments for
  *			getblk() in kern/vfs_bio.c.  If B_CACHE is clear,
  *			the caller is expected to clear BIO_ERROR and B_INVAL,
  *			set BIO_READ, and initiate an I/O.
  *
  *			The 'entire buffer' is defined to be the range from
  *			0 through b_bcount.
  *
  *	B_MALLOC	Request that the buffer be allocated from the malloc
  *			pool, DEV_BSIZE aligned instead of PAGE_SIZE aligned.
  *
  *	B_CLUSTEROK	This flag is typically set for B_DELWRI buffers
  *			by filesystems that allow clustering when the buffer
  *			is fully dirty and indicates that it may be clustered
  *			with other adjacent dirty buffers.  Note the clustering
  *			may not be used with the stage 1 data write under NFS
  *			but may be used for the commit rpc portion.
  *
  *	B_VMIO		Indicates that the buffer is tied into an VM object.
  *			The buffer's data is always PAGE_SIZE aligned even
  *			if b_bufsize and b_bcount are not.  ( b_bufsize is 
  *			always at least DEV_BSIZE aligned, though ).
  *
  *	B_DIRECT	Hint that we should attempt to completely free
  *			the pages underlying the buffer.  B_DIRECT is
  *			sticky until the buffer is released and typically
  *			only has an effect when B_RELBUF is also set.
  *
  */
 
 #define	B_AGE		0x00000001	/* Move to age queue when I/O done. */
 #define	B_NEEDCOMMIT	0x00000002	/* Append-write in progress. */
 #define	B_ASYNC		0x00000004	/* Start I/O, do not wait. */
 #define	B_DIRECT	0x00000008	/* direct I/O flag (pls free vmio) */
 #define	B_DEFERRED	0x00000010	/* Skipped over for cleaning */
 #define	B_CACHE		0x00000020	/* Bread found us in the cache. */
 #define	B_VALIDSUSPWRT	0x00000040	/* Valid write during suspension. */
 #define	B_DELWRI	0x00000080	/* Delay I/O until buffer reused. */
 #define	B_00000100	0x00000100	/* Available flag. */
 #define	B_DONE		0x00000200	/* I/O completed. */
 #define	B_EINTR		0x00000400	/* I/O was interrupted */
 #define	B_NOREUSE	0x00000800	/* Contents not reused once released. */
 #define	B_00001000	0x00001000	/* Available flag. */
 #define	B_INVAL		0x00002000	/* Does not contain valid info. */
 #define	B_BARRIER	0x00004000	/* Write this and all preceding first. */
 #define	B_NOCACHE	0x00008000	/* Do not cache block after use. */
 #define	B_MALLOC	0x00010000	/* malloced b_data */
 #define	B_CLUSTEROK	0x00020000	/* Pagein op, so swap() can count it. */
 #define	B_00040000	0x00040000	/* Available flag. */
 #define	B_00080000	0x00080000	/* Available flag. */
 #define	B_00100000	0x00100000	/* Available flag. */
 #define	B_00200000	0x00200000	/* Available flag. */
 #define	B_RELBUF	0x00400000	/* Release VMIO buffer. */
 #define	B_FS_FLAG1	0x00800000	/* Available flag for FS use. */
 #define	B_NOCOPY	0x01000000	/* Don't copy-on-write this buf. */
 #define	B_INFREECNT	0x02000000	/* buf is counted in numfreebufs */
 #define	B_PAGING	0x04000000	/* volatile paging I/O -- bypass VMIO */
 #define B_MANAGED	0x08000000	/* Managed by FS. */
 #define B_RAM		0x10000000	/* Read ahead mark (flag) */
 #define B_VMIO		0x20000000	/* VMIO flag */
 #define B_CLUSTER	0x40000000	/* pagein op, so swap() can count it */
 #define B_REMFREE	0x80000000	/* Delayed bremfree */
 
 #define PRINT_BUF_FLAGS "\20\40remfree\37cluster\36vmio\35ram\34managed" \
 	"\33paging\32infreecnt\31nocopy\30b23\27relbuf\26b21\25b20" \
 	"\24b19\23b18\22clusterok\21malloc\20nocache\17b14\16inval" \
 	"\15b12\14noreuse\13eintr\12done\11b8\10delwri" \
 	"\7validsuspwrt\6cache\5deferred\4direct\3async\2needcommit\1age"
 
 /*
  * These flags are kept in b_xflags.
  */
 #define	BX_VNDIRTY	0x00000001	/* On vnode dirty list */
 #define	BX_VNCLEAN	0x00000002	/* On vnode clean list */
 #define	BX_BKGRDWRITE	0x00000010	/* Do writes in background */
 #define BX_BKGRDMARKER	0x00000020	/* Mark buffer for splay tree */
 #define	BX_ALTDATA	0x00000040	/* Holds extended data */
 
 #define	PRINT_BUF_XFLAGS "\20\7altdata\6bkgrdmarker\5bkgrdwrite\2clean\1dirty"
 
 #define	NOOFFSET	(-1LL)		/* No buffer offset calculated yet */
 
 /*
  * These flags are kept in b_vflags.
  */
 #define	BV_SCANNED	0x00000001	/* VOP_FSYNC funcs mark written bufs */
 #define	BV_BKGRDINPROG	0x00000002	/* Background write in progress */
 #define	BV_BKGRDWAIT	0x00000004	/* Background write waiting */
 #define	BV_BKGRDERR	0x00000008	/* Error from background write */
 
 #define	PRINT_BUF_VFLAGS "\20\4bkgrderr\3bkgrdwait\2bkgrdinprog\1scanned"
 
 #ifdef _KERNEL
 /*
  * Buffer locking
  */
 extern const char *buf_wmesg;		/* Default buffer lock message */
 #define BUF_WMESG "bufwait"
 #include <sys/proc.h>			/* XXX for curthread */
 #include <sys/mutex.h>
 
 /*
  * Initialize a lock.
  */
 #define BUF_LOCKINIT(bp)						\
 	lockinit(&(bp)->b_lock, PRIBIO + 4, buf_wmesg, 0, 0)
 /*
  *
  * Get a lock sleeping non-interruptably until it becomes available.
  */
 #define	BUF_LOCK(bp, locktype, interlock)				\
 	_lockmgr_args_rw(&(bp)->b_lock, (locktype), (interlock),	\
 	    LK_WMESG_DEFAULT, LK_PRIO_DEFAULT, LK_TIMO_DEFAULT,		\
 	    LOCK_FILE, LOCK_LINE)
 
 /*
  * Get a lock sleeping with specified interruptably and timeout.
  */
 #define	BUF_TIMELOCK(bp, locktype, interlock, wmesg, catch, timo)	\
 	_lockmgr_args_rw(&(bp)->b_lock, (locktype) | LK_TIMELOCK,	\
 	    (interlock), (wmesg), (PRIBIO + 4) | (catch), (timo),	\
 	    LOCK_FILE, LOCK_LINE)
 
 /*
  * Release a lock. Only the acquiring process may free the lock unless
  * it has been handed off to biodone.
  */
 #define	BUF_UNLOCK(bp) do {						\
 	KASSERT(((bp)->b_flags & B_REMFREE) == 0,			\
 	    ("BUF_UNLOCK %p while B_REMFREE is still set.", (bp)));	\
 									\
 	(void)_lockmgr_args(&(bp)->b_lock, LK_RELEASE, NULL,		\
 	    LK_WMESG_DEFAULT, LK_PRIO_DEFAULT, LK_TIMO_DEFAULT,		\
 	    LOCK_FILE, LOCK_LINE);					\
 } while (0)
 
 /*
  * Check if a buffer lock is recursed.
  */
 #define	BUF_LOCKRECURSED(bp)						\
 	lockmgr_recursed(&(bp)->b_lock)
 
 /*
  * Check if a buffer lock is currently held.
  */
 #define	BUF_ISLOCKED(bp)						\
 	lockstatus(&(bp)->b_lock)
 /*
  * Free a buffer lock.
  */
 #define BUF_LOCKFREE(bp) 						\
 	lockdestroy(&(bp)->b_lock)
 
 /*
  * Print informations on a buffer lock.
  */
 #define BUF_LOCKPRINTINFO(bp) 						\
 	lockmgr_printinfo(&(bp)->b_lock)
 
 /*
  * Buffer lock assertions.
  */
 #if defined(INVARIANTS) && defined(INVARIANT_SUPPORT)
 #define	BUF_ASSERT_LOCKED(bp)						\
 	_lockmgr_assert(&(bp)->b_lock, KA_LOCKED, LOCK_FILE, LOCK_LINE)
 #define	BUF_ASSERT_SLOCKED(bp)						\
 	_lockmgr_assert(&(bp)->b_lock, KA_SLOCKED, LOCK_FILE, LOCK_LINE)
 #define	BUF_ASSERT_XLOCKED(bp)						\
 	_lockmgr_assert(&(bp)->b_lock, KA_XLOCKED, LOCK_FILE, LOCK_LINE)
 #define	BUF_ASSERT_UNLOCKED(bp)						\
 	_lockmgr_assert(&(bp)->b_lock, KA_UNLOCKED, LOCK_FILE, LOCK_LINE)
 #define	BUF_ASSERT_HELD(bp)
 #define	BUF_ASSERT_UNHELD(bp)
 #else
 #define	BUF_ASSERT_LOCKED(bp)
 #define	BUF_ASSERT_SLOCKED(bp)
 #define	BUF_ASSERT_XLOCKED(bp)
 #define	BUF_ASSERT_UNLOCKED(bp)
 #define	BUF_ASSERT_HELD(bp)
 #define	BUF_ASSERT_UNHELD(bp)
 #endif
 
 #ifdef _SYS_PROC_H_	/* Avoid #include <sys/proc.h> pollution */
 /*
  * When initiating asynchronous I/O, change ownership of the lock to the
  * kernel. Once done, the lock may legally released by biodone. The
  * original owning process can no longer acquire it recursively, but must
  * wait until the I/O is completed and the lock has been freed by biodone.
  */
 #define	BUF_KERNPROC(bp)						\
 	_lockmgr_disown(&(bp)->b_lock, LOCK_FILE, LOCK_LINE)
 #endif
 
-/*
- * Find out if the lock has waiters or not.
- */
-#define	BUF_LOCKWAITERS(bp)						\
-	lockmgr_waiters(&(bp)->b_lock)
-
 #endif /* _KERNEL */
 
 struct buf_queue_head {
 	TAILQ_HEAD(buf_queue, buf) queue;
 	daddr_t last_pblkno;
 	struct	buf *insert_point;
 	struct	buf *switch_point;
 };
 
 /*
  * This structure describes a clustered I/O. 
  */
 struct cluster_save {
 	long	bs_bcount;		/* Saved b_bcount. */
 	long	bs_bufsize;		/* Saved b_bufsize. */
 	int	bs_nchildren;		/* Number of associated buffers. */
 	struct buf **bs_children;	/* List of associated buffers. */
 };
 
 #ifdef _KERNEL
 
 static __inline int
 bwrite(struct buf *bp)
 {
 
 	KASSERT(bp->b_bufobj != NULL, ("bwrite: no bufobj bp=%p", bp));
 	KASSERT(bp->b_bufobj->bo_ops != NULL, ("bwrite: no bo_ops bp=%p", bp));
 	KASSERT(bp->b_bufobj->bo_ops->bop_write != NULL,
 	    ("bwrite: no bop_write bp=%p", bp));
 	return (BO_WRITE(bp->b_bufobj, bp));
 }
 
 static __inline void
 bstrategy(struct buf *bp)
 {
 
 	KASSERT(bp->b_bufobj != NULL, ("bstrategy: no bufobj bp=%p", bp));
 	KASSERT(bp->b_bufobj->bo_ops != NULL,
 	    ("bstrategy: no bo_ops bp=%p", bp));
 	KASSERT(bp->b_bufobj->bo_ops->bop_strategy != NULL,
 	    ("bstrategy: no bop_strategy bp=%p", bp));
 	BO_STRATEGY(bp->b_bufobj, bp);
 }
 
 static __inline void
 buf_start(struct buf *bp)
 {
 	if (bioops.io_start)
 		(*bioops.io_start)(bp);
 }
 
 static __inline void
 buf_complete(struct buf *bp)
 {
 	if (bioops.io_complete)
 		(*bioops.io_complete)(bp);
 }
 
 static __inline void
 buf_deallocate(struct buf *bp)
 {
 	if (bioops.io_deallocate)
 		(*bioops.io_deallocate)(bp);
 }
 
 static __inline int
 buf_countdeps(struct buf *bp, int i)
 {
 	if (bioops.io_countdeps)
 		return ((*bioops.io_countdeps)(bp, i));
 	else
 		return (0);
 }
 
 #endif /* _KERNEL */
 
 /*
  * Zero out the buffer's data area.
  */
 #define	clrbuf(bp) {							\
 	bzero((bp)->b_data, (u_int)(bp)->b_bcount);			\
 	(bp)->b_resid = 0;						\
 }
 
 /*
  * Flags for getblk's last parameter.
  */
 #define	GB_LOCK_NOWAIT	0x0001		/* Fail if we block on a buf lock. */
 #define	GB_NOCREAT	0x0002		/* Don't create a buf if not found. */
 #define	GB_NOWAIT_BD	0x0004		/* Do not wait for bufdaemon. */
 #define	GB_UNMAPPED	0x0008		/* Do not mmap buffer pages. */
 #define	GB_KVAALLOC	0x0010		/* But allocate KVA. */
 
 #ifdef _KERNEL
 extern int	nbuf;			/* The number of buffer headers */
 extern long	maxswzone;		/* Max KVA for swap structures */
 extern long	maxbcache;		/* Max KVA for buffer cache */
 extern long	runningbufspace;
 extern long	hibufspace;
 extern int	dirtybufthresh;
 extern int	bdwriteskip;
 extern int	dirtybufferflushes;
 extern int	altbufferflushes;
 extern int	nswbuf;			/* Number of swap I/O buffer headers. */
 extern int	cluster_pbuf_freecnt;	/* Number of pbufs for clusters */
 extern int	vnode_pbuf_freecnt;	/* Number of pbufs for vnode pager */
 extern int	vnode_async_pbuf_freecnt; /* Number of pbufs for vnode pager,
 					     asynchronous reads */
 extern caddr_t	unmapped_buf;	/* Data address for unmapped buffers. */
 
 static inline int
 buf_mapped(struct buf *bp)
 {
 
 	return (bp->b_data != unmapped_buf);
 }
 
 void	runningbufwakeup(struct buf *);
 void	waitrunningbufspace(void);
 caddr_t	kern_vfs_bio_buffer_alloc(caddr_t v, long physmem_est);
 void	bufinit(void);
 void	bufshutdown(int);
 void	bdata2bio(struct buf *bp, struct bio *bip);
 void	bwillwrite(void);
 int	buf_dirty_count_severe(void);
 void	bremfree(struct buf *);
 void	bremfreef(struct buf *);	/* XXX Force bremfree, only for nfs. */
 #define bread(vp, blkno, size, cred, bpp) \
 	    breadn_flags(vp, blkno, size, NULL, NULL, 0, cred, 0, bpp)
 #define bread_gb(vp, blkno, size, cred, gbflags, bpp) \
 	    breadn_flags(vp, blkno, size, NULL, NULL, 0, cred, \
 		gbflags, bpp)
 #define breadn(vp, blkno, size, rablkno, rabsize, cnt, cred, bpp) \
 	    breadn_flags(vp, blkno, size, rablkno, rabsize, cnt, cred, 0, bpp)
 int	breadn_flags(struct vnode *, daddr_t, int, daddr_t *, int *, int,
 	    struct ucred *, int, struct buf **);
 void	breada(struct vnode *, daddr_t *, int *, int, struct ucred *);
 void	bdwrite(struct buf *);
 void	bawrite(struct buf *);
 void	babarrierwrite(struct buf *);
 int	bbarrierwrite(struct buf *);
 void	bdirty(struct buf *);
 void	bundirty(struct buf *);
 void	bufstrategy(struct bufobj *, struct buf *);
 void	brelse(struct buf *);
 void	bqrelse(struct buf *);
 int	vfs_bio_awrite(struct buf *);
 void	vfs_drain_busy_pages(struct buf *bp);
 struct buf *     getpbuf(int *);
 struct buf *incore(struct bufobj *, daddr_t);
 struct buf *gbincore(struct bufobj *, daddr_t);
 struct buf *getblk(struct vnode *, daddr_t, int, int, int, int);
 struct buf *geteblk(int, int);
 int	bufwait(struct buf *);
 int	bufwrite(struct buf *);
 void	bufdone(struct buf *);
 void	bufdone_finish(struct buf *);
 void	bd_speedup(void);
 
 int	cluster_read(struct vnode *, u_quad_t, daddr_t, long,
 	    struct ucred *, long, int, int, struct buf **);
 int	cluster_wbuild(struct vnode *, long, daddr_t, int, int);
 void	cluster_write(struct vnode *, struct buf *, u_quad_t, int, int);
 void	vfs_bio_bzero_buf(struct buf *bp, int base, int size);
 void	vfs_bio_set_valid(struct buf *, int base, int size);
 void	vfs_bio_clrbuf(struct buf *);
 void	vfs_busy_pages(struct buf *, int clear_modify);
 void	vfs_unbusy_pages(struct buf *);
 int	vmapbuf(struct buf *, int);
 void	vunmapbuf(struct buf *);
 void	relpbuf(struct buf *, int *);
 void	brelvp(struct buf *);
 void	bgetvp(struct vnode *, struct buf *);
 void	pbgetbo(struct bufobj *bo, struct buf *bp);
 void	pbgetvp(struct vnode *, struct buf *);
 void	pbrelbo(struct buf *);
 void	pbrelvp(struct buf *);
 int	allocbuf(struct buf *bp, int size);
 void	reassignbuf(struct buf *);
 struct	buf *trypbuf(int *);
 void	bwait(struct buf *, u_char, const char *);
 void	bdone(struct buf *);
 void	bpin(struct buf *);
 void	bunpin(struct buf *);
 void 	bunpin_wait(struct buf *);
 
 #endif /* _KERNEL */
 
 #endif /* !_SYS_BUF_H_ */
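The `PRINT_BUF_FLAGS` string in the buf.h context above is a bit-name descriptor in the layout used by the kernel printf(9) `%b` conversion. The first byte gives the output base (`\20`, i.e. 16). Each following byte is a 1-origin bit number, and it is followed by that bit's name. For example, `\40remfree` names bit 32, which is `B_REMFREE` (0x80000000).

The sketch below is a minimal user-space illustration of how such a descriptor can be walked. It is loosely modeled on what `%b` prints for the set bits. The `decode_bits()` helper is hypothetical and is not a kernel API. The macro is copied verbatim from the header.

```c
#include <stddef.h>
#include <stdio.h>

/* Copied verbatim from sys/buf.h above. */
#define PRINT_BUF_FLAGS "\20\40remfree\37cluster\36vmio\35ram\34managed" \
	"\33paging\32infreecnt\31nocopy\30b23\27relbuf\26b21\25b20" \
	"\24b19\23b18\22clusterok\21malloc\20nocache\17b14\16inval" \
	"\15b12\14noreuse\13eintr\12done\11b8\10delwri" \
	"\7validsuspwrt\6cache\5deferred\4direct\3async\2needcommit\1age"

/*
 * Hypothetical helper (not a kernel API): write "<name,name,...>" into
 * 'out' for every bit of 'v' that 'desc' names, or an empty string if
 * none match.  'desc' uses the %b layout: one base byte, then pairs of
 * (1-origin bit number, name).  A name ends at the next byte <= ' '
 * (the following bit number) or at the terminating NUL.
 */
static void
decode_bits(unsigned int v, const char *desc, char *out, size_t outlen)
{
	size_t used = 0;
	int any = 0;

	if (outlen == 0)
		return;
	out[0] = '\0';
	desc++;				/* Skip the base byte (\20 == 16). */
	while (*desc != '\0') {
		int bit = *desc++;	/* 1-origin bit number. */
		const char *name = desc;
		size_t namelen = 0;
		int n;

		/* Measure the name that follows the bit number. */
		while (desc[namelen] != '\0' && desc[namelen] > ' ')
			namelen++;
		desc += namelen;
		if (bit < 1 || bit > 32 || (v & (1u << (bit - 1))) == 0)
			continue;
		n = snprintf(out + used, outlen - used, "%c%.*s",
		    any ? ',' : '<', (int)namelen, name);
		if (n < 0 || (size_t)n >= outlen - used)
			return;		/* Out of space; output is truncated. */
		used += (size_t)n;
		any = 1;
	}
	/* Close the list only if something was printed and it fits. */
	if (any && used + 1 < outlen) {
		out[used++] = '>';
		out[used] = '\0';
	}
}
```

As a usage example, take `B_REMFREE | B_CACHE | B_AGE` (0x80000021). Calling `decode_bits(0x80000021u, PRINT_BUF_FLAGS, out, sizeof(out))` leaves `<remfree,cache,age>` in `out`. The names appear in descriptor order, which runs from the high bit to the low bit.

The descriptor still labels 0x00004000 (`B_BARRIER`) as `b14` and 0x00800000 (`B_FS_FLAG1`) as `b23`. A decoder driven by it therefore prints those generic names for those two flags.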
Index: user/alc/PQ_LAUNDRY/sys/sys/bus.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/bus.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/bus.h	(revision 303775)
@@ -1,947 +1,947 @@
 /*-
  * Copyright (c) 1997,1998,2003 Doug Rabson
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef _SYS_BUS_H_
 #define _SYS_BUS_H_
 
 #include <machine/_limits.h>
 #include <machine/_bus.h>
 #include <sys/_bus_dma.h>
 #include <sys/ioccom.h>
 
 /**
  * @defgroup NEWBUS newbus - a generic framework for managing devices
  * @{
  */
 
 /**
  * @brief Interface information structure.
  */
 struct u_businfo {
 	int	ub_version;		/**< @brief interface version */
 #define BUS_USER_VERSION	1
 	int	ub_generation;		/**< @brief generation count */
 };
 
 /**
  * @brief State of the device.
  */
 typedef enum device_state {
 	DS_NOTPRESENT = 10,		/**< @brief not probed or probe failed */
 	DS_ALIVE = 20,			/**< @brief probe succeeded */
 	DS_ATTACHING = 25,		/**< @brief currently attaching */
 	DS_ATTACHED = 30,		/**< @brief attach method called */
 	DS_BUSY = 40			/**< @brief device is open */
 } device_state_t;
 
 /**
  * @brief Device information exported to userspace.
  */
 struct u_device {
 	uintptr_t	dv_handle;
 	uintptr_t	dv_parent;
 
 	char		dv_name[32];		/**< @brief Name of device in tree. */
 	char		dv_desc[32];		/**< @brief Driver description */
 	char		dv_drivername[32];	/**< @brief Driver name */
 	char		dv_pnpinfo[128];	/**< @brief Plug and play info */
 	char		dv_location[128];	/**< @brief Where is the device? */
 	uint32_t	dv_devflags;		/**< @brief API Flags for device */
 	uint16_t	dv_flags;		/**< @brief flags for dev state */
 	device_state_t	dv_state;		/**< @brief State of attachment */
 	/* XXX more driver info? */
 };
 
 /* Flags exported via dv_flags. */
 #define	DF_ENABLED	0x01		/* device should be probed/attached */
 #define	DF_FIXEDCLASS	0x02		/* devclass specified at create time */
 #define	DF_WILDCARD	0x04		/* unit was originally wildcard */
 #define	DF_DESCMALLOCED	0x08		/* description was malloced */
 #define	DF_QUIET	0x10		/* don't print verbose attach message */
 #define	DF_DONENOMATCH	0x20		/* don't execute DEVICE_NOMATCH again */
 #define	DF_EXTERNALSOFTC 0x40		/* softc not allocated by us */
 #define	DF_REBID	0x80		/* Can rebid after attach */
 #define	DF_SUSPENDED	0x100		/* Device is suspended. */
 
 /**
  * @brief Device request structure used for ioctl's.
  *
  * Used for ioctl's on /dev/devctl2.  All device ioctl's
  * must have parameter definitions which begin with dr_name.
  */
 struct devreq_buffer {
 	void	*buffer;
 	size_t	length;
 };
 
 struct devreq {
 	char		dr_name[128];
 	int		dr_flags;		/* request-specific flags */
 	union {
 		struct devreq_buffer dru_buffer;
 		void	*dru_data;
 	} dr_dru;
 #define	dr_buffer	dr_dru.dru_buffer	/* variable-sized buffer */
 #define	dr_data		dr_dru.dru_data		/* fixed-size buffer */
 };
 
 #define	DEV_ATTACH	_IOW('D', 1, struct devreq)
 #define	DEV_DETACH	_IOW('D', 2, struct devreq)
 #define	DEV_ENABLE	_IOW('D', 3, struct devreq)
 #define	DEV_DISABLE	_IOW('D', 4, struct devreq)
 #define	DEV_SUSPEND	_IOW('D', 5, struct devreq)
 #define	DEV_RESUME	_IOW('D', 6, struct devreq)
 #define	DEV_SET_DRIVER	_IOW('D', 7, struct devreq)
 #define	DEV_RESCAN	_IOW('D', 9, struct devreq)
 #define	DEV_DELETE	_IOW('D', 10, struct devreq)
 
 /* Flags for DEV_DETACH and DEV_DISABLE. */
 #define	DEVF_FORCE_DETACH	0x0000001
 
 /* Flags for DEV_SET_DRIVER. */
 #define	DEVF_SET_DRIVER_DETACH	0x0000001	/* Detach existing driver. */
 
 /* Flags for DEV_DELETE. */
 #define	DEVF_FORCE_DELETE	0x0000001
 
 #ifdef _KERNEL
 
 #include <sys/eventhandler.h>
 #include <sys/kobj.h>
 
 /**
  * devctl hooks.  Typically one should use the devctl_notify
  * hook to send the message.  However, devctl_queue_data is also
  * included in case devctl_notify isn't sufficiently general.
  */
 boolean_t devctl_process_running(void);
 void devctl_notify_f(const char *__system, const char *__subsystem,
     const char *__type, const char *__data, int __flags);
 void devctl_notify(const char *__system, const char *__subsystem,
     const char *__type, const char *__data);
 void devctl_queue_data_f(char *__data, int __flags);
 void devctl_queue_data(char *__data);
 void devctl_safe_quote(char *__dst, const char *__src, size_t len);
 
 /**
  * Device name parsers.  Hook to allow device enumerators to map
  * scheme-specific names to a device.
  */
 typedef void (*dev_lookup_fn)(void *arg, const char *name,
     device_t *result);
 EVENTHANDLER_DECLARE(dev_lookup, dev_lookup_fn);
 
 /**
  * @brief A device driver (included mainly for compatibility with
  * FreeBSD 4.x).
  */
 typedef struct kobj_class	driver_t;
 
 /**
  * @brief A device class
  *
  * The devclass object has two main functions in the system. The first
  * is to manage the allocation of unit numbers for device instances
  * and the second is to hold the list of device drivers for a
  * particular bus type. Each devclass has a name and there cannot be
  * two devclasses with the same name. This ensures that unique unit
  * numbers are allocated to device instances.
  *
  * Drivers that support several different bus attachments (e.g. isa,
  * pci, pccard) should all use the same devclass to ensure that unit
  * numbers do not conflict.
  *
  * Each devclass may also have a parent devclass. This is used when
  * searching for device drivers to allow a form of inheritance. When
  * matching drivers with devices, first the driver list of the parent
  * device's devclass is searched. If no driver is found in that list,
  * the search continues in the parent devclass (if any).
  */
 typedef struct devclass		*devclass_t;
 
 /**
  * @brief A device method
  */
 #define device_method_t		kobj_method_t
 
 /**
  * @brief Driver interrupt filter return values
  *
  * If a driver provides an interrupt filter routine it must return an
  * integer consisting of oring together zero or more of the following
  * flags:
  *
  *	FILTER_STRAY	- this device did not trigger the interrupt
  *	FILTER_HANDLED	- the interrupt has been fully handled and can be EOId
  *	FILTER_SCHEDULE_THREAD - the threaded interrupt handler should be
  *			  scheduled to execute
  *
  * If the driver does not provide a filter, then the interrupt code will
  * act is if the filter had returned FILTER_SCHEDULE_THREAD.  Note that it
  * is illegal to specify any other flag with FILTER_STRAY and that it is
  * illegal to not specify either of FILTER_HANDLED or FILTER_SCHEDULE_THREAD
  * if FILTER_STRAY is not specified.
  */
 #define	FILTER_STRAY		0x01
 #define	FILTER_HANDLED		0x02
 #define	FILTER_SCHEDULE_THREAD	0x04
 
 /**
  * @brief Driver interrupt service routines
  *
  * The filter routine is run in primary interrupt context and may not
  * block or use regular mutexes.  It may only use spin mutexes for
  * synchronization.  The filter may either completely handle the
  * interrupt or it may perform some of the work and defer more
  * expensive work to the regular interrupt handler.  If a filter
  * routine is not registered by the driver, then the regular interrupt
  * handler is always used to handle interrupts from this device.
  *
  * The regular interrupt handler executes in its own thread context
  * and may use regular mutexes.  However, it is prohibited from
  * sleeping on a sleep queue.
  */
 typedef int driver_filter_t(void*);
 typedef void driver_intr_t(void*);
 
 /**
  * @brief Interrupt type bits.
  * 
  * These flags are used both by newbus interrupt
  * registration (nexus.c) and also in struct intrec, which defines
  * interrupt properties.
  *
  * XXX We should probably revisit this and remove the vestiges of the
  * spls implicit in names like INTR_TYPE_TTY. In the meantime, don't
  * confuse things by renaming them (Grog, 18 July 2000).
  *
  * Buses which do interrupt remapping will want to change their type
  * to reflect what sort of devices are underneath.
  */
 enum intr_type {
 	INTR_TYPE_TTY = 1,
 	INTR_TYPE_BIO = 2,
 	INTR_TYPE_NET = 4,
 	INTR_TYPE_CAM = 8,
 	INTR_TYPE_MISC = 16,
 	INTR_TYPE_CLK = 32,
 	INTR_TYPE_AV = 64,
 	INTR_EXCL = 256,		/* exclusive interrupt */
 	INTR_MPSAFE = 512,		/* this interrupt is SMP safe */
 	INTR_ENTROPY = 1024,		/* this interrupt provides entropy */
 	INTR_MD1 = 4096,		/* flag reserved for MD use */
 	INTR_MD2 = 8192,		/* flag reserved for MD use */
 	INTR_MD3 = 16384,		/* flag reserved for MD use */
 	INTR_MD4 = 32768		/* flag reserved for MD use */
 };
 
 enum intr_trigger {
 	INTR_TRIGGER_CONFORM = 0,
 	INTR_TRIGGER_EDGE = 1,
 	INTR_TRIGGER_LEVEL = 2
 };
 
 enum intr_polarity {
 	INTR_POLARITY_CONFORM = 0,
 	INTR_POLARITY_HIGH = 1,
 	INTR_POLARITY_LOW = 2
 };
 
 enum intr_map_data_type {
 	INTR_MAP_DATA_ACPI,
 	INTR_MAP_DATA_FDT,
 	INTR_MAP_DATA_GPIO,
 };
 
 struct intr_map_data {
 	enum intr_map_data_type	type;
 	void (*destruct)(struct intr_map_data *);
 };
 
 /**
  * CPU sets supported by bus_get_cpus().  Note that not all sets may be
  * supported for a given device.  If a request is not supported by a
  * device (or its parents), then bus_get_cpus() will fail with EINVAL.
  */
 enum cpu_sets {
 	LOCAL_CPUS = 0,
 	INTR_CPUS
 };
 
 typedef int (*devop_t)(void);
 
 /**
  * @brief This structure is deprecated.
  *
  * Use the kobj(9) macro DEFINE_CLASS to
  * declare classes which implement device drivers.
  */
 struct driver {
 	KOBJ_CLASS_FIELDS;
 };
 
 /**
  * @brief A resource mapping.
  */
 struct resource_map {
 	bus_space_tag_t r_bustag;
 	bus_space_handle_t r_bushandle;
 	bus_size_t r_size;
 	void	*r_vaddr;
 };
 	
 /**
  * @brief Optional properties of a resource mapping request.
  */
 struct resource_map_request {
 	size_t	size;
 	rman_res_t offset;
 	rman_res_t length;
 	vm_memattr_t memattr;
 };
 
 void	resource_init_map_request_impl(struct resource_map_request *_args,
 	    size_t _sz);
 #define	resource_init_map_request(rmr) 					\
 	resource_init_map_request_impl((rmr), sizeof(*(rmr)))
 
 /*
  * Definitions for drivers which need to keep simple lists of resources
  * for their child devices.
  */
 struct	resource;
 
 /**
  * @brief An entry for a single resource in a resource list.
  */
 struct resource_list_entry {
 	STAILQ_ENTRY(resource_list_entry) link;
 	int	type;			/**< @brief type argument to alloc_resource */
 	int	rid;			/**< @brief resource identifier */
 	int	flags;			/**< @brief resource flags */
 	struct	resource *res;		/**< @brief the real resource when allocated */
 	rman_res_t	start;		/**< @brief start of resource range */
 	rman_res_t	end;		/**< @brief end of resource range */
 	rman_res_t	count;			/**< @brief count within range */
 };
 STAILQ_HEAD(resource_list, resource_list_entry);
 
 #define	RLE_RESERVED		0x0001	/* Reserved by the parent bus. */
 #define	RLE_ALLOCATED		0x0002	/* Reserved resource is allocated. */
 #define	RLE_PREFETCH		0x0004	/* Resource is a prefetch range. */
 
 void	resource_list_init(struct resource_list *rl);
 void	resource_list_free(struct resource_list *rl);
 struct resource_list_entry *
 	resource_list_add(struct resource_list *rl,
 			  int type, int rid,
 			  rman_res_t start, rman_res_t end, rman_res_t count);
 int	resource_list_add_next(struct resource_list *rl,
 			  int type,
 			  rman_res_t start, rman_res_t end, rman_res_t count);
 int	resource_list_busy(struct resource_list *rl,
 			   int type, int rid);
 int	resource_list_reserved(struct resource_list *rl, int type, int rid);
 struct resource_list_entry*
 	resource_list_find(struct resource_list *rl,
 			   int type, int rid);
 void	resource_list_delete(struct resource_list *rl,
 			     int type, int rid);
 struct resource *
 	resource_list_alloc(struct resource_list *rl,
 			    device_t bus, device_t child,
 			    int type, int *rid,
 			    rman_res_t start, rman_res_t end,
 			    rman_res_t count, u_int flags);
 int	resource_list_release(struct resource_list *rl,
 			      device_t bus, device_t child,
 			      int type, int rid, struct resource *res);
 int	resource_list_release_active(struct resource_list *rl,
 				     device_t bus, device_t child,
 				     int type);
 struct resource *
 	resource_list_reserve(struct resource_list *rl,
 			      device_t bus, device_t child,
 			      int type, int *rid,
 			      rman_res_t start, rman_res_t end,
 			      rman_res_t count, u_int flags);
 int	resource_list_unreserve(struct resource_list *rl,
 				device_t bus, device_t child,
 				int type, int rid);
 void	resource_list_purge(struct resource_list *rl);
 int	resource_list_print_type(struct resource_list *rl,
 				 const char *name, int type,
 				 const char *format);
 
 /*
  * The root bus, to which all top-level busses are attached.
  */
 extern device_t root_bus;
 extern devclass_t root_devclass;
 void	root_bus_configure(void);
 
 /*
  * Useful functions for implementing busses.
  */
 
 int	bus_generic_activate_resource(device_t dev, device_t child, int type,
 				      int rid, struct resource *r);
 device_t
 	bus_generic_add_child(device_t dev, u_int order, const char *name,
 			      int unit);
 int	bus_generic_adjust_resource(device_t bus, device_t child, int type,
 				    struct resource *r, rman_res_t start,
 				    rman_res_t end);
 struct resource *
 	bus_generic_alloc_resource(device_t bus, device_t child, int type,
 				   int *rid, rman_res_t start, rman_res_t end,
 				   rman_res_t count, u_int flags);
 int	bus_generic_attach(device_t dev);
 int	bus_generic_bind_intr(device_t dev, device_t child,
 			      struct resource *irq, int cpu);
 int	bus_generic_child_present(device_t dev, device_t child);
 int	bus_generic_config_intr(device_t, int, enum intr_trigger,
 				enum intr_polarity);
 int	bus_generic_describe_intr(device_t dev, device_t child,
 				  struct resource *irq, void *cookie,
 				  const char *descr);
 int	bus_generic_deactivate_resource(device_t dev, device_t child, int type,
 					int rid, struct resource *r);
 int	bus_generic_detach(device_t dev);
 void	bus_generic_driver_added(device_t dev, driver_t *driver);
 int	bus_generic_get_cpus(device_t dev, device_t child, enum cpu_sets op,
 			     size_t setsize, struct _cpuset *cpuset);
 bus_dma_tag_t
 	bus_generic_get_dma_tag(device_t dev, device_t child);
 bus_space_tag_t
 	bus_generic_get_bus_tag(device_t dev, device_t child);
 int	bus_generic_get_domain(device_t dev, device_t child, int *domain);
 struct resource_list *
 	bus_generic_get_resource_list (device_t, device_t);
 int	bus_generic_map_resource(device_t dev, device_t child, int type,
 				 struct resource *r,
 				 struct resource_map_request *args,
 				 struct resource_map *map);
 void	bus_generic_new_pass(device_t dev);
 int	bus_print_child_header(device_t dev, device_t child);
 int	bus_print_child_domain(device_t dev, device_t child);
 int	bus_print_child_footer(device_t dev, device_t child);
 int	bus_generic_print_child(device_t dev, device_t child);
 int	bus_generic_probe(device_t dev);
 int	bus_generic_read_ivar(device_t dev, device_t child, int which,
 			      uintptr_t *result);
 int	bus_generic_release_resource(device_t bus, device_t child,
 				     int type, int rid, struct resource *r);
 int	bus_generic_resume(device_t dev);
 int	bus_generic_resume_child(device_t dev, device_t child);
 int	bus_generic_map_intr(device_t dev, device_t child, int *rid,
 			      rman_res_t *start, rman_res_t *end,
 			      rman_res_t *count, struct intr_map_data **imd);
 int	bus_generic_setup_intr(device_t dev, device_t child,
 			       struct resource *irq, int flags,
 			       driver_filter_t *filter, driver_intr_t *intr, 
 			       void *arg, void **cookiep);
 
 struct resource *
 	bus_generic_rl_alloc_resource (device_t, device_t, int, int *,
 				       rman_res_t, rman_res_t, rman_res_t, u_int);
 void	bus_generic_rl_delete_resource (device_t, device_t, int, int);
 int	bus_generic_rl_get_resource (device_t, device_t, int, int, rman_res_t *,
 				     rman_res_t *);
 int	bus_generic_rl_set_resource (device_t, device_t, int, int, rman_res_t,
 				     rman_res_t);
 int	bus_generic_rl_release_resource (device_t, device_t, int, int,
 					 struct resource *);
 
 int	bus_generic_shutdown(device_t dev);
 int	bus_generic_suspend(device_t dev);
 int	bus_generic_suspend_child(device_t dev, device_t child);
 int	bus_generic_teardown_intr(device_t dev, device_t child,
 				  struct resource *irq, void *cookie);
 int	bus_generic_unmap_resource(device_t dev, device_t child, int type,
 				   struct resource *r,
 				   struct resource_map *map);
 int	bus_generic_write_ivar(device_t dev, device_t child, int which,
 			       uintptr_t value);
 int	bus_null_rescan(device_t dev);
 
 /*
  * Wrapper functions for the BUS_*_RESOURCE methods to make client code
  * a little simpler.
  */
 
 struct resource_spec {
 	int	type;
 	int	rid;
 	int	flags;
 };
 
 int	bus_alloc_resources(device_t dev, struct resource_spec *rs,
 			    struct resource **res);
 void	bus_release_resources(device_t dev, const struct resource_spec *rs,
 			      struct resource **res);
 
 int	bus_adjust_resource(device_t child, int type, struct resource *r,
 			    rman_res_t start, rman_res_t end);
 struct	resource *bus_alloc_resource(device_t dev, int type, int *rid,
 				     rman_res_t start, rman_res_t end,
 				     rman_res_t count, u_int flags);
 int	bus_activate_resource(device_t dev, int type, int rid,
 			      struct resource *r);
 int	bus_deactivate_resource(device_t dev, int type, int rid,
 				struct resource *r);
 int	bus_map_resource(device_t dev, int type, struct resource *r,
 			 struct resource_map_request *args,
 			 struct resource_map *map);
 int	bus_unmap_resource(device_t dev, int type, struct resource *r,
 			   struct resource_map *map);
 int	bus_get_cpus(device_t dev, enum cpu_sets op, size_t setsize,
 		     struct _cpuset *cpuset);
 bus_dma_tag_t bus_get_dma_tag(device_t dev);
 bus_space_tag_t bus_get_bus_tag(device_t dev);
 int	bus_get_domain(device_t dev, int *domain);
 int	bus_release_resource(device_t dev, int type, int rid,
 			     struct resource *r);
 int	bus_free_resource(device_t dev, int type, struct resource *r);
 int	bus_setup_intr(device_t dev, struct resource *r, int flags,
 		       driver_filter_t filter, driver_intr_t handler, 
 		       void *arg, void **cookiep);
 int	bus_teardown_intr(device_t dev, struct resource *r, void *cookie);
 int	bus_bind_intr(device_t dev, struct resource *r, int cpu);
 int	bus_describe_intr(device_t dev, struct resource *irq, void *cookie,
-			  const char *fmt, ...);
+			  const char *fmt, ...) __printflike(4, 5);
 int	bus_set_resource(device_t dev, int type, int rid,
 			 rman_res_t start, rman_res_t count);
 int	bus_get_resource(device_t dev, int type, int rid,
 			 rman_res_t *startp, rman_res_t *countp);
 rman_res_t	bus_get_resource_start(device_t dev, int type, int rid);
 rman_res_t	bus_get_resource_count(device_t dev, int type, int rid);
 void	bus_delete_resource(device_t dev, int type, int rid);
 int	bus_child_present(device_t child);
 int	bus_child_pnpinfo_str(device_t child, char *buf, size_t buflen);
 int	bus_child_location_str(device_t child, char *buf, size_t buflen);
 void	bus_enumerate_hinted_children(device_t bus);
 
 static __inline struct resource *
 bus_alloc_resource_any(device_t dev, int type, int *rid, u_int flags)
 {
 	return (bus_alloc_resource(dev, type, rid, 0, ~0, 1, flags));
 }
 
 static __inline struct resource *
 bus_alloc_resource_anywhere(device_t dev, int type, int *rid,
     rman_res_t count, u_int flags)
 {
 	return (bus_alloc_resource(dev, type, rid, 0, ~0, count, flags));
 }
 
 /*
  * Access functions for device.
  */
 device_t	device_add_child(device_t dev, const char *name, int unit);
 device_t	device_add_child_ordered(device_t dev, u_int order,
 					 const char *name, int unit);
 void	device_busy(device_t dev);
 int	device_delete_child(device_t dev, device_t child);
 int	device_delete_children(device_t dev);
 int	device_attach(device_t dev);
 int	device_detach(device_t dev);
 void	device_disable(device_t dev);
 void	device_enable(device_t dev);
 device_t	device_find_child(device_t dev, const char *classname,
 				  int unit);
 const char	*device_get_desc(device_t dev);
 devclass_t	device_get_devclass(device_t dev);
 driver_t	*device_get_driver(device_t dev);
 u_int32_t	device_get_flags(device_t dev);
 device_t	device_get_parent(device_t dev);
 int	device_get_children(device_t dev, device_t **listp, int *countp);
 void	*device_get_ivars(device_t dev);
 void	device_set_ivars(device_t dev, void *ivars);
 const	char *device_get_name(device_t dev);
 const	char *device_get_nameunit(device_t dev);
 void	*device_get_softc(device_t dev);
 device_state_t	device_get_state(device_t dev);
 int	device_get_unit(device_t dev);
 struct sysctl_ctx_list *device_get_sysctl_ctx(device_t dev);
 struct sysctl_oid *device_get_sysctl_tree(device_t dev);
 int	device_is_alive(device_t dev);	/* did probe succeed? */
 int	device_is_attached(device_t dev);	/* did attach succeed? */
 int	device_is_enabled(device_t dev);
 int	device_is_suspended(device_t dev);
 int	device_is_quiet(device_t dev);
 device_t device_lookup_by_name(const char *name);
 int	device_print_prettyname(device_t dev);
 int	device_printf(device_t dev, const char *, ...) __printflike(2, 3);
 int	device_probe(device_t dev);
 int	device_probe_and_attach(device_t dev);
 int	device_probe_child(device_t bus, device_t dev);
 int	device_quiesce(device_t dev);
 void	device_quiet(device_t dev);
 void	device_set_desc(device_t dev, const char* desc);
 void	device_set_desc_copy(device_t dev, const char* desc);
 int	device_set_devclass(device_t dev, const char *classname);
 int	device_set_devclass_fixed(device_t dev, const char *classname);
 int	device_set_driver(device_t dev, driver_t *driver);
 void	device_set_flags(device_t dev, u_int32_t flags);
 void	device_set_softc(device_t dev, void *softc);
 void	device_free_softc(void *softc);
 void	device_claim_softc(device_t dev);
 int	device_set_unit(device_t dev, int unit);	/* XXX DONT USE XXX */
 int	device_shutdown(device_t dev);
 void	device_unbusy(device_t dev);
 void	device_verbose(device_t dev);
 
 /*
  * Access functions for devclass.
  */
 int		devclass_add_driver(devclass_t dc, driver_t *driver,
 				    int pass, devclass_t *dcp);
 devclass_t	devclass_create(const char *classname);
 int		devclass_delete_driver(devclass_t busclass, driver_t *driver);
 devclass_t	devclass_find(const char *classname);
 const char	*devclass_get_name(devclass_t dc);
 device_t	devclass_get_device(devclass_t dc, int unit);
 void	*devclass_get_softc(devclass_t dc, int unit);
 int	devclass_get_devices(devclass_t dc, device_t **listp, int *countp);
 int	devclass_get_drivers(devclass_t dc, driver_t ***listp, int *countp);
 int	devclass_get_count(devclass_t dc);
 int	devclass_get_maxunit(devclass_t dc);
 int	devclass_find_free_unit(devclass_t dc, int unit);
 void	devclass_set_parent(devclass_t dc, devclass_t pdc);
 devclass_t	devclass_get_parent(devclass_t dc);
 struct sysctl_ctx_list *devclass_get_sysctl_ctx(devclass_t dc);
 struct sysctl_oid *devclass_get_sysctl_tree(devclass_t dc);
 
 /*
  * Access functions for device resources.
  */
 
 int	resource_int_value(const char *name, int unit, const char *resname,
 			   int *result);
 int	resource_long_value(const char *name, int unit, const char *resname,
 			    long *result);
 int	resource_string_value(const char *name, int unit, const char *resname,
 			      const char **result);
 int	resource_disabled(const char *name, int unit);
 int	resource_find_match(int *anchor, const char **name, int *unit,
 			    const char *resname, const char *value);
 int	resource_find_dev(int *anchor, const char *name, int *unit,
 			  const char *resname, const char *value);
 int	resource_set_int(const char *name, int unit, const char *resname,
 			 int value);
 int	resource_set_long(const char *name, int unit, const char *resname,
 			  long value);
 int	resource_set_string(const char *name, int unit, const char *resname,
 			    const char *value);
 int	resource_unset_value(const char *name, int unit, const char *resname);
 
 /*
  * Functions for maintaining and checking consistency of
  * bus information exported to userspace.
  */
 int	bus_data_generation_check(int generation);
 void	bus_data_generation_update(void);
 
 /**
  * Some convenience defines for probe routines to return.  These are just
  * suggested values, and there's nothing magical about them.
  * BUS_PROBE_SPECIFIC is for devices that cannot be reprobed, and for which no
  * other driver may exist (typically legacy drivers that don't follow
  * all the rules, or special needs drivers).  BUS_PROBE_VENDOR is the
  * suggested value that vendor supplied drivers use.  This is for source or
  * binary drivers that are not yet integrated into the FreeBSD tree.  Its use
  * in the base OS is prohibited.  BUS_PROBE_DEFAULT is the normal return value
  * for drivers to use.  It is intended that nearly all of the drivers in the
  * tree should return this value.  BUS_PROBE_LOW_PRIORITY is for drivers that
  * have special requirements like when there are two drivers that support
  * overlapping series of hardware devices.  In this case the one that supports
  * the older part of the line would return this value, while the one that
  * supports the newer ones would return BUS_PROBE_DEFAULT.  BUS_PROBE_GENERIC
  * is for drivers that wish to have a generic form and a specialized form,
  * as is done with the pci bus and the acpi pci bus.  BUS_PROBE_HOOVER is
  * for those busses that implement a generic device place-holder for devices on
  * the bus that have no more specific driver for them (aka ugen).
  * BUS_PROBE_NOWILDCARD or lower means that the device isn't really bidding
  * for a device node, but accepts only devices that its parent has told it
  * to use this driver.
  */
 #define BUS_PROBE_SPECIFIC	0	/* Only I can use this device */
 #define BUS_PROBE_VENDOR	(-10)	/* Vendor supplied driver */
 #define BUS_PROBE_DEFAULT	(-20)	/* Base OS default driver */
 #define BUS_PROBE_LOW_PRIORITY	(-40)	/* Older, less desirable drivers */
 #define BUS_PROBE_GENERIC	(-100)	/* generic driver for dev */
 #define BUS_PROBE_HOOVER	(-1000000) /* Driver for any dev on bus */
 #define BUS_PROBE_NOWILDCARD	(-2000000000) /* No wildcard device matches */
 
 /**
  * During boot, the device tree is scanned multiple times.  In each scan,
  * or pass, drivers may be attached to devices.  Each driver
  * attachment is assigned a pass number.  Drivers may only probe and
  * attach to devices if their pass number is less than or equal to the
  * current system-wide pass number.  The default pass is the last pass
  * and is used by most drivers.  Drivers needed by the scheduler are
  * probed in earlier passes.
  */
 #define	BUS_PASS_ROOT		0	/* Used to attach root0. */
 #define	BUS_PASS_BUS		10	/* Busses and bridges. */
 #define	BUS_PASS_CPU		20	/* CPU devices. */
 #define	BUS_PASS_RESOURCE	30	/* Resource discovery. */
 #define	BUS_PASS_INTERRUPT	40	/* Interrupt controllers. */
 #define	BUS_PASS_TIMER		50	/* Timers and clocks. */
 #define	BUS_PASS_SCHEDULER	60	/* Start scheduler. */
 #define	BUS_PASS_DEFAULT	__INT_MAX /* Everything else. */
 
 #define	BUS_PASS_ORDER_FIRST	0
 #define	BUS_PASS_ORDER_EARLY	2
 #define	BUS_PASS_ORDER_MIDDLE	5
 #define	BUS_PASS_ORDER_LATE	7
 #define	BUS_PASS_ORDER_LAST	9
 
 extern int bus_current_pass;
 
 void	bus_set_pass(int pass);
 
 /**
  * Shorthands for constructing method tables.
  */
 #define	DEVMETHOD	KOBJMETHOD
 #define	DEVMETHOD_END	KOBJMETHOD_END
 
 /*
  * Some common device interfaces.
  */
 #include "device_if.h"
 #include "bus_if.h"
 
 struct	module;
 
 int	driver_module_handler(struct module *, int, void *);
 
 /**
  * Module support for automatically adding drivers to busses.
  */
 struct driver_module_data {
 	int		(*dmd_chainevh)(struct module *, int, void *);
 	void		*dmd_chainarg;
 	const char	*dmd_busname;
 	kobj_class_t	dmd_driver;
 	devclass_t	*dmd_devclass;
 	int		dmd_pass;
 };
 
 #define	EARLY_DRIVER_MODULE_ORDERED(name, busname, driver, devclass,	\
     evh, arg, order, pass)						\
 									\
 static struct driver_module_data name##_##busname##_driver_mod = {	\
 	evh, arg,							\
 	#busname,							\
 	(kobj_class_t) &driver,						\
 	&devclass,							\
 	pass								\
 };									\
 									\
 static moduledata_t name##_##busname##_mod = {				\
 	#busname "/" #name,						\
 	driver_module_handler,						\
 	&name##_##busname##_driver_mod					\
 };									\
 DECLARE_MODULE(name##_##busname, name##_##busname##_mod,		\
 	       SI_SUB_DRIVERS, order)
 
 #define	EARLY_DRIVER_MODULE(name, busname, driver, devclass, evh, arg, pass) \
 	EARLY_DRIVER_MODULE_ORDERED(name, busname, driver, devclass,	\
 	    evh, arg, SI_ORDER_MIDDLE, pass)
 
 #define	DRIVER_MODULE_ORDERED(name, busname, driver, devclass, evh, arg,\
     order)								\
 	EARLY_DRIVER_MODULE_ORDERED(name, busname, driver, devclass,	\
 	    evh, arg, order, BUS_PASS_DEFAULT)
 
 #define	DRIVER_MODULE(name, busname, driver, devclass, evh, arg)	\
 	EARLY_DRIVER_MODULE(name, busname, driver, devclass, evh, arg,	\
 	    BUS_PASS_DEFAULT)
 
 /**
  * Generic ivar accessor generation macros for bus drivers
  */
 #define __BUS_ACCESSOR(varp, var, ivarp, ivar, type)			\
 									\
 static __inline type varp ## _get_ ## var(device_t dev)			\
 {									\
 	uintptr_t v;							\
 	BUS_READ_IVAR(device_get_parent(dev), dev,			\
 	    ivarp ## _IVAR_ ## ivar, &v);				\
 	return ((type) v);						\
 }									\
 									\
 static __inline void varp ## _set_ ## var(device_t dev, type t)		\
 {									\
 	uintptr_t v = (uintptr_t) t;					\
 	BUS_WRITE_IVAR(device_get_parent(dev), dev,			\
 	    ivarp ## _IVAR_ ## ivar, v);				\
 }
 
 /**
  * Shorthand macros, taking resource argument
  * Generated with sys/tools/bus_macro.sh
  */
 
 #define bus_barrier(r, o, l, f) \
 	bus_space_barrier((r)->r_bustag, (r)->r_bushandle, (o), (l), (f))
 #define bus_read_1(r, o) \
 	bus_space_read_1((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_1(r, o, d, c) \
 	bus_space_read_multi_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_1(r, o, d, c) \
 	bus_space_read_region_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_1(r, o, v, c) \
 	bus_space_set_multi_1((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_1(r, o, v, c) \
 	bus_space_set_region_1((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_1(r, o, v) \
 	bus_space_write_1((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_1(r, o, d, c) \
 	bus_space_write_multi_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_1(r, o, d, c) \
 	bus_space_write_region_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_stream_1(r, o) \
 	bus_space_read_stream_1((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_stream_1(r, o, d, c) \
 	bus_space_read_multi_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_stream_1(r, o, d, c) \
 	bus_space_read_region_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_stream_1(r, o, v, c) \
 	bus_space_set_multi_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_stream_1(r, o, v, c) \
 	bus_space_set_region_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_stream_1(r, o, v) \
 	bus_space_write_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_stream_1(r, o, d, c) \
 	bus_space_write_multi_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_stream_1(r, o, d, c) \
 	bus_space_write_region_stream_1((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_2(r, o) \
 	bus_space_read_2((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_2(r, o, d, c) \
 	bus_space_read_multi_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_2(r, o, d, c) \
 	bus_space_read_region_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_2(r, o, v, c) \
 	bus_space_set_multi_2((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_2(r, o, v, c) \
 	bus_space_set_region_2((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_2(r, o, v) \
 	bus_space_write_2((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_2(r, o, d, c) \
 	bus_space_write_multi_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_2(r, o, d, c) \
 	bus_space_write_region_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_stream_2(r, o) \
 	bus_space_read_stream_2((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_stream_2(r, o, d, c) \
 	bus_space_read_multi_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_stream_2(r, o, d, c) \
 	bus_space_read_region_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_stream_2(r, o, v, c) \
 	bus_space_set_multi_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_stream_2(r, o, v, c) \
 	bus_space_set_region_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_stream_2(r, o, v) \
 	bus_space_write_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_stream_2(r, o, d, c) \
 	bus_space_write_multi_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_stream_2(r, o, d, c) \
 	bus_space_write_region_stream_2((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_4(r, o) \
 	bus_space_read_4((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_4(r, o, d, c) \
 	bus_space_read_multi_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_4(r, o, d, c) \
 	bus_space_read_region_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_4(r, o, v, c) \
 	bus_space_set_multi_4((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_4(r, o, v, c) \
 	bus_space_set_region_4((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_4(r, o, v) \
 	bus_space_write_4((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_4(r, o, d, c) \
 	bus_space_write_multi_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_4(r, o, d, c) \
 	bus_space_write_region_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_stream_4(r, o) \
 	bus_space_read_stream_4((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_stream_4(r, o, d, c) \
 	bus_space_read_multi_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_stream_4(r, o, d, c) \
 	bus_space_read_region_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_stream_4(r, o, v, c) \
 	bus_space_set_multi_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_stream_4(r, o, v, c) \
 	bus_space_set_region_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_stream_4(r, o, v) \
 	bus_space_write_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_stream_4(r, o, d, c) \
 	bus_space_write_multi_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_stream_4(r, o, d, c) \
 	bus_space_write_region_stream_4((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_8(r, o) \
 	bus_space_read_8((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_8(r, o, d, c) \
 	bus_space_read_multi_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_8(r, o, d, c) \
 	bus_space_read_region_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_8(r, o, v, c) \
 	bus_space_set_multi_8((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_8(r, o, v, c) \
 	bus_space_set_region_8((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_8(r, o, v) \
 	bus_space_write_8((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_8(r, o, d, c) \
 	bus_space_write_multi_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_8(r, o, d, c) \
 	bus_space_write_region_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_stream_8(r, o) \
 	bus_space_read_stream_8((r)->r_bustag, (r)->r_bushandle, (o))
 #define bus_read_multi_stream_8(r, o, d, c) \
 	bus_space_read_multi_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_read_region_stream_8(r, o, d, c) \
 	bus_space_read_region_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_set_multi_stream_8(r, o, v, c) \
 	bus_space_set_multi_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_set_region_stream_8(r, o, v, c) \
 	bus_space_set_region_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (v), (c))
 #define bus_write_stream_8(r, o, v) \
 	bus_space_write_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (v))
 #define bus_write_multi_stream_8(r, o, d, c) \
 	bus_space_write_multi_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #define bus_write_region_stream_8(r, o, d, c) \
 	bus_space_write_region_stream_8((r)->r_bustag, (r)->r_bushandle, (o), (d), (c))
 #endif /* _KERNEL */
 
 #endif /* !_SYS_BUS_H_ */
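The BUS_PROBE_* comment in the bus.h hunk above describes a bidding scheme. A probe routine returns zero or a negative value to bid for a device, and the bid closest to zero wins. The user-space C sketch below models that selection under simplifying assumptions. `struct probe_bid` and `pick_driver()` are hypothetical names, not part of the newbus KPI. The real logic lives in device_probe_child() in sys/kern/subr_bus.c, and this model only approximates it. In the model, a positive return is treated as an errno meaning "not mine", and the `fixed_class` flag stands in for a parent that has explicitly told the child which driver to use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Probe priorities copied from sys/bus.h above. */
#define BUS_PROBE_SPECIFIC      0
#define BUS_PROBE_VENDOR        (-10)
#define BUS_PROBE_DEFAULT       (-20)
#define BUS_PROBE_LOW_PRIORITY  (-40)
#define BUS_PROBE_GENERIC       (-100)
#define BUS_PROBE_HOOVER        (-1000000)
#define BUS_PROBE_NOWILDCARD    (-2000000000)

/* Hypothetical record of one driver's probe result. */
struct probe_bid {
	const char *name;
	int         result;	/* <= 0: bids for the device; > 0: errno */
};

/*
 * Simplified model of bid selection.  Returns the index of the winning
 * bid, or -1 if no driver accepts the device.
 */
static int
pick_driver(const struct probe_bid *bids, size_t n, bool fixed_class)
{
	int best = -1;
	int pri = 0;

	for (size_t i = 0; i < n; i++) {
		int result = bids[i].result;

		if (result > 0)
			continue;	/* probe failed with an errno */
		if (result <= BUS_PROBE_NOWILDCARD && !fixed_class)
			continue;	/* only takes devices it was told about */
		if (result == BUS_PROBE_SPECIFIC)
			return ((int)i);	/* nothing can outbid 0 */
		if (best == -1 || result > pri) {
			/* Closer to zero wins; in this model ties keep the first. */
			best = (int)i;
			pri = result;
		}
	}
	return (best);
}
```

Given bids of BUS_PROBE_GENERIC, BUS_PROBE_VENDOR and BUS_PROBE_DEFAULT, the vendor driver wins because -10 is the value closest to zero. A driver that returns only BUS_PROBE_NOWILDCARD is chosen only when `fixed_class` is true.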
Index: user/alc/PQ_LAUNDRY/sys/sys/lockmgr.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/lockmgr.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/lockmgr.h	(revision 303775)
@@ -1,200 +1,198 @@
 /*-
  * Copyright (c) 2008 Attilio Rao <attilio@FreeBSD.org>
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice(s), this list of conditions and the following disclaimer as
  *    the first lines of this file unmodified other than the possible 
  *    addition of one or more copyright notices.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice(s), this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER(S) ``AS IS'' AND ANY
  * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  * DISCLAIMED.  IN NO EVENT SHALL THE COPYRIGHT HOLDER(S) BE LIABLE FOR ANY
  * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
  * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
  * DAMAGE.
  *
  * $FreeBSD$
  */
 
 #ifndef	_SYS_LOCKMGR_H_
 #define	_SYS_LOCKMGR_H_
 
 #include <sys/_lock.h>
 #include <sys/_lockmgr.h>
 #include <sys/_mutex.h>
 #include <sys/_rwlock.h>
 
 #define	LK_SHARE			0x01
 #define	LK_SHARED_WAITERS		0x02
 #define	LK_EXCLUSIVE_WAITERS		0x04
 #define	LK_EXCLUSIVE_SPINNERS		0x08
 #define	LK_ALL_WAITERS							\
 	(LK_SHARED_WAITERS | LK_EXCLUSIVE_WAITERS)
 #define	LK_FLAGMASK							\
 	(LK_SHARE | LK_ALL_WAITERS | LK_EXCLUSIVE_SPINNERS)
 
 #define	LK_HOLDER(x)			((x) & ~LK_FLAGMASK)
 #define	LK_SHARERS_SHIFT		4
 #define	LK_SHARERS(x)			(LK_HOLDER(x) >> LK_SHARERS_SHIFT)
 #define	LK_SHARERS_LOCK(x)		((x) << LK_SHARERS_SHIFT | LK_SHARE)
 #define	LK_ONE_SHARER			(1 << LK_SHARERS_SHIFT)
 #define	LK_UNLOCKED			LK_SHARERS_LOCK(0)
 #define	LK_KERNPROC			((uintptr_t)(-1) & ~LK_FLAGMASK)
 
 #ifdef _KERNEL
 
 #if !defined(LOCK_FILE) || !defined(LOCK_LINE)
 #error	"LOCK_FILE and LOCK_LINE not defined, include <sys/lock.h> before"
 #endif
 
 struct thread;
 #define	lk_recurse	lock_object.lo_data
 
 /*
  * Function prototypes.  Routines that start with an underscore are not part
  * of the public interface and may be wrapped by a macro.
  */
 int	 __lockmgr_args(struct lock *lk, u_int flags, struct lock_object *ilk,
 	    const char *wmesg, int prio, int timo, const char *file, int line);
 #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT)
 void	 _lockmgr_assert(const struct lock *lk, int what, const char *file, int line);
 #endif
 void	 _lockmgr_disown(struct lock *lk, const char *file, int line);
 
 void	 lockallowrecurse(struct lock *lk);
 void	 lockallowshare(struct lock *lk);
 void	 lockdestroy(struct lock *lk);
 void	 lockdisablerecurse(struct lock *lk);
 void	 lockdisableshare(struct lock *lk);
 void	 lockinit(struct lock *lk, int prio, const char *wmesg, int timo,
 	    int flags);
 #ifdef DDB
 int	 lockmgr_chain(struct thread *td, struct thread **ownerp);
 #endif
 void	 lockmgr_printinfo(const struct lock *lk);
 int	 lockstatus(const struct lock *lk);
 
 /*
  * Because ilk may be a static NULL pointer, these functions need a
  * strict prototype in order to safely use the lock_object member.
  */
 static __inline int
 _lockmgr_args(struct lock *lk, u_int flags, struct mtx *ilk, const char *wmesg,
     int prio, int timo, const char *file, int line)
 {
 
 	return (__lockmgr_args(lk, flags, (ilk != NULL) ? &ilk->lock_object :
 	    NULL, wmesg, prio, timo, file, line));
 }
 
 static __inline int
 _lockmgr_args_rw(struct lock *lk, u_int flags, struct rwlock *ilk,
     const char *wmesg, int prio, int timo, const char *file, int line)
 {
 
 	return (__lockmgr_args(lk, flags, (ilk != NULL) ? &ilk->lock_object :
 	    NULL, wmesg, prio, timo, file, line));
 }
 
 /*
  * Define aliases in order to complete lockmgr KPI.
  */
 #define	lockmgr(lk, flags, ilk)						\
 	_lockmgr_args((lk), (flags), (ilk), LK_WMESG_DEFAULT,		\
 	    LK_PRIO_DEFAULT, LK_TIMO_DEFAULT, LOCK_FILE, LOCK_LINE)
 #define	lockmgr_args(lk, flags, ilk, wmesg, prio, timo)			\
 	_lockmgr_args((lk), (flags), (ilk), (wmesg), (prio), (timo),	\
 	    LOCK_FILE, LOCK_LINE)
 #define	lockmgr_args_rw(lk, flags, ilk, wmesg, prio, timo)		\
 	_lockmgr_args_rw((lk), (flags), (ilk), (wmesg), (prio), (timo),	\
 	    LOCK_FILE, LOCK_LINE)
 #define	lockmgr_disown(lk)						\
 	_lockmgr_disown((lk), LOCK_FILE, LOCK_LINE)
 #define	lockmgr_recursed(lk)						\
 	((lk)->lk_recurse != 0)
 #define	lockmgr_rw(lk, flags, ilk)					\
 	_lockmgr_args_rw((lk), (flags), (ilk), LK_WMESG_DEFAULT,	\
 	    LK_PRIO_DEFAULT, LK_TIMO_DEFAULT, LOCK_FILE, LOCK_LINE)
-#define	lockmgr_waiters(lk)						\
-	((lk)->lk_lock & LK_ALL_WAITERS)
 #ifdef INVARIANTS
 #define	lockmgr_assert(lk, what)					\
 	_lockmgr_assert((lk), (what), LOCK_FILE, LOCK_LINE)
 #else
 #define	lockmgr_assert(lk, what)
 #endif
 
 /*
  * Flags for lockinit().
  */
 #define	LK_INIT_MASK	0x0000FF
 #define	LK_CANRECURSE	0x000001
 #define	LK_NODUP	0x000002
 #define	LK_NOPROFILE	0x000004
 #define	LK_NOSHARE	0x000008
 #define	LK_NOWITNESS	0x000010
 #define	LK_QUIET	0x000020
 #define	LK_ADAPTIVE	0x000040
 #define	LK_IS_VNODE	0x000080	/* Tell WITNESS about a VNODE lock */
 
 /*
  * Additional attributes to be used in lockmgr().
  */
 #define	LK_EATTR_MASK	0x00FF00
 #define	LK_INTERLOCK	0x000100
 #define	LK_NOWAIT	0x000200
 #define	LK_RETRY	0x000400
 #define	LK_SLEEPFAIL	0x000800
 #define	LK_TIMELOCK	0x001000
 #define	LK_NODDLKTREAT	0x002000
 #define	LK_VNHELD	0x004000
 
 /*
  * Operations for lockmgr().
  */
 #define	LK_TYPE_MASK	0xFF0000
 #define	LK_DOWNGRADE	0x010000
 #define	LK_DRAIN	0x020000
 #define	LK_EXCLOTHER	0x040000
 #define	LK_EXCLUSIVE	0x080000
 #define	LK_RELEASE	0x100000
 #define	LK_SHARED	0x200000
 #define	LK_UPGRADE	0x400000
 #define	LK_TRYUPGRADE	0x800000
 
 #define	LK_TOTAL_MASK	(LK_INIT_MASK | LK_EATTR_MASK | LK_TYPE_MASK)
 
 /*
  * Default values for lockmgr_args().
  */
 #define	LK_WMESG_DEFAULT	(NULL)
 #define	LK_PRIO_DEFAULT		(0)
 #define	LK_TIMO_DEFAULT		(0)
 
 /*
  * Assertion flags.
  */
 #if defined(INVARIANTS) || defined(INVARIANT_SUPPORT)
 #define	KA_LOCKED	LA_LOCKED
 #define	KA_SLOCKED	LA_SLOCKED
 #define	KA_XLOCKED	LA_XLOCKED
 #define	KA_UNLOCKED	LA_UNLOCKED
 #define	KA_RECURSED	LA_RECURSED
 #define	KA_NOTRECURSED	LA_NOTRECURSED
 #endif
 
 #endif /* _KERNEL */
 
 #endif /* !_SYS_LOCKMGR_H_ */
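The lockmgr.h hunk above drops the lockmgr_waiters() macro, which tested the waiter bits of lk_lock. The lock-word macros that remain encode a share count above a four-bit flag field (LK_FLAGMASK is 0x0F and LK_SHARERS_SHIFT is 4). The user-space sketch below copies those macros from the header to show the layout. `add_sharer()` and `has_waiters()` are hypothetical helpers for illustration only, not lockmgr functions. The kernel updates lk_lock with atomic operations rather than with the plain arithmetic used here.

```c
#include <assert.h>
#include <stdint.h>

/* Lock-word layout macros copied from sys/lockmgr.h above. */
#define LK_SHARE              0x01
#define LK_SHARED_WAITERS     0x02
#define LK_EXCLUSIVE_WAITERS  0x04
#define LK_EXCLUSIVE_SPINNERS 0x08
#define LK_ALL_WAITERS        (LK_SHARED_WAITERS | LK_EXCLUSIVE_WAITERS)
#define LK_FLAGMASK           (LK_SHARE | LK_ALL_WAITERS | LK_EXCLUSIVE_SPINNERS)

#define LK_HOLDER(x)          ((x) & ~LK_FLAGMASK)
#define LK_SHARERS_SHIFT      4
#define LK_SHARERS(x)         (LK_HOLDER(x) >> LK_SHARERS_SHIFT)
#define LK_SHARERS_LOCK(x)    ((x) << LK_SHARERS_SHIFT | LK_SHARE)
#define LK_ONE_SHARER         (1 << LK_SHARERS_SHIFT)
#define LK_UNLOCKED           LK_SHARERS_LOCK(0)

/* Hypothetical helper: account for one more shared holder. */
static uintptr_t
add_sharer(uintptr_t v)
{
	assert(v & LK_SHARE);	/* only meaningful while share-locked */
	return (v + LK_ONE_SHARER);
}

/* Hypothetical helper: the test the removed lockmgr_waiters() performed. */
static int
has_waiters(uintptr_t v)
{
	return ((v & LK_ALL_WAITERS) != 0);
}
```

For example, LK_SHARERS_LOCK(3) encodes to 0x31. Setting LK_EXCLUSIVE_WAITERS leaves LK_SHARERS() at 3 but makes has_waiters() true, because the flag bits sit below the share count.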
Index: user/alc/PQ_LAUNDRY/sys/sys/syscall.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/syscall.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/syscall.h	(revision 303775)
@@ -1,470 +1,470 @@
 /*
  * System call numbers.
  *
  * DO NOT EDIT-- this file is automatically generated.
  * $FreeBSD$
- * created from FreeBSD: head/sys/kern/syscalls.master 303700 2016-08-03 06:35:58Z ed 
+ * created from FreeBSD: head/sys/kern/syscalls.master 303729 2016-08-03 18:48:56Z bdrewery 
  */
 
 #define	SYS_syscall	0
 #define	SYS_exit	1
 #define	SYS_fork	2
 #define	SYS_read	3
 #define	SYS_write	4
 #define	SYS_open	5
 #define	SYS_close	6
 #define	SYS_wait4	7
 				/* 8 is old creat */
 #define	SYS_link	9
 #define	SYS_unlink	10
 				/* 11 is obsolete execv */
 #define	SYS_chdir	12
 #define	SYS_fchdir	13
 #define	SYS_mknod	14
 #define	SYS_chmod	15
 #define	SYS_chown	16
 #define	SYS_break	17
 				/* 18 is freebsd4 getfsstat */
 				/* 19 is old lseek */
 #define	SYS_getpid	20
 #define	SYS_mount	21
 #define	SYS_unmount	22
 #define	SYS_setuid	23
 #define	SYS_getuid	24
 #define	SYS_geteuid	25
 #define	SYS_ptrace	26
 #define	SYS_recvmsg	27
 #define	SYS_sendmsg	28
 #define	SYS_recvfrom	29
 #define	SYS_accept	30
 #define	SYS_getpeername	31
 #define	SYS_getsockname	32
 #define	SYS_access	33
 #define	SYS_chflags	34
 #define	SYS_fchflags	35
 #define	SYS_sync	36
 #define	SYS_kill	37
 				/* 38 is old stat */
 #define	SYS_getppid	39
 				/* 40 is old lstat */
 #define	SYS_dup	41
-				/* 42 is freebsd10 pipe */
+#define	SYS_freebsd10_pipe	42
 #define	SYS_getegid	43
 #define	SYS_profil	44
 #define	SYS_ktrace	45
 				/* 46 is old sigaction */
 #define	SYS_getgid	47
 				/* 48 is old sigprocmask */
 #define	SYS_getlogin	49
 #define	SYS_setlogin	50
 #define	SYS_acct	51
 				/* 52 is old sigpending */
 #define	SYS_sigaltstack	53
 #define	SYS_ioctl	54
 #define	SYS_reboot	55
 #define	SYS_revoke	56
 #define	SYS_symlink	57
 #define	SYS_readlink	58
 #define	SYS_execve	59
 #define	SYS_umask	60
 #define	SYS_chroot	61
 				/* 62 is old fstat */
 				/* 63 is old getkerninfo */
 				/* 64 is old getpagesize */
 #define	SYS_msync	65
 #define	SYS_vfork	66
 				/* 67 is obsolete vread */
 				/* 68 is obsolete vwrite */
 #define	SYS_sbrk	69
 #define	SYS_sstk	70
 				/* 71 is old mmap */
 #define	SYS_vadvise	72
 #define	SYS_munmap	73
 #define	SYS_mprotect	74
 #define	SYS_madvise	75
 				/* 76 is obsolete vhangup */
 				/* 77 is obsolete vlimit */
 #define	SYS_mincore	78
 #define	SYS_getgroups	79
 #define	SYS_setgroups	80
 #define	SYS_getpgrp	81
 #define	SYS_setpgid	82
 #define	SYS_setitimer	83
 				/* 84 is old wait */
 #define	SYS_swapon	85
 #define	SYS_getitimer	86
 				/* 87 is old gethostname */
 				/* 88 is old sethostname */
 #define	SYS_getdtablesize	89
 #define	SYS_dup2	90
 #define	SYS_fcntl	92
 #define	SYS_select	93
 #define	SYS_fsync	95
 #define	SYS_setpriority	96
 #define	SYS_socket	97
 #define	SYS_connect	98
 				/* 99 is old accept */
 #define	SYS_getpriority	100
 				/* 101 is old send */
 				/* 102 is old recv */
 				/* 103 is old sigreturn */
 #define	SYS_bind	104
 #define	SYS_setsockopt	105
 #define	SYS_listen	106
 				/* 107 is obsolete vtimes */
 				/* 108 is old sigvec */
 				/* 109 is old sigblock */
 				/* 110 is old sigsetmask */
 				/* 111 is old sigsuspend */
 				/* 112 is old sigstack */
 				/* 113 is old recvmsg */
 				/* 114 is old sendmsg */
 				/* 115 is obsolete vtrace */
 #define	SYS_gettimeofday	116
 #define	SYS_getrusage	117
 #define	SYS_getsockopt	118
 #define	SYS_readv	120
 #define	SYS_writev	121
 #define	SYS_settimeofday	122
 #define	SYS_fchown	123
 #define	SYS_fchmod	124
 				/* 125 is old recvfrom */
 #define	SYS_setreuid	126
 #define	SYS_setregid	127
 #define	SYS_rename	128
 				/* 129 is old truncate */
 				/* 130 is old ftruncate */
 #define	SYS_flock	131
 #define	SYS_mkfifo	132
 #define	SYS_sendto	133
 #define	SYS_shutdown	134
 #define	SYS_socketpair	135
 #define	SYS_mkdir	136
 #define	SYS_rmdir	137
 #define	SYS_utimes	138
 				/* 139 is obsolete 4.2 sigreturn */
 #define	SYS_adjtime	140
 				/* 141 is old getpeername */
 				/* 142 is old gethostid */
 				/* 143 is old sethostid */
 				/* 144 is old getrlimit */
 				/* 145 is old setrlimit */
 				/* 146 is old killpg */
 #define	SYS_setsid	147
 #define	SYS_quotactl	148
 				/* 149 is old quota */
 				/* 150 is old getsockname */
 #define	SYS_nlm_syscall	154
 #define	SYS_nfssvc	155
 				/* 156 is old getdirentries */
 				/* 157 is freebsd4 statfs */
 				/* 158 is freebsd4 fstatfs */
 #define	SYS_lgetfh	160
 #define	SYS_getfh	161
 				/* 162 is freebsd4 getdomainname */
 				/* 163 is freebsd4 setdomainname */
 				/* 164 is freebsd4 uname */
 #define	SYS_sysarch	165
 #define	SYS_rtprio	166
 #define	SYS_semsys	169
 #define	SYS_msgsys	170
 #define	SYS_shmsys	171
 				/* 173 is freebsd6 pread */
 				/* 174 is freebsd6 pwrite */
 #define	SYS_setfib	175
 #define	SYS_ntp_adjtime	176
 #define	SYS_setgid	181
 #define	SYS_setegid	182
 #define	SYS_seteuid	183
 #define	SYS_stat	188
 #define	SYS_fstat	189
 #define	SYS_lstat	190
 #define	SYS_pathconf	191
 #define	SYS_fpathconf	192
 #define	SYS_getrlimit	194
 #define	SYS_setrlimit	195
 #define	SYS_getdirentries	196
 				/* 197 is freebsd6 mmap */
 #define	SYS___syscall	198
 				/* 199 is freebsd6 lseek */
 				/* 200 is freebsd6 truncate */
 				/* 201 is freebsd6 ftruncate */
 #define	SYS___sysctl	202
 #define	SYS_mlock	203
 #define	SYS_munlock	204
 #define	SYS_undelete	205
 #define	SYS_futimes	206
 #define	SYS_getpgid	207
 #define	SYS_poll	209
 #define	SYS_freebsd7___semctl	220
 #define	SYS_semget	221
 #define	SYS_semop	222
 #define	SYS_freebsd7_msgctl	224
 #define	SYS_msgget	225
 #define	SYS_msgsnd	226
 #define	SYS_msgrcv	227
 #define	SYS_shmat	228
 #define	SYS_freebsd7_shmctl	229
 #define	SYS_shmdt	230
 #define	SYS_shmget	231
 #define	SYS_clock_gettime	232
 #define	SYS_clock_settime	233
 #define	SYS_clock_getres	234
 #define	SYS_ktimer_create	235
 #define	SYS_ktimer_delete	236
 #define	SYS_ktimer_settime	237
 #define	SYS_ktimer_gettime	238
 #define	SYS_ktimer_getoverrun	239
 #define	SYS_nanosleep	240
 #define	SYS_ffclock_getcounter	241
 #define	SYS_ffclock_setestimate	242
 #define	SYS_ffclock_getestimate	243
 #define	SYS_clock_getcpuclockid2	247
 #define	SYS_ntp_gettime	248
 #define	SYS_minherit	250
 #define	SYS_rfork	251
 #define	SYS_openbsd_poll	252
 #define	SYS_issetugid	253
 #define	SYS_lchown	254
 #define	SYS_aio_read	255
 #define	SYS_aio_write	256
 #define	SYS_lio_listio	257
 #define	SYS_getdents	272
 #define	SYS_lchmod	274
 #define	SYS_netbsd_lchown	275
 #define	SYS_lutimes	276
 #define	SYS_netbsd_msync	277
 #define	SYS_nstat	278
 #define	SYS_nfstat	279
 #define	SYS_nlstat	280
 #define	SYS_preadv	289
 #define	SYS_pwritev	290
 				/* 297 is freebsd4 fhstatfs */
 #define	SYS_fhopen	298
 #define	SYS_fhstat	299
 #define	SYS_modnext	300
 #define	SYS_modstat	301
 #define	SYS_modfnext	302
 #define	SYS_modfind	303
 #define	SYS_kldload	304
 #define	SYS_kldunload	305
 #define	SYS_kldfind	306
 #define	SYS_kldnext	307
 #define	SYS_kldstat	308
 #define	SYS_kldfirstmod	309
 #define	SYS_getsid	310
 #define	SYS_setresuid	311
 #define	SYS_setresgid	312
 				/* 313 is obsolete signanosleep */
 #define	SYS_aio_return	314
 #define	SYS_aio_suspend	315
 #define	SYS_aio_cancel	316
 #define	SYS_aio_error	317
 				/* 318 is freebsd6 aio_read */
 				/* 319 is freebsd6 aio_write */
 				/* 320 is freebsd6 lio_listio */
 #define	SYS_yield	321
 				/* 322 is obsolete thr_sleep */
 				/* 323 is obsolete thr_wakeup */
 #define	SYS_mlockall	324
 #define	SYS_munlockall	325
 #define	SYS___getcwd	326
 #define	SYS_sched_setparam	327
 #define	SYS_sched_getparam	328
 #define	SYS_sched_setscheduler	329
 #define	SYS_sched_getscheduler	330
 #define	SYS_sched_yield	331
 #define	SYS_sched_get_priority_max	332
 #define	SYS_sched_get_priority_min	333
 #define	SYS_sched_rr_get_interval	334
 #define	SYS_utrace	335
 				/* 336 is freebsd4 sendfile */
 #define	SYS_kldsym	337
 #define	SYS_jail	338
 #define	SYS_nnpfs_syscall	339
 #define	SYS_sigprocmask	340
 #define	SYS_sigsuspend	341
 				/* 342 is freebsd4 sigaction */
 #define	SYS_sigpending	343
 				/* 344 is freebsd4 sigreturn */
 #define	SYS_sigtimedwait	345
 #define	SYS_sigwaitinfo	346
 #define	SYS___acl_get_file	347
 #define	SYS___acl_set_file	348
 #define	SYS___acl_get_fd	349
 #define	SYS___acl_set_fd	350
 #define	SYS___acl_delete_file	351
 #define	SYS___acl_delete_fd	352
 #define	SYS___acl_aclcheck_file	353
 #define	SYS___acl_aclcheck_fd	354
 #define	SYS_extattrctl	355
 #define	SYS_extattr_set_file	356
 #define	SYS_extattr_get_file	357
 #define	SYS_extattr_delete_file	358
 #define	SYS_aio_waitcomplete	359
 #define	SYS_getresuid	360
 #define	SYS_getresgid	361
 #define	SYS_kqueue	362
 #define	SYS_kevent	363
 #define	SYS_extattr_set_fd	371
 #define	SYS_extattr_get_fd	372
 #define	SYS_extattr_delete_fd	373
 #define	SYS___setugid	374
 #define	SYS_eaccess	376
 #define	SYS_afs3_syscall	377
 #define	SYS_nmount	378
 #define	SYS___mac_get_proc	384
 #define	SYS___mac_set_proc	385
 #define	SYS___mac_get_fd	386
 #define	SYS___mac_get_file	387
 #define	SYS___mac_set_fd	388
 #define	SYS___mac_set_file	389
 #define	SYS_kenv	390
 #define	SYS_lchflags	391
 #define	SYS_uuidgen	392
 #define	SYS_sendfile	393
 #define	SYS_mac_syscall	394
 #define	SYS_getfsstat	395
 #define	SYS_statfs	396
 #define	SYS_fstatfs	397
 #define	SYS_fhstatfs	398
 #define	SYS_ksem_close	400
 #define	SYS_ksem_post	401
 #define	SYS_ksem_wait	402
 #define	SYS_ksem_trywait	403
 #define	SYS_ksem_init	404
 #define	SYS_ksem_open	405
 #define	SYS_ksem_unlink	406
 #define	SYS_ksem_getvalue	407
 #define	SYS_ksem_destroy	408
 #define	SYS___mac_get_pid	409
 #define	SYS___mac_get_link	410
 #define	SYS___mac_set_link	411
 #define	SYS_extattr_set_link	412
 #define	SYS_extattr_get_link	413
 #define	SYS_extattr_delete_link	414
 #define	SYS___mac_execve	415
 #define	SYS_sigaction	416
 #define	SYS_sigreturn	417
 #define	SYS_getcontext	421
 #define	SYS_setcontext	422
 #define	SYS_swapcontext	423
 #define	SYS_swapoff	424
 #define	SYS___acl_get_link	425
 #define	SYS___acl_set_link	426
 #define	SYS___acl_delete_link	427
 #define	SYS___acl_aclcheck_link	428
 #define	SYS_sigwait	429
 #define	SYS_thr_create	430
 #define	SYS_thr_exit	431
 #define	SYS_thr_self	432
 #define	SYS_thr_kill	433
 #define	SYS_jail_attach	436
 #define	SYS_extattr_list_fd	437
 #define	SYS_extattr_list_file	438
 #define	SYS_extattr_list_link	439
 #define	SYS_ksem_timedwait	441
 #define	SYS_thr_suspend	442
 #define	SYS_thr_wake	443
 #define	SYS_kldunloadf	444
 #define	SYS_audit	445
 #define	SYS_auditon	446
 #define	SYS_getauid	447
 #define	SYS_setauid	448
 #define	SYS_getaudit	449
 #define	SYS_setaudit	450
 #define	SYS_getaudit_addr	451
 #define	SYS_setaudit_addr	452
 #define	SYS_auditctl	453
 #define	SYS__umtx_op	454
 #define	SYS_thr_new	455
 #define	SYS_sigqueue	456
 #define	SYS_kmq_open	457
 #define	SYS_kmq_setattr	458
 #define	SYS_kmq_timedreceive	459
 #define	SYS_kmq_timedsend	460
 #define	SYS_kmq_notify	461
 #define	SYS_kmq_unlink	462
 #define	SYS_abort2	463
 #define	SYS_thr_set_name	464
 #define	SYS_aio_fsync	465
 #define	SYS_rtprio_thread	466
 #define	SYS_sctp_peeloff	471
 #define	SYS_sctp_generic_sendmsg	472
 #define	SYS_sctp_generic_sendmsg_iov	473
 #define	SYS_sctp_generic_recvmsg	474
 #define	SYS_pread	475
 #define	SYS_pwrite	476
 #define	SYS_mmap	477
 #define	SYS_lseek	478
 #define	SYS_truncate	479
 #define	SYS_ftruncate	480
 #define	SYS_thr_kill2	481
 #define	SYS_shm_open	482
 #define	SYS_shm_unlink	483
 #define	SYS_cpuset	484
 #define	SYS_cpuset_setid	485
 #define	SYS_cpuset_getid	486
 #define	SYS_cpuset_getaffinity	487
 #define	SYS_cpuset_setaffinity	488
 #define	SYS_faccessat	489
 #define	SYS_fchmodat	490
 #define	SYS_fchownat	491
 #define	SYS_fexecve	492
 #define	SYS_fstatat	493
 #define	SYS_futimesat	494
 #define	SYS_linkat	495
 #define	SYS_mkdirat	496
 #define	SYS_mkfifoat	497
 #define	SYS_mknodat	498
 #define	SYS_openat	499
 #define	SYS_readlinkat	500
 #define	SYS_renameat	501
 #define	SYS_symlinkat	502
 #define	SYS_unlinkat	503
 #define	SYS_posix_openpt	504
 #define	SYS_gssd_syscall	505
 #define	SYS_jail_get	506
 #define	SYS_jail_set	507
 #define	SYS_jail_remove	508
 #define	SYS_closefrom	509
 #define	SYS___semctl	510
 #define	SYS_msgctl	511
 #define	SYS_shmctl	512
 #define	SYS_lpathconf	513
 				/* 514 is obsolete cap_new */
 #define	SYS___cap_rights_get	515
 #define	SYS_cap_enter	516
 #define	SYS_cap_getmode	517
 #define	SYS_pdfork	518
 #define	SYS_pdkill	519
 #define	SYS_pdgetpid	520
 #define	SYS_pselect	522
 #define	SYS_getloginclass	523
 #define	SYS_setloginclass	524
 #define	SYS_rctl_get_racct	525
 #define	SYS_rctl_get_rules	526
 #define	SYS_rctl_get_limits	527
 #define	SYS_rctl_add_rule	528
 #define	SYS_rctl_remove_rule	529
 #define	SYS_posix_fallocate	530
 #define	SYS_posix_fadvise	531
 #define	SYS_wait6	532
 #define	SYS_cap_rights_limit	533
 #define	SYS_cap_ioctls_limit	534
 #define	SYS_cap_ioctls_get	535
 #define	SYS_cap_fcntls_limit	536
 #define	SYS_cap_fcntls_get	537
 #define	SYS_bindat	538
 #define	SYS_connectat	539
 #define	SYS_chflagsat	540
 #define	SYS_accept4	541
 #define	SYS_pipe2	542
 #define	SYS_aio_mlock	543
 #define	SYS_procctl	544
 #define	SYS_ppoll	545
 #define	SYS_futimens	546
 #define	SYS_utimensat	547
 #define	SYS_numa_getaffinity	548
 #define	SYS_numa_setaffinity	549
 #define	SYS_MAXSYSCALL	550
Index: user/alc/PQ_LAUNDRY/sys/sys/syscall.mk
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/syscall.mk	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/syscall.mk	(revision 303775)
@@ -1,397 +1,398 @@
 # FreeBSD system call object files.
 # DO NOT EDIT-- this file is automatically generated.
 # $FreeBSD$
-# created from FreeBSD: head/sys/kern/syscalls.master 303700 2016-08-03 06:35:58Z ed 
+# created from FreeBSD: head/sys/kern/syscalls.master 303729 2016-08-03 18:48:56Z bdrewery 
 MIASM =  \
 	syscall.o \
 	exit.o \
 	fork.o \
 	read.o \
 	write.o \
 	open.o \
 	close.o \
 	wait4.o \
 	link.o \
 	unlink.o \
 	chdir.o \
 	fchdir.o \
 	mknod.o \
 	chmod.o \
 	chown.o \
 	break.o \
 	getpid.o \
 	mount.o \
 	unmount.o \
 	setuid.o \
 	getuid.o \
 	geteuid.o \
 	ptrace.o \
 	recvmsg.o \
 	sendmsg.o \
 	recvfrom.o \
 	accept.o \
 	getpeername.o \
 	getsockname.o \
 	access.o \
 	chflags.o \
 	fchflags.o \
 	sync.o \
 	kill.o \
 	getppid.o \
 	dup.o \
+	freebsd10_pipe.o \
 	getegid.o \
 	profil.o \
 	ktrace.o \
 	getgid.o \
 	getlogin.o \
 	setlogin.o \
 	acct.o \
 	sigaltstack.o \
 	ioctl.o \
 	reboot.o \
 	revoke.o \
 	symlink.o \
 	readlink.o \
 	execve.o \
 	umask.o \
 	chroot.o \
 	msync.o \
 	vfork.o \
 	sbrk.o \
 	sstk.o \
 	vadvise.o \
 	munmap.o \
 	mprotect.o \
 	madvise.o \
 	mincore.o \
 	getgroups.o \
 	setgroups.o \
 	getpgrp.o \
 	setpgid.o \
 	setitimer.o \
 	swapon.o \
 	getitimer.o \
 	getdtablesize.o \
 	dup2.o \
 	fcntl.o \
 	select.o \
 	fsync.o \
 	setpriority.o \
 	socket.o \
 	connect.o \
 	getpriority.o \
 	bind.o \
 	setsockopt.o \
 	listen.o \
 	gettimeofday.o \
 	getrusage.o \
 	getsockopt.o \
 	readv.o \
 	writev.o \
 	settimeofday.o \
 	fchown.o \
 	fchmod.o \
 	setreuid.o \
 	setregid.o \
 	rename.o \
 	flock.o \
 	mkfifo.o \
 	sendto.o \
 	shutdown.o \
 	socketpair.o \
 	mkdir.o \
 	rmdir.o \
 	utimes.o \
 	adjtime.o \
 	setsid.o \
 	quotactl.o \
 	nlm_syscall.o \
 	nfssvc.o \
 	lgetfh.o \
 	getfh.o \
 	sysarch.o \
 	rtprio.o \
 	semsys.o \
 	msgsys.o \
 	shmsys.o \
 	setfib.o \
 	ntp_adjtime.o \
 	setgid.o \
 	setegid.o \
 	seteuid.o \
 	stat.o \
 	fstat.o \
 	lstat.o \
 	pathconf.o \
 	fpathconf.o \
 	getrlimit.o \
 	setrlimit.o \
 	getdirentries.o \
 	__syscall.o \
 	__sysctl.o \
 	mlock.o \
 	munlock.o \
 	undelete.o \
 	futimes.o \
 	getpgid.o \
 	poll.o \
 	freebsd7___semctl.o \
 	semget.o \
 	semop.o \
 	freebsd7_msgctl.o \
 	msgget.o \
 	msgsnd.o \
 	msgrcv.o \
 	shmat.o \
 	freebsd7_shmctl.o \
 	shmdt.o \
 	shmget.o \
 	clock_gettime.o \
 	clock_settime.o \
 	clock_getres.o \
 	ktimer_create.o \
 	ktimer_delete.o \
 	ktimer_settime.o \
 	ktimer_gettime.o \
 	ktimer_getoverrun.o \
 	nanosleep.o \
 	ffclock_getcounter.o \
 	ffclock_setestimate.o \
 	ffclock_getestimate.o \
 	clock_getcpuclockid2.o \
 	ntp_gettime.o \
 	minherit.o \
 	rfork.o \
 	openbsd_poll.o \
 	issetugid.o \
 	lchown.o \
 	aio_read.o \
 	aio_write.o \
 	lio_listio.o \
 	getdents.o \
 	lchmod.o \
 	netbsd_lchown.o \
 	lutimes.o \
 	netbsd_msync.o \
 	nstat.o \
 	nfstat.o \
 	nlstat.o \
 	preadv.o \
 	pwritev.o \
 	fhopen.o \
 	fhstat.o \
 	modnext.o \
 	modstat.o \
 	modfnext.o \
 	modfind.o \
 	kldload.o \
 	kldunload.o \
 	kldfind.o \
 	kldnext.o \
 	kldstat.o \
 	kldfirstmod.o \
 	getsid.o \
 	setresuid.o \
 	setresgid.o \
 	aio_return.o \
 	aio_suspend.o \
 	aio_cancel.o \
 	aio_error.o \
 	yield.o \
 	mlockall.o \
 	munlockall.o \
 	__getcwd.o \
 	sched_setparam.o \
 	sched_getparam.o \
 	sched_setscheduler.o \
 	sched_getscheduler.o \
 	sched_yield.o \
 	sched_get_priority_max.o \
 	sched_get_priority_min.o \
 	sched_rr_get_interval.o \
 	utrace.o \
 	kldsym.o \
 	jail.o \
 	nnpfs_syscall.o \
 	sigprocmask.o \
 	sigsuspend.o \
 	sigpending.o \
 	sigtimedwait.o \
 	sigwaitinfo.o \
 	__acl_get_file.o \
 	__acl_set_file.o \
 	__acl_get_fd.o \
 	__acl_set_fd.o \
 	__acl_delete_file.o \
 	__acl_delete_fd.o \
 	__acl_aclcheck_file.o \
 	__acl_aclcheck_fd.o \
 	extattrctl.o \
 	extattr_set_file.o \
 	extattr_get_file.o \
 	extattr_delete_file.o \
 	aio_waitcomplete.o \
 	getresuid.o \
 	getresgid.o \
 	kqueue.o \
 	kevent.o \
 	extattr_set_fd.o \
 	extattr_get_fd.o \
 	extattr_delete_fd.o \
 	__setugid.o \
 	eaccess.o \
 	afs3_syscall.o \
 	nmount.o \
 	__mac_get_proc.o \
 	__mac_set_proc.o \
 	__mac_get_fd.o \
 	__mac_get_file.o \
 	__mac_set_fd.o \
 	__mac_set_file.o \
 	kenv.o \
 	lchflags.o \
 	uuidgen.o \
 	sendfile.o \
 	mac_syscall.o \
 	getfsstat.o \
 	statfs.o \
 	fstatfs.o \
 	fhstatfs.o \
 	ksem_close.o \
 	ksem_post.o \
 	ksem_wait.o \
 	ksem_trywait.o \
 	ksem_init.o \
 	ksem_open.o \
 	ksem_unlink.o \
 	ksem_getvalue.o \
 	ksem_destroy.o \
 	__mac_get_pid.o \
 	__mac_get_link.o \
 	__mac_set_link.o \
 	extattr_set_link.o \
 	extattr_get_link.o \
 	extattr_delete_link.o \
 	__mac_execve.o \
 	sigaction.o \
 	sigreturn.o \
 	getcontext.o \
 	setcontext.o \
 	swapcontext.o \
 	swapoff.o \
 	__acl_get_link.o \
 	__acl_set_link.o \
 	__acl_delete_link.o \
 	__acl_aclcheck_link.o \
 	sigwait.o \
 	thr_create.o \
 	thr_exit.o \
 	thr_self.o \
 	thr_kill.o \
 	jail_attach.o \
 	extattr_list_fd.o \
 	extattr_list_file.o \
 	extattr_list_link.o \
 	ksem_timedwait.o \
 	thr_suspend.o \
 	thr_wake.o \
 	kldunloadf.o \
 	audit.o \
 	auditon.o \
 	getauid.o \
 	setauid.o \
 	getaudit.o \
 	setaudit.o \
 	getaudit_addr.o \
 	setaudit_addr.o \
 	auditctl.o \
 	_umtx_op.o \
 	thr_new.o \
 	sigqueue.o \
 	kmq_open.o \
 	kmq_setattr.o \
 	kmq_timedreceive.o \
 	kmq_timedsend.o \
 	kmq_notify.o \
 	kmq_unlink.o \
 	abort2.o \
 	thr_set_name.o \
 	aio_fsync.o \
 	rtprio_thread.o \
 	sctp_peeloff.o \
 	sctp_generic_sendmsg.o \
 	sctp_generic_sendmsg_iov.o \
 	sctp_generic_recvmsg.o \
 	pread.o \
 	pwrite.o \
 	mmap.o \
 	lseek.o \
 	truncate.o \
 	ftruncate.o \
 	thr_kill2.o \
 	shm_open.o \
 	shm_unlink.o \
 	cpuset.o \
 	cpuset_setid.o \
 	cpuset_getid.o \
 	cpuset_getaffinity.o \
 	cpuset_setaffinity.o \
 	faccessat.o \
 	fchmodat.o \
 	fchownat.o \
 	fexecve.o \
 	fstatat.o \
 	futimesat.o \
 	linkat.o \
 	mkdirat.o \
 	mkfifoat.o \
 	mknodat.o \
 	openat.o \
 	readlinkat.o \
 	renameat.o \
 	symlinkat.o \
 	unlinkat.o \
 	posix_openpt.o \
 	gssd_syscall.o \
 	jail_get.o \
 	jail_set.o \
 	jail_remove.o \
 	closefrom.o \
 	__semctl.o \
 	msgctl.o \
 	shmctl.o \
 	lpathconf.o \
 	__cap_rights_get.o \
 	cap_enter.o \
 	cap_getmode.o \
 	pdfork.o \
 	pdkill.o \
 	pdgetpid.o \
 	pselect.o \
 	getloginclass.o \
 	setloginclass.o \
 	rctl_get_racct.o \
 	rctl_get_rules.o \
 	rctl_get_limits.o \
 	rctl_add_rule.o \
 	rctl_remove_rule.o \
 	posix_fallocate.o \
 	posix_fadvise.o \
 	wait6.o \
 	cap_rights_limit.o \
 	cap_ioctls_limit.o \
 	cap_ioctls_get.o \
 	cap_fcntls_limit.o \
 	cap_fcntls_get.o \
 	bindat.o \
 	connectat.o \
 	chflagsat.o \
 	accept4.o \
 	pipe2.o \
 	aio_mlock.o \
 	procctl.o \
 	ppoll.o \
 	futimens.o \
 	utimensat.o \
 	numa_getaffinity.o \
 	numa_setaffinity.o
Index: user/alc/PQ_LAUNDRY/sys/sys/sysproto.h
===================================================================
--- user/alc/PQ_LAUNDRY/sys/sys/sysproto.h	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/sys/sysproto.h	(revision 303775)
@@ -1,2959 +1,2959 @@
 /*
  * System call prototypes.
  *
  * DO NOT EDIT-- this file is automatically generated.
  * $FreeBSD$
- * created from FreeBSD: head/sys/kern/syscalls.master 303700 2016-08-03 06:35:58Z ed 
+ * created from FreeBSD: head/sys/kern/syscalls.master 303729 2016-08-03 18:48:56Z bdrewery 
  */
 
 #ifndef _SYS_SYSPROTO_H_
 #define	_SYS_SYSPROTO_H_
 
 #include <sys/signal.h>
 #include <sys/acl.h>
 #include <sys/cpuset.h>
 #include <sys/_ffcounter.h>
 #include <sys/_semaphore.h>
 #include <sys/ucontext.h>
 #include <sys/wait.h>
 
 #include <bsm/audit_kevents.h>
 
 struct proc;
 
 struct thread;
 
 #define	PAD_(t)	(sizeof(register_t) <= sizeof(t) ? \
 		0 : sizeof(register_t) - sizeof(t))
 
 #if BYTE_ORDER == LITTLE_ENDIAN
 #define	PADL_(t)	0
 #define	PADR_(t)	PAD_(t)
 #else
 #define	PADL_(t)	PAD_(t)
 #define	PADR_(t)	0
 #endif
 
 struct nosys_args {
 	register_t dummy;
 };
 struct sys_exit_args {
 	char rval_l_[PADL_(int)]; int rval; char rval_r_[PADR_(int)];
 };
 struct fork_args {
 	register_t dummy;
 };
 struct read_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(void *)]; void * buf; char buf_r_[PADR_(void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 };
 struct write_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(const void *)]; const void * buf; char buf_r_[PADR_(const void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 };
 struct open_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct close_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 };
 struct wait4_args {
 	char pid_l_[PADL_(int)]; int pid; char pid_r_[PADR_(int)];
 	char status_l_[PADL_(int *)]; int * status; char status_r_[PADR_(int *)];
 	char options_l_[PADL_(int)]; int options; char options_r_[PADR_(int)];
 	char rusage_l_[PADL_(struct rusage *)]; struct rusage * rusage; char rusage_r_[PADR_(struct rusage *)];
 };
 struct link_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char link_l_[PADL_(char *)]; char * link; char link_r_[PADR_(char *)];
 };
 struct unlink_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct chdir_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct fchdir_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 };
 struct mknod_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 	char dev_l_[PADL_(int)]; int dev; char dev_r_[PADR_(int)];
 };
 struct chmod_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct chown_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char uid_l_[PADL_(int)]; int uid; char uid_r_[PADR_(int)];
 	char gid_l_[PADL_(int)]; int gid; char gid_r_[PADR_(int)];
 };
 struct obreak_args {
 	char nsize_l_[PADL_(char *)]; char * nsize; char nsize_r_[PADR_(char *)];
 };
 struct getpid_args {
 	register_t dummy;
 };
 struct mount_args {
 	char type_l_[PADL_(char *)]; char * type; char type_r_[PADR_(char *)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char data_l_[PADL_(caddr_t)]; caddr_t data; char data_r_[PADR_(caddr_t)];
 };
 struct unmount_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct setuid_args {
 	char uid_l_[PADL_(uid_t)]; uid_t uid; char uid_r_[PADR_(uid_t)];
 };
 struct getuid_args {
 	register_t dummy;
 };
 struct geteuid_args {
 	register_t dummy;
 };
 struct ptrace_args {
 	char req_l_[PADL_(int)]; int req; char req_r_[PADR_(int)];
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char addr_l_[PADL_(caddr_t)]; caddr_t addr; char addr_r_[PADR_(caddr_t)];
 	char data_l_[PADL_(int)]; int data; char data_r_[PADR_(int)];
 };
 struct recvmsg_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char msg_l_[PADL_(struct msghdr *)]; struct msghdr * msg; char msg_r_[PADR_(struct msghdr *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct sendmsg_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char msg_l_[PADL_(struct msghdr *)]; struct msghdr * msg; char msg_r_[PADR_(struct msghdr *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct recvfrom_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char buf_l_[PADL_(caddr_t)]; caddr_t buf; char buf_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char from_l_[PADL_(struct sockaddr *__restrict)]; struct sockaddr *__restrict from; char from_r_[PADR_(struct sockaddr *__restrict)];
 	char fromlenaddr_l_[PADL_(__socklen_t *__restrict)]; __socklen_t *__restrict fromlenaddr; char fromlenaddr_r_[PADR_(__socklen_t *__restrict)];
 };
 struct accept_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(struct sockaddr *__restrict)]; struct sockaddr *__restrict name; char name_r_[PADR_(struct sockaddr *__restrict)];
 	char anamelen_l_[PADL_(__socklen_t *__restrict)]; __socklen_t *__restrict anamelen; char anamelen_r_[PADR_(__socklen_t *__restrict)];
 };
 struct getpeername_args {
 	char fdes_l_[PADL_(int)]; int fdes; char fdes_r_[PADR_(int)];
 	char asa_l_[PADL_(struct sockaddr *__restrict)]; struct sockaddr *__restrict asa; char asa_r_[PADR_(struct sockaddr *__restrict)];
 	char alen_l_[PADL_(__socklen_t *__restrict)]; __socklen_t *__restrict alen; char alen_r_[PADR_(__socklen_t *__restrict)];
 };
 struct getsockname_args {
 	char fdes_l_[PADL_(int)]; int fdes; char fdes_r_[PADR_(int)];
 	char asa_l_[PADL_(struct sockaddr *__restrict)]; struct sockaddr *__restrict asa; char asa_r_[PADR_(struct sockaddr *__restrict)];
 	char alen_l_[PADL_(__socklen_t *__restrict)]; __socklen_t *__restrict alen; char alen_r_[PADR_(__socklen_t *__restrict)];
 };
 struct access_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char amode_l_[PADL_(int)]; int amode; char amode_r_[PADR_(int)];
 };
 struct chflags_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char flags_l_[PADL_(u_long)]; u_long flags; char flags_r_[PADR_(u_long)];
 };
 struct fchflags_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char flags_l_[PADL_(u_long)]; u_long flags; char flags_r_[PADR_(u_long)];
 };
 struct sync_args {
 	register_t dummy;
 };
 struct kill_args {
 	char pid_l_[PADL_(int)]; int pid; char pid_r_[PADR_(int)];
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 };
 struct getppid_args {
 	register_t dummy;
 };
 struct dup_args {
 	char fd_l_[PADL_(u_int)]; u_int fd; char fd_r_[PADR_(u_int)];
 };
 struct freebsd10_pipe_args {
 	register_t dummy;
 };
 struct getegid_args {
 	register_t dummy;
 };
 struct profil_args {
 	char samples_l_[PADL_(caddr_t)]; caddr_t samples; char samples_r_[PADR_(caddr_t)];
 	char size_l_[PADL_(size_t)]; size_t size; char size_r_[PADR_(size_t)];
 	char offset_l_[PADL_(size_t)]; size_t offset; char offset_r_[PADR_(size_t)];
 	char scale_l_[PADL_(u_int)]; u_int scale; char scale_r_[PADR_(u_int)];
 };
 struct ktrace_args {
 	char fname_l_[PADL_(const char *)]; const char * fname; char fname_r_[PADR_(const char *)];
 	char ops_l_[PADL_(int)]; int ops; char ops_r_[PADR_(int)];
 	char facs_l_[PADL_(int)]; int facs; char facs_r_[PADR_(int)];
 	char pid_l_[PADL_(int)]; int pid; char pid_r_[PADR_(int)];
 };
 struct getgid_args {
 	register_t dummy;
 };
 struct getlogin_args {
 	char namebuf_l_[PADL_(char *)]; char * namebuf; char namebuf_r_[PADR_(char *)];
 	char namelen_l_[PADL_(u_int)]; u_int namelen; char namelen_r_[PADR_(u_int)];
 };
 struct setlogin_args {
 	char namebuf_l_[PADL_(char *)]; char * namebuf; char namebuf_r_[PADR_(char *)];
 };
 struct acct_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct osigpending_args {
 	register_t dummy;
 };
 struct sigaltstack_args {
 	char ss_l_[PADL_(stack_t *)]; stack_t * ss; char ss_r_[PADR_(stack_t *)];
 	char oss_l_[PADL_(stack_t *)]; stack_t * oss; char oss_r_[PADR_(stack_t *)];
 };
 struct ioctl_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char com_l_[PADL_(u_long)]; u_long com; char com_r_[PADR_(u_long)];
 	char data_l_[PADL_(caddr_t)]; caddr_t data; char data_r_[PADR_(caddr_t)];
 };
 struct reboot_args {
 	char opt_l_[PADL_(int)]; int opt; char opt_r_[PADR_(int)];
 };
 struct revoke_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct symlink_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char link_l_[PADL_(char *)]; char * link; char link_r_[PADR_(char *)];
 };
 struct readlink_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char count_l_[PADL_(size_t)]; size_t count; char count_r_[PADR_(size_t)];
 };
 struct execve_args {
 	char fname_l_[PADL_(char *)]; char * fname; char fname_r_[PADR_(char *)];
 	char argv_l_[PADL_(char **)]; char ** argv; char argv_r_[PADR_(char **)];
 	char envv_l_[PADL_(char **)]; char ** envv; char envv_r_[PADR_(char **)];
 };
 struct umask_args {
 	char newmask_l_[PADL_(int)]; int newmask; char newmask_r_[PADR_(int)];
 };
 struct chroot_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct getpagesize_args {
 	register_t dummy;
 };
 struct msync_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct vfork_args {
 	register_t dummy;
 };
 struct sbrk_args {
 	char incr_l_[PADL_(int)]; int incr; char incr_r_[PADR_(int)];
 };
 struct sstk_args {
 	char incr_l_[PADL_(int)]; int incr; char incr_r_[PADR_(int)];
 };
 struct ovadvise_args {
 	char anom_l_[PADL_(int)]; int anom; char anom_r_[PADR_(int)];
 };
 struct munmap_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 };
 struct mprotect_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char prot_l_[PADL_(int)]; int prot; char prot_r_[PADR_(int)];
 };
 struct madvise_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char behav_l_[PADL_(int)]; int behav; char behav_r_[PADR_(int)];
 };
 struct mincore_args {
 	char addr_l_[PADL_(const void *)]; const void * addr; char addr_r_[PADR_(const void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char vec_l_[PADL_(char *)]; char * vec; char vec_r_[PADR_(char *)];
 };
 struct getgroups_args {
 	char gidsetsize_l_[PADL_(u_int)]; u_int gidsetsize; char gidsetsize_r_[PADR_(u_int)];
 	char gidset_l_[PADL_(gid_t *)]; gid_t * gidset; char gidset_r_[PADR_(gid_t *)];
 };
 struct setgroups_args {
 	char gidsetsize_l_[PADL_(u_int)]; u_int gidsetsize; char gidsetsize_r_[PADR_(u_int)];
 	char gidset_l_[PADL_(gid_t *)]; gid_t * gidset; char gidset_r_[PADR_(gid_t *)];
 };
 struct getpgrp_args {
 	register_t dummy;
 };
 struct setpgid_args {
 	char pid_l_[PADL_(int)]; int pid; char pid_r_[PADR_(int)];
 	char pgid_l_[PADL_(int)]; int pgid; char pgid_r_[PADR_(int)];
 };
 struct setitimer_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char itv_l_[PADL_(struct itimerval *)]; struct itimerval * itv; char itv_r_[PADR_(struct itimerval *)];
 	char oitv_l_[PADL_(struct itimerval *)]; struct itimerval * oitv; char oitv_r_[PADR_(struct itimerval *)];
 };
 struct owait_args {
 	register_t dummy;
 };
 struct swapon_args {
 	char name_l_[PADL_(char *)]; char * name; char name_r_[PADR_(char *)];
 };
 struct getitimer_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char itv_l_[PADL_(struct itimerval *)]; struct itimerval * itv; char itv_r_[PADR_(struct itimerval *)];
 };
 struct getdtablesize_args {
 	register_t dummy;
 };
 struct dup2_args {
 	char from_l_[PADL_(u_int)]; u_int from; char from_r_[PADR_(u_int)];
 	char to_l_[PADL_(u_int)]; u_int to; char to_r_[PADR_(u_int)];
 };
 struct fcntl_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char arg_l_[PADL_(long)]; long arg; char arg_r_[PADR_(long)];
 };
 struct select_args {
 	char nd_l_[PADL_(int)]; int nd; char nd_r_[PADR_(int)];
 	char in_l_[PADL_(fd_set *)]; fd_set * in; char in_r_[PADR_(fd_set *)];
 	char ou_l_[PADL_(fd_set *)]; fd_set * ou; char ou_r_[PADR_(fd_set *)];
 	char ex_l_[PADL_(fd_set *)]; fd_set * ex; char ex_r_[PADR_(fd_set *)];
 	char tv_l_[PADL_(struct timeval *)]; struct timeval * tv; char tv_r_[PADR_(struct timeval *)];
 };
 struct fsync_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 };
 struct setpriority_args {
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char who_l_[PADL_(int)]; int who; char who_r_[PADR_(int)];
 	char prio_l_[PADL_(int)]; int prio; char prio_r_[PADR_(int)];
 };
 struct socket_args {
 	char domain_l_[PADL_(int)]; int domain; char domain_r_[PADR_(int)];
 	char type_l_[PADL_(int)]; int type; char type_r_[PADR_(int)];
 	char protocol_l_[PADL_(int)]; int protocol; char protocol_r_[PADR_(int)];
 };
 struct connect_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(caddr_t)]; caddr_t name; char name_r_[PADR_(caddr_t)];
 	char namelen_l_[PADL_(int)]; int namelen; char namelen_r_[PADR_(int)];
 };
 struct getpriority_args {
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char who_l_[PADL_(int)]; int who; char who_r_[PADR_(int)];
 };
 struct bind_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(caddr_t)]; caddr_t name; char name_r_[PADR_(caddr_t)];
 	char namelen_l_[PADL_(int)]; int namelen; char namelen_r_[PADR_(int)];
 };
 struct setsockopt_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char level_l_[PADL_(int)]; int level; char level_r_[PADR_(int)];
 	char name_l_[PADL_(int)]; int name; char name_r_[PADR_(int)];
 	char val_l_[PADL_(caddr_t)]; caddr_t val; char val_r_[PADR_(caddr_t)];
 	char valsize_l_[PADL_(int)]; int valsize; char valsize_r_[PADR_(int)];
 };
 struct listen_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char backlog_l_[PADL_(int)]; int backlog; char backlog_r_[PADR_(int)];
 };
 struct gettimeofday_args {
 	char tp_l_[PADL_(struct timeval *)]; struct timeval * tp; char tp_r_[PADR_(struct timeval *)];
 	char tzp_l_[PADL_(struct timezone *)]; struct timezone * tzp; char tzp_r_[PADR_(struct timezone *)];
 };
 struct getrusage_args {
 	char who_l_[PADL_(int)]; int who; char who_r_[PADR_(int)];
 	char rusage_l_[PADL_(struct rusage *)]; struct rusage * rusage; char rusage_r_[PADR_(struct rusage *)];
 };
 struct getsockopt_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char level_l_[PADL_(int)]; int level; char level_r_[PADR_(int)];
 	char name_l_[PADL_(int)]; int name; char name_r_[PADR_(int)];
 	char val_l_[PADL_(caddr_t)]; caddr_t val; char val_r_[PADR_(caddr_t)];
 	char avalsize_l_[PADL_(int *)]; int * avalsize; char avalsize_r_[PADR_(int *)];
 };
 struct readv_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(u_int)]; u_int iovcnt; char iovcnt_r_[PADR_(u_int)];
 };
 struct writev_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(u_int)]; u_int iovcnt; char iovcnt_r_[PADR_(u_int)];
 };
 struct settimeofday_args {
 	char tv_l_[PADL_(struct timeval *)]; struct timeval * tv; char tv_r_[PADR_(struct timeval *)];
 	char tzp_l_[PADL_(struct timezone *)]; struct timezone * tzp; char tzp_r_[PADR_(struct timezone *)];
 };
 struct fchown_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char uid_l_[PADL_(int)]; int uid; char uid_r_[PADR_(int)];
 	char gid_l_[PADL_(int)]; int gid; char gid_r_[PADR_(int)];
 };
 struct fchmod_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct setreuid_args {
 	char ruid_l_[PADL_(int)]; int ruid; char ruid_r_[PADR_(int)];
 	char euid_l_[PADL_(int)]; int euid; char euid_r_[PADR_(int)];
 };
 struct setregid_args {
 	char rgid_l_[PADL_(int)]; int rgid; char rgid_r_[PADR_(int)];
 	char egid_l_[PADL_(int)]; int egid; char egid_r_[PADR_(int)];
 };
 struct rename_args {
 	char from_l_[PADL_(char *)]; char * from; char from_r_[PADR_(char *)];
 	char to_l_[PADL_(char *)]; char * to; char to_r_[PADR_(char *)];
 };
 struct flock_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char how_l_[PADL_(int)]; int how; char how_r_[PADR_(int)];
 };
 struct mkfifo_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct sendto_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char buf_l_[PADL_(caddr_t)]; caddr_t buf; char buf_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char to_l_[PADL_(caddr_t)]; caddr_t to; char to_r_[PADR_(caddr_t)];
 	char tolen_l_[PADL_(int)]; int tolen; char tolen_r_[PADR_(int)];
 };
 struct shutdown_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char how_l_[PADL_(int)]; int how; char how_r_[PADR_(int)];
 };
 struct socketpair_args {
 	char domain_l_[PADL_(int)]; int domain; char domain_r_[PADR_(int)];
 	char type_l_[PADL_(int)]; int type; char type_r_[PADR_(int)];
 	char protocol_l_[PADL_(int)]; int protocol; char protocol_r_[PADR_(int)];
 	char rsv_l_[PADL_(int *)]; int * rsv; char rsv_r_[PADR_(int *)];
 };
 struct mkdir_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct rmdir_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct utimes_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char tptr_l_[PADL_(struct timeval *)]; struct timeval * tptr; char tptr_r_[PADR_(struct timeval *)];
 };
 struct adjtime_args {
 	char delta_l_[PADL_(struct timeval *)]; struct timeval * delta; char delta_r_[PADR_(struct timeval *)];
 	char olddelta_l_[PADL_(struct timeval *)]; struct timeval * olddelta; char olddelta_r_[PADR_(struct timeval *)];
 };
 struct ogethostid_args {
 	register_t dummy;
 };
 struct setsid_args {
 	register_t dummy;
 };
 struct quotactl_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char uid_l_[PADL_(int)]; int uid; char uid_r_[PADR_(int)];
 	char arg_l_[PADL_(caddr_t)]; caddr_t arg; char arg_r_[PADR_(caddr_t)];
 };
 struct oquota_args {
 	register_t dummy;
 };
 struct nlm_syscall_args {
 	char debug_level_l_[PADL_(int)]; int debug_level; char debug_level_r_[PADR_(int)];
 	char grace_period_l_[PADL_(int)]; int grace_period; char grace_period_r_[PADR_(int)];
 	char addr_count_l_[PADL_(int)]; int addr_count; char addr_count_r_[PADR_(int)];
 	char addrs_l_[PADL_(char **)]; char ** addrs; char addrs_r_[PADR_(char **)];
 };
 struct nfssvc_args {
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 	char argp_l_[PADL_(caddr_t)]; caddr_t argp; char argp_r_[PADR_(caddr_t)];
 };
 struct lgetfh_args {
 	char fname_l_[PADL_(char *)]; char * fname; char fname_r_[PADR_(char *)];
 	char fhp_l_[PADL_(struct fhandle *)]; struct fhandle * fhp; char fhp_r_[PADR_(struct fhandle *)];
 };
 struct getfh_args {
 	char fname_l_[PADL_(char *)]; char * fname; char fname_r_[PADR_(char *)];
 	char fhp_l_[PADL_(struct fhandle *)]; struct fhandle * fhp; char fhp_r_[PADR_(struct fhandle *)];
 };
 struct sysarch_args {
 	char op_l_[PADL_(int)]; int op; char op_r_[PADR_(int)];
 	char parms_l_[PADL_(char *)]; char * parms; char parms_r_[PADR_(char *)];
 };
 struct rtprio_args {
 	char function_l_[PADL_(int)]; int function; char function_r_[PADR_(int)];
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char rtp_l_[PADL_(struct rtprio *)]; struct rtprio * rtp; char rtp_r_[PADR_(struct rtprio *)];
 };
 struct semsys_args {
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char a2_l_[PADL_(int)]; int a2; char a2_r_[PADR_(int)];
 	char a3_l_[PADL_(int)]; int a3; char a3_r_[PADR_(int)];
 	char a4_l_[PADL_(int)]; int a4; char a4_r_[PADR_(int)];
 	char a5_l_[PADL_(int)]; int a5; char a5_r_[PADR_(int)];
 };
 struct msgsys_args {
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char a2_l_[PADL_(int)]; int a2; char a2_r_[PADR_(int)];
 	char a3_l_[PADL_(int)]; int a3; char a3_r_[PADR_(int)];
 	char a4_l_[PADL_(int)]; int a4; char a4_r_[PADR_(int)];
 	char a5_l_[PADL_(int)]; int a5; char a5_r_[PADR_(int)];
 	char a6_l_[PADL_(int)]; int a6; char a6_r_[PADR_(int)];
 };
 struct shmsys_args {
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char a2_l_[PADL_(int)]; int a2; char a2_r_[PADR_(int)];
 	char a3_l_[PADL_(int)]; int a3; char a3_r_[PADR_(int)];
 	char a4_l_[PADL_(int)]; int a4; char a4_r_[PADR_(int)];
 };
 struct setfib_args {
 	char fibnum_l_[PADL_(int)]; int fibnum; char fibnum_r_[PADR_(int)];
 };
 struct ntp_adjtime_args {
 	char tp_l_[PADL_(struct timex *)]; struct timex * tp; char tp_r_[PADR_(struct timex *)];
 };
 struct setgid_args {
 	char gid_l_[PADL_(gid_t)]; gid_t gid; char gid_r_[PADR_(gid_t)];
 };
 struct setegid_args {
 	char egid_l_[PADL_(gid_t)]; gid_t egid; char egid_r_[PADR_(gid_t)];
 };
 struct seteuid_args {
 	char euid_l_[PADL_(uid_t)]; uid_t euid; char euid_r_[PADR_(uid_t)];
 };
 struct stat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct stat *)]; struct stat * ub; char ub_r_[PADR_(struct stat *)];
 };
 struct fstat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char sb_l_[PADL_(struct stat *)]; struct stat * sb; char sb_r_[PADR_(struct stat *)];
 };
 struct lstat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct stat *)]; struct stat * ub; char ub_r_[PADR_(struct stat *)];
 };
 struct pathconf_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char name_l_[PADL_(int)]; int name; char name_r_[PADR_(int)];
 };
 struct fpathconf_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char name_l_[PADL_(int)]; int name; char name_r_[PADR_(int)];
 };
 struct __getrlimit_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char rlp_l_[PADL_(struct rlimit *)]; struct rlimit * rlp; char rlp_r_[PADR_(struct rlimit *)];
 };
 struct __setrlimit_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char rlp_l_[PADL_(struct rlimit *)]; struct rlimit * rlp; char rlp_r_[PADR_(struct rlimit *)];
 };
 struct getdirentries_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char count_l_[PADL_(u_int)]; u_int count; char count_r_[PADR_(u_int)];
 	char basep_l_[PADL_(long *)]; long * basep; char basep_r_[PADR_(long *)];
 };
 struct sysctl_args {
 	char name_l_[PADL_(int *)]; int * name; char name_r_[PADR_(int *)];
 	char namelen_l_[PADL_(u_int)]; u_int namelen; char namelen_r_[PADR_(u_int)];
 	char old_l_[PADL_(void *)]; void * old; char old_r_[PADR_(void *)];
 	char oldlenp_l_[PADL_(size_t *)]; size_t * oldlenp; char oldlenp_r_[PADR_(size_t *)];
 	char new_l_[PADL_(void *)]; void * new; char new_r_[PADR_(void *)];
 	char newlen_l_[PADL_(size_t)]; size_t newlen; char newlen_r_[PADR_(size_t)];
 };
 struct mlock_args {
 	char addr_l_[PADL_(const void *)]; const void * addr; char addr_r_[PADR_(const void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 };
 struct munlock_args {
 	char addr_l_[PADL_(const void *)]; const void * addr; char addr_r_[PADR_(const void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 };
 struct undelete_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct futimes_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char tptr_l_[PADL_(struct timeval *)]; struct timeval * tptr; char tptr_r_[PADR_(struct timeval *)];
 };
 struct getpgid_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 };
 struct poll_args {
 	char fds_l_[PADL_(struct pollfd *)]; struct pollfd * fds; char fds_r_[PADR_(struct pollfd *)];
 	char nfds_l_[PADL_(u_int)]; u_int nfds; char nfds_r_[PADR_(u_int)];
 	char timeout_l_[PADL_(int)]; int timeout; char timeout_r_[PADR_(int)];
 };
 struct semget_args {
 	char key_l_[PADL_(key_t)]; key_t key; char key_r_[PADR_(key_t)];
 	char nsems_l_[PADL_(int)]; int nsems; char nsems_r_[PADR_(int)];
 	char semflg_l_[PADL_(int)]; int semflg; char semflg_r_[PADR_(int)];
 };
 struct semop_args {
 	char semid_l_[PADL_(int)]; int semid; char semid_r_[PADR_(int)];
 	char sops_l_[PADL_(struct sembuf *)]; struct sembuf * sops; char sops_r_[PADR_(struct sembuf *)];
 	char nsops_l_[PADL_(size_t)]; size_t nsops; char nsops_r_[PADR_(size_t)];
 };
 struct msgget_args {
 	char key_l_[PADL_(key_t)]; key_t key; char key_r_[PADR_(key_t)];
 	char msgflg_l_[PADL_(int)]; int msgflg; char msgflg_r_[PADR_(int)];
 };
 struct msgsnd_args {
 	char msqid_l_[PADL_(int)]; int msqid; char msqid_r_[PADR_(int)];
 	char msgp_l_[PADL_(const void *)]; const void * msgp; char msgp_r_[PADR_(const void *)];
 	char msgsz_l_[PADL_(size_t)]; size_t msgsz; char msgsz_r_[PADR_(size_t)];
 	char msgflg_l_[PADL_(int)]; int msgflg; char msgflg_r_[PADR_(int)];
 };
 struct msgrcv_args {
 	char msqid_l_[PADL_(int)]; int msqid; char msqid_r_[PADR_(int)];
 	char msgp_l_[PADL_(void *)]; void * msgp; char msgp_r_[PADR_(void *)];
 	char msgsz_l_[PADL_(size_t)]; size_t msgsz; char msgsz_r_[PADR_(size_t)];
 	char msgtyp_l_[PADL_(long)]; long msgtyp; char msgtyp_r_[PADR_(long)];
 	char msgflg_l_[PADL_(int)]; int msgflg; char msgflg_r_[PADR_(int)];
 };
 struct shmat_args {
 	char shmid_l_[PADL_(int)]; int shmid; char shmid_r_[PADR_(int)];
 	char shmaddr_l_[PADL_(const void *)]; const void * shmaddr; char shmaddr_r_[PADR_(const void *)];
 	char shmflg_l_[PADL_(int)]; int shmflg; char shmflg_r_[PADR_(int)];
 };
 struct shmdt_args {
 	char shmaddr_l_[PADL_(const void *)]; const void * shmaddr; char shmaddr_r_[PADR_(const void *)];
 };
 struct shmget_args {
 	char key_l_[PADL_(key_t)]; key_t key; char key_r_[PADR_(key_t)];
 	char size_l_[PADL_(size_t)]; size_t size; char size_r_[PADR_(size_t)];
 	char shmflg_l_[PADL_(int)]; int shmflg; char shmflg_r_[PADR_(int)];
 };
 struct clock_gettime_args {
 	char clock_id_l_[PADL_(clockid_t)]; clockid_t clock_id; char clock_id_r_[PADR_(clockid_t)];
 	char tp_l_[PADL_(struct timespec *)]; struct timespec * tp; char tp_r_[PADR_(struct timespec *)];
 };
 struct clock_settime_args {
 	char clock_id_l_[PADL_(clockid_t)]; clockid_t clock_id; char clock_id_r_[PADR_(clockid_t)];
 	char tp_l_[PADL_(const struct timespec *)]; const struct timespec * tp; char tp_r_[PADR_(const struct timespec *)];
 };
 struct clock_getres_args {
 	char clock_id_l_[PADL_(clockid_t)]; clockid_t clock_id; char clock_id_r_[PADR_(clockid_t)];
 	char tp_l_[PADL_(struct timespec *)]; struct timespec * tp; char tp_r_[PADR_(struct timespec *)];
 };
 struct ktimer_create_args {
 	char clock_id_l_[PADL_(clockid_t)]; clockid_t clock_id; char clock_id_r_[PADR_(clockid_t)];
 	char evp_l_[PADL_(struct sigevent *)]; struct sigevent * evp; char evp_r_[PADR_(struct sigevent *)];
 	char timerid_l_[PADL_(int *)]; int * timerid; char timerid_r_[PADR_(int *)];
 };
 struct ktimer_delete_args {
 	char timerid_l_[PADL_(int)]; int timerid; char timerid_r_[PADR_(int)];
 };
 struct ktimer_settime_args {
 	char timerid_l_[PADL_(int)]; int timerid; char timerid_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char value_l_[PADL_(const struct itimerspec *)]; const struct itimerspec * value; char value_r_[PADR_(const struct itimerspec *)];
 	char ovalue_l_[PADL_(struct itimerspec *)]; struct itimerspec * ovalue; char ovalue_r_[PADR_(struct itimerspec *)];
 };
 struct ktimer_gettime_args {
 	char timerid_l_[PADL_(int)]; int timerid; char timerid_r_[PADR_(int)];
 	char value_l_[PADL_(struct itimerspec *)]; struct itimerspec * value; char value_r_[PADR_(struct itimerspec *)];
 };
 struct ktimer_getoverrun_args {
 	char timerid_l_[PADL_(int)]; int timerid; char timerid_r_[PADR_(int)];
 };
 struct nanosleep_args {
 	char rqtp_l_[PADL_(const struct timespec *)]; const struct timespec * rqtp; char rqtp_r_[PADR_(const struct timespec *)];
 	char rmtp_l_[PADL_(struct timespec *)]; struct timespec * rmtp; char rmtp_r_[PADR_(struct timespec *)];
 };
 struct ffclock_getcounter_args {
 	char ffcount_l_[PADL_(ffcounter *)]; ffcounter * ffcount; char ffcount_r_[PADR_(ffcounter *)];
 };
 struct ffclock_setestimate_args {
 	char cest_l_[PADL_(struct ffclock_estimate *)]; struct ffclock_estimate * cest; char cest_r_[PADR_(struct ffclock_estimate *)];
 };
 struct ffclock_getestimate_args {
 	char cest_l_[PADL_(struct ffclock_estimate *)]; struct ffclock_estimate * cest; char cest_r_[PADR_(struct ffclock_estimate *)];
 };
 struct clock_getcpuclockid2_args {
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char which_l_[PADL_(int)]; int which; char which_r_[PADR_(int)];
 	char clock_id_l_[PADL_(clockid_t *)]; clockid_t * clock_id; char clock_id_r_[PADR_(clockid_t *)];
 };
 struct ntp_gettime_args {
 	char ntvp_l_[PADL_(struct ntptimeval *)]; struct ntptimeval * ntvp; char ntvp_r_[PADR_(struct ntptimeval *)];
 };
 struct minherit_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char inherit_l_[PADL_(int)]; int inherit; char inherit_r_[PADR_(int)];
 };
 struct rfork_args {
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct openbsd_poll_args {
 	char fds_l_[PADL_(struct pollfd *)]; struct pollfd * fds; char fds_r_[PADR_(struct pollfd *)];
 	char nfds_l_[PADL_(u_int)]; u_int nfds; char nfds_r_[PADR_(u_int)];
 	char timeout_l_[PADL_(int)]; int timeout; char timeout_r_[PADR_(int)];
 };
 struct issetugid_args {
 	register_t dummy;
 };
 struct lchown_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char uid_l_[PADL_(int)]; int uid; char uid_r_[PADR_(int)];
 	char gid_l_[PADL_(int)]; int gid; char gid_r_[PADR_(int)];
 };
 struct aio_read_args {
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct aio_write_args {
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct lio_listio_args {
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 	char acb_list_l_[PADL_(struct aiocb *const *)]; struct aiocb *const * acb_list; char acb_list_r_[PADR_(struct aiocb *const *)];
 	char nent_l_[PADL_(int)]; int nent; char nent_r_[PADR_(int)];
 	char sig_l_[PADL_(struct sigevent *)]; struct sigevent * sig; char sig_r_[PADR_(struct sigevent *)];
 };
 struct getdents_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char count_l_[PADL_(size_t)]; size_t count; char count_r_[PADR_(size_t)];
 };
 struct lchmod_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 };
 struct lutimes_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char tptr_l_[PADL_(struct timeval *)]; struct timeval * tptr; char tptr_r_[PADR_(struct timeval *)];
 };
 struct nstat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct nstat *)]; struct nstat * ub; char ub_r_[PADR_(struct nstat *)];
 };
 struct nfstat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char sb_l_[PADL_(struct nstat *)]; struct nstat * sb; char sb_r_[PADR_(struct nstat *)];
 };
 struct nlstat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct nstat *)]; struct nstat * ub; char ub_r_[PADR_(struct nstat *)];
 };
 struct preadv_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(u_int)]; u_int iovcnt; char iovcnt_r_[PADR_(u_int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct pwritev_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(u_int)]; u_int iovcnt; char iovcnt_r_[PADR_(u_int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct fhopen_args {
 	char u_fhp_l_[PADL_(const struct fhandle *)]; const struct fhandle * u_fhp; char u_fhp_r_[PADR_(const struct fhandle *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct fhstat_args {
 	char u_fhp_l_[PADL_(const struct fhandle *)]; const struct fhandle * u_fhp; char u_fhp_r_[PADR_(const struct fhandle *)];
 	char sb_l_[PADL_(struct stat *)]; struct stat * sb; char sb_r_[PADR_(struct stat *)];
 };
 struct modnext_args {
 	char modid_l_[PADL_(int)]; int modid; char modid_r_[PADR_(int)];
 };
 struct modstat_args {
 	char modid_l_[PADL_(int)]; int modid; char modid_r_[PADR_(int)];
 	char stat_l_[PADL_(struct module_stat *)]; struct module_stat * stat; char stat_r_[PADR_(struct module_stat *)];
 };
 struct modfnext_args {
 	char modid_l_[PADL_(int)]; int modid; char modid_r_[PADR_(int)];
 };
 struct modfind_args {
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 };
 struct kldload_args {
 	char file_l_[PADL_(const char *)]; const char * file; char file_r_[PADR_(const char *)];
 };
 struct kldunload_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 };
 struct kldfind_args {
 	char file_l_[PADL_(const char *)]; const char * file; char file_r_[PADR_(const char *)];
 };
 struct kldnext_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 };
 struct kldstat_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 	char stat_l_[PADL_(struct kld_file_stat *)]; struct kld_file_stat * stat; char stat_r_[PADR_(struct kld_file_stat *)];
 };
 struct kldfirstmod_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 };
 struct getsid_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 };
 struct setresuid_args {
 	char ruid_l_[PADL_(uid_t)]; uid_t ruid; char ruid_r_[PADR_(uid_t)];
 	char euid_l_[PADL_(uid_t)]; uid_t euid; char euid_r_[PADR_(uid_t)];
 	char suid_l_[PADL_(uid_t)]; uid_t suid; char suid_r_[PADR_(uid_t)];
 };
 struct setresgid_args {
 	char rgid_l_[PADL_(gid_t)]; gid_t rgid; char rgid_r_[PADR_(gid_t)];
 	char egid_l_[PADL_(gid_t)]; gid_t egid; char egid_r_[PADR_(gid_t)];
 	char sgid_l_[PADL_(gid_t)]; gid_t sgid; char sgid_r_[PADR_(gid_t)];
 };
 struct aio_return_args {
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct aio_suspend_args {
 	char aiocbp_l_[PADL_(struct aiocb *const *)]; struct aiocb *const * aiocbp; char aiocbp_r_[PADR_(struct aiocb *const *)];
 	char nent_l_[PADL_(int)]; int nent; char nent_r_[PADR_(int)];
 	char timeout_l_[PADL_(const struct timespec *)]; const struct timespec * timeout; char timeout_r_[PADR_(const struct timespec *)];
 };
 struct aio_cancel_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct aio_error_args {
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct yield_args {
 	register_t dummy;
 };
 struct mlockall_args {
 	char how_l_[PADL_(int)]; int how; char how_r_[PADR_(int)];
 };
 struct munlockall_args {
 	register_t dummy;
 };
 struct __getcwd_args {
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char buflen_l_[PADL_(u_int)]; u_int buflen; char buflen_r_[PADR_(u_int)];
 };
 struct sched_setparam_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char param_l_[PADL_(const struct sched_param *)]; const struct sched_param * param; char param_r_[PADR_(const struct sched_param *)];
 };
 struct sched_getparam_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char param_l_[PADL_(struct sched_param *)]; struct sched_param * param; char param_r_[PADR_(struct sched_param *)];
 };
 struct sched_setscheduler_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char policy_l_[PADL_(int)]; int policy; char policy_r_[PADR_(int)];
 	char param_l_[PADL_(const struct sched_param *)]; const struct sched_param * param; char param_r_[PADR_(const struct sched_param *)];
 };
 struct sched_getscheduler_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 };
 struct sched_yield_args {
 	register_t dummy;
 };
 struct sched_get_priority_max_args {
 	char policy_l_[PADL_(int)]; int policy; char policy_r_[PADR_(int)];
 };
 struct sched_get_priority_min_args {
 	char policy_l_[PADL_(int)]; int policy; char policy_r_[PADR_(int)];
 };
 struct sched_rr_get_interval_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char interval_l_[PADL_(struct timespec *)]; struct timespec * interval; char interval_r_[PADR_(struct timespec *)];
 };
 struct utrace_args {
 	char addr_l_[PADL_(const void *)]; const void * addr; char addr_r_[PADR_(const void *)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 };
 struct kldsym_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 };
 struct jail_args {
 	char jail_l_[PADL_(struct jail *)]; struct jail * jail; char jail_r_[PADR_(struct jail *)];
 };
 struct nnpfs_syscall_args {
 	char operation_l_[PADL_(int)]; int operation; char operation_r_[PADR_(int)];
 	char a_pathP_l_[PADL_(char *)]; char * a_pathP; char a_pathP_r_[PADR_(char *)];
 	char a_opcode_l_[PADL_(int)]; int a_opcode; char a_opcode_r_[PADR_(int)];
 	char a_paramsP_l_[PADL_(void *)]; void * a_paramsP; char a_paramsP_r_[PADR_(void *)];
 	char a_followSymlinks_l_[PADL_(int)]; int a_followSymlinks; char a_followSymlinks_r_[PADR_(int)];
 };
 struct sigprocmask_args {
 	char how_l_[PADL_(int)]; int how; char how_r_[PADR_(int)];
 	char set_l_[PADL_(const sigset_t *)]; const sigset_t * set; char set_r_[PADR_(const sigset_t *)];
 	char oset_l_[PADL_(sigset_t *)]; sigset_t * oset; char oset_r_[PADR_(sigset_t *)];
 };
 struct sigsuspend_args {
 	char sigmask_l_[PADL_(const sigset_t *)]; const sigset_t * sigmask; char sigmask_r_[PADR_(const sigset_t *)];
 };
 struct sigpending_args {
 	char set_l_[PADL_(sigset_t *)]; sigset_t * set; char set_r_[PADR_(sigset_t *)];
 };
 struct sigtimedwait_args {
 	char set_l_[PADL_(const sigset_t *)]; const sigset_t * set; char set_r_[PADR_(const sigset_t *)];
 	char info_l_[PADL_(siginfo_t *)]; siginfo_t * info; char info_r_[PADR_(siginfo_t *)];
 	char timeout_l_[PADL_(const struct timespec *)]; const struct timespec * timeout; char timeout_r_[PADR_(const struct timespec *)];
 };
 struct sigwaitinfo_args {
 	char set_l_[PADL_(const sigset_t *)]; const sigset_t * set; char set_r_[PADR_(const sigset_t *)];
 	char info_l_[PADL_(siginfo_t *)]; siginfo_t * info; char info_r_[PADR_(siginfo_t *)];
 };
 struct __acl_get_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_set_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_get_fd_args {
 	char filedes_l_[PADL_(int)]; int filedes; char filedes_r_[PADR_(int)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_set_fd_args {
 	char filedes_l_[PADL_(int)]; int filedes; char filedes_r_[PADR_(int)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_delete_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 };
 struct __acl_delete_fd_args {
 	char filedes_l_[PADL_(int)]; int filedes; char filedes_r_[PADR_(int)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 };
 struct __acl_aclcheck_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_aclcheck_fd_args {
 	char filedes_l_[PADL_(int)]; int filedes; char filedes_r_[PADR_(int)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct extattrctl_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char filename_l_[PADL_(const char *)]; const char * filename; char filename_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 };
 struct extattr_set_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_get_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_delete_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 };
 struct aio_waitcomplete_args {
 	char aiocbp_l_[PADL_(struct aiocb **)]; struct aiocb ** aiocbp; char aiocbp_r_[PADR_(struct aiocb **)];
 	char timeout_l_[PADL_(struct timespec *)]; struct timespec * timeout; char timeout_r_[PADR_(struct timespec *)];
 };
 struct getresuid_args {
 	char ruid_l_[PADL_(uid_t *)]; uid_t * ruid; char ruid_r_[PADR_(uid_t *)];
 	char euid_l_[PADL_(uid_t *)]; uid_t * euid; char euid_r_[PADR_(uid_t *)];
 	char suid_l_[PADL_(uid_t *)]; uid_t * suid; char suid_r_[PADR_(uid_t *)];
 };
 struct getresgid_args {
 	char rgid_l_[PADL_(gid_t *)]; gid_t * rgid; char rgid_r_[PADR_(gid_t *)];
 	char egid_l_[PADL_(gid_t *)]; gid_t * egid; char egid_r_[PADR_(gid_t *)];
 	char sgid_l_[PADL_(gid_t *)]; gid_t * sgid; char sgid_r_[PADR_(gid_t *)];
 };
 struct kqueue_args {
 	register_t dummy;
 };
 struct kevent_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char changelist_l_[PADL_(struct kevent *)]; struct kevent * changelist; char changelist_r_[PADR_(struct kevent *)];
 	char nchanges_l_[PADL_(int)]; int nchanges; char nchanges_r_[PADR_(int)];
 	char eventlist_l_[PADL_(struct kevent *)]; struct kevent * eventlist; char eventlist_r_[PADR_(struct kevent *)];
 	char nevents_l_[PADL_(int)]; int nevents; char nevents_r_[PADR_(int)];
 	char timeout_l_[PADL_(const struct timespec *)]; const struct timespec * timeout; char timeout_r_[PADR_(const struct timespec *)];
 };
 struct extattr_set_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_get_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_delete_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 };
 struct __setugid_args {
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct eaccess_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char amode_l_[PADL_(int)]; int amode; char amode_r_[PADR_(int)];
 };
 struct afs3_syscall_args {
 	char syscall_l_[PADL_(long)]; long syscall; char syscall_r_[PADR_(long)];
 	char parm1_l_[PADL_(long)]; long parm1; char parm1_r_[PADR_(long)];
 	char parm2_l_[PADL_(long)]; long parm2; char parm2_r_[PADR_(long)];
 	char parm3_l_[PADL_(long)]; long parm3; char parm3_r_[PADR_(long)];
 	char parm4_l_[PADL_(long)]; long parm4; char parm4_r_[PADR_(long)];
 	char parm5_l_[PADL_(long)]; long parm5; char parm5_r_[PADR_(long)];
 	char parm6_l_[PADL_(long)]; long parm6; char parm6_r_[PADR_(long)];
 };
 struct nmount_args {
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(unsigned int)]; unsigned int iovcnt; char iovcnt_r_[PADR_(unsigned int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct __mac_get_proc_args {
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_set_proc_args {
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_get_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_get_file_args {
 	char path_p_l_[PADL_(const char *)]; const char * path_p; char path_p_r_[PADR_(const char *)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_set_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_set_file_args {
 	char path_p_l_[PADL_(const char *)]; const char * path_p; char path_p_r_[PADR_(const char *)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct kenv_args {
 	char what_l_[PADL_(int)]; int what; char what_r_[PADR_(int)];
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 	char value_l_[PADL_(char *)]; char * value; char value_r_[PADR_(char *)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 };
 struct lchflags_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char flags_l_[PADL_(u_long)]; u_long flags; char flags_r_[PADR_(u_long)];
 };
 struct uuidgen_args {
 	char store_l_[PADL_(struct uuid *)]; struct uuid * store; char store_r_[PADR_(struct uuid *)];
 	char count_l_[PADL_(int)]; int count; char count_r_[PADR_(int)];
 };
 struct sendfile_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 	char hdtr_l_[PADL_(struct sf_hdtr *)]; struct sf_hdtr * hdtr; char hdtr_r_[PADR_(struct sf_hdtr *)];
 	char sbytes_l_[PADL_(off_t *)]; off_t * sbytes; char sbytes_r_[PADR_(off_t *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct mac_syscall_args {
 	char policy_l_[PADL_(const char *)]; const char * policy; char policy_r_[PADR_(const char *)];
 	char call_l_[PADL_(int)]; int call; char call_r_[PADR_(int)];
 	char arg_l_[PADL_(void *)]; void * arg; char arg_r_[PADR_(void *)];
 };
 struct getfsstat_args {
 	char buf_l_[PADL_(struct statfs *)]; struct statfs * buf; char buf_r_[PADR_(struct statfs *)];
 	char bufsize_l_[PADL_(long)]; long bufsize; char bufsize_r_[PADR_(long)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct statfs_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char buf_l_[PADL_(struct statfs *)]; struct statfs * buf; char buf_r_[PADR_(struct statfs *)];
 };
 struct fstatfs_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct statfs *)]; struct statfs * buf; char buf_r_[PADR_(struct statfs *)];
 };
 struct fhstatfs_args {
 	char u_fhp_l_[PADL_(const struct fhandle *)]; const struct fhandle * u_fhp; char u_fhp_r_[PADR_(const struct fhandle *)];
 	char buf_l_[PADL_(struct statfs *)]; struct statfs * buf; char buf_r_[PADR_(struct statfs *)];
 };
 struct ksem_close_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 };
 struct ksem_post_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 };
 struct ksem_wait_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 };
 struct ksem_trywait_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 };
 struct ksem_init_args {
 	char idp_l_[PADL_(semid_t *)]; semid_t * idp; char idp_r_[PADR_(semid_t *)];
 	char value_l_[PADL_(unsigned int)]; unsigned int value; char value_r_[PADR_(unsigned int)];
 };
 struct ksem_open_args {
 	char idp_l_[PADL_(semid_t *)]; semid_t * idp; char idp_r_[PADR_(semid_t *)];
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 	char oflag_l_[PADL_(int)]; int oflag; char oflag_r_[PADR_(int)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 	char value_l_[PADL_(unsigned int)]; unsigned int value; char value_r_[PADR_(unsigned int)];
 };
 struct ksem_unlink_args {
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 };
 struct ksem_getvalue_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 	char val_l_[PADL_(int *)]; int * val; char val_r_[PADR_(int *)];
 };
 struct ksem_destroy_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 };
 struct __mac_get_pid_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_get_link_args {
 	char path_p_l_[PADL_(const char *)]; const char * path_p; char path_p_r_[PADR_(const char *)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct __mac_set_link_args {
 	char path_p_l_[PADL_(const char *)]; const char * path_p; char path_p_r_[PADR_(const char *)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct extattr_set_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_get_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_delete_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char attrname_l_[PADL_(const char *)]; const char * attrname; char attrname_r_[PADR_(const char *)];
 };
 struct __mac_execve_args {
 	char fname_l_[PADL_(char *)]; char * fname; char fname_r_[PADR_(char *)];
 	char argv_l_[PADL_(char **)]; char ** argv; char argv_r_[PADR_(char **)];
 	char envv_l_[PADL_(char **)]; char ** envv; char envv_r_[PADR_(char **)];
 	char mac_p_l_[PADL_(struct mac *)]; struct mac * mac_p; char mac_p_r_[PADR_(struct mac *)];
 };
 struct sigaction_args {
 	char sig_l_[PADL_(int)]; int sig; char sig_r_[PADR_(int)];
 	char act_l_[PADL_(const struct sigaction *)]; const struct sigaction * act; char act_r_[PADR_(const struct sigaction *)];
 	char oact_l_[PADL_(struct sigaction *)]; struct sigaction * oact; char oact_r_[PADR_(struct sigaction *)];
 };
 struct sigreturn_args {
 	char sigcntxp_l_[PADL_(const struct __ucontext *)]; const struct __ucontext * sigcntxp; char sigcntxp_r_[PADR_(const struct __ucontext *)];
 };
 struct getcontext_args {
 	char ucp_l_[PADL_(struct __ucontext *)]; struct __ucontext * ucp; char ucp_r_[PADR_(struct __ucontext *)];
 };
 struct setcontext_args {
 	char ucp_l_[PADL_(const struct __ucontext *)]; const struct __ucontext * ucp; char ucp_r_[PADR_(const struct __ucontext *)];
 };
 struct swapcontext_args {
 	char oucp_l_[PADL_(struct __ucontext *)]; struct __ucontext * oucp; char oucp_r_[PADR_(struct __ucontext *)];
 	char ucp_l_[PADL_(const struct __ucontext *)]; const struct __ucontext * ucp; char ucp_r_[PADR_(const struct __ucontext *)];
 };
 struct swapoff_args {
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 };
 struct __acl_get_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_set_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct __acl_delete_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 };
 struct __acl_aclcheck_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char type_l_[PADL_(acl_type_t)]; acl_type_t type; char type_r_[PADR_(acl_type_t)];
 	char aclp_l_[PADL_(struct acl *)]; struct acl * aclp; char aclp_r_[PADR_(struct acl *)];
 };
 struct sigwait_args {
 	char set_l_[PADL_(const sigset_t *)]; const sigset_t * set; char set_r_[PADR_(const sigset_t *)];
 	char sig_l_[PADL_(int *)]; int * sig; char sig_r_[PADR_(int *)];
 };
 struct thr_create_args {
 	char ctx_l_[PADL_(ucontext_t *)]; ucontext_t * ctx; char ctx_r_[PADR_(ucontext_t *)];
 	char id_l_[PADL_(long *)]; long * id; char id_r_[PADR_(long *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct thr_exit_args {
 	char state_l_[PADL_(long *)]; long * state; char state_r_[PADR_(long *)];
 };
 struct thr_self_args {
 	char id_l_[PADL_(long *)]; long * id; char id_r_[PADR_(long *)];
 };
 struct thr_kill_args {
 	char id_l_[PADL_(long)]; long id; char id_r_[PADR_(long)];
 	char sig_l_[PADL_(int)]; int sig; char sig_r_[PADR_(int)];
 };
 struct jail_attach_args {
 	char jid_l_[PADL_(int)]; int jid; char jid_r_[PADR_(int)];
 };
 struct extattr_list_fd_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_list_file_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct extattr_list_link_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char attrnamespace_l_[PADL_(int)]; int attrnamespace; char attrnamespace_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 };
 struct ksem_timedwait_args {
 	char id_l_[PADL_(semid_t)]; semid_t id; char id_r_[PADR_(semid_t)];
 	char abstime_l_[PADL_(const struct timespec *)]; const struct timespec * abstime; char abstime_r_[PADR_(const struct timespec *)];
 };
 struct thr_suspend_args {
 	char timeout_l_[PADL_(const struct timespec *)]; const struct timespec * timeout; char timeout_r_[PADR_(const struct timespec *)];
 };
 struct thr_wake_args {
 	char id_l_[PADL_(long)]; long id; char id_r_[PADR_(long)];
 };
 struct kldunloadf_args {
 	char fileid_l_[PADL_(int)]; int fileid; char fileid_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct audit_args {
 	char record_l_[PADL_(const void *)]; const void * record; char record_r_[PADR_(const void *)];
 	char length_l_[PADL_(u_int)]; u_int length; char length_r_[PADR_(u_int)];
 };
 struct auditon_args {
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 	char length_l_[PADL_(u_int)]; u_int length; char length_r_[PADR_(u_int)];
 };
 struct getauid_args {
 	char auid_l_[PADL_(uid_t *)]; uid_t * auid; char auid_r_[PADR_(uid_t *)];
 };
 struct setauid_args {
 	char auid_l_[PADL_(uid_t *)]; uid_t * auid; char auid_r_[PADR_(uid_t *)];
 };
 struct getaudit_args {
 	char auditinfo_l_[PADL_(struct auditinfo *)]; struct auditinfo * auditinfo; char auditinfo_r_[PADR_(struct auditinfo *)];
 };
 struct setaudit_args {
 	char auditinfo_l_[PADL_(struct auditinfo *)]; struct auditinfo * auditinfo; char auditinfo_r_[PADR_(struct auditinfo *)];
 };
 struct getaudit_addr_args {
 	char auditinfo_addr_l_[PADL_(struct auditinfo_addr *)]; struct auditinfo_addr * auditinfo_addr; char auditinfo_addr_r_[PADR_(struct auditinfo_addr *)];
 	char length_l_[PADL_(u_int)]; u_int length; char length_r_[PADR_(u_int)];
 };
 struct setaudit_addr_args {
 	char auditinfo_addr_l_[PADL_(struct auditinfo_addr *)]; struct auditinfo_addr * auditinfo_addr; char auditinfo_addr_r_[PADR_(struct auditinfo_addr *)];
 	char length_l_[PADL_(u_int)]; u_int length; char length_r_[PADR_(u_int)];
 };
 struct auditctl_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct _umtx_op_args {
 	char obj_l_[PADL_(void *)]; void * obj; char obj_r_[PADR_(void *)];
 	char op_l_[PADL_(int)]; int op; char op_r_[PADR_(int)];
 	char val_l_[PADL_(u_long)]; u_long val; char val_r_[PADR_(u_long)];
 	char uaddr1_l_[PADL_(void *)]; void * uaddr1; char uaddr1_r_[PADR_(void *)];
 	char uaddr2_l_[PADL_(void *)]; void * uaddr2; char uaddr2_r_[PADR_(void *)];
 };
 struct thr_new_args {
 	char param_l_[PADL_(struct thr_param *)]; struct thr_param * param; char param_r_[PADR_(struct thr_param *)];
 	char param_size_l_[PADL_(int)]; int param_size; char param_size_r_[PADR_(int)];
 };
 struct sigqueue_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 	char value_l_[PADL_(void *)]; void * value; char value_r_[PADR_(void *)];
 };
 struct kmq_open_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 	char attr_l_[PADL_(const struct mq_attr *)]; const struct mq_attr * attr; char attr_r_[PADR_(const struct mq_attr *)];
 };
 struct kmq_setattr_args {
 	char mqd_l_[PADL_(int)]; int mqd; char mqd_r_[PADR_(int)];
 	char attr_l_[PADL_(const struct mq_attr *)]; const struct mq_attr * attr; char attr_r_[PADR_(const struct mq_attr *)];
 	char oattr_l_[PADL_(struct mq_attr *)]; struct mq_attr * oattr; char oattr_r_[PADR_(struct mq_attr *)];
 };
 struct kmq_timedreceive_args {
 	char mqd_l_[PADL_(int)]; int mqd; char mqd_r_[PADR_(int)];
 	char msg_ptr_l_[PADL_(char *)]; char * msg_ptr; char msg_ptr_r_[PADR_(char *)];
 	char msg_len_l_[PADL_(size_t)]; size_t msg_len; char msg_len_r_[PADR_(size_t)];
 	char msg_prio_l_[PADL_(unsigned *)]; unsigned * msg_prio; char msg_prio_r_[PADR_(unsigned *)];
 	char abs_timeout_l_[PADL_(const struct timespec *)]; const struct timespec * abs_timeout; char abs_timeout_r_[PADR_(const struct timespec *)];
 };
 struct kmq_timedsend_args {
 	char mqd_l_[PADL_(int)]; int mqd; char mqd_r_[PADR_(int)];
 	char msg_ptr_l_[PADL_(const char *)]; const char * msg_ptr; char msg_ptr_r_[PADR_(const char *)];
 	char msg_len_l_[PADL_(size_t)]; size_t msg_len; char msg_len_r_[PADR_(size_t)];
 	char msg_prio_l_[PADL_(unsigned)]; unsigned msg_prio; char msg_prio_r_[PADR_(unsigned)];
 	char abs_timeout_l_[PADL_(const struct timespec *)]; const struct timespec * abs_timeout; char abs_timeout_r_[PADR_(const struct timespec *)];
 };
 struct kmq_notify_args {
 	char mqd_l_[PADL_(int)]; int mqd; char mqd_r_[PADR_(int)];
 	char sigev_l_[PADL_(const struct sigevent *)]; const struct sigevent * sigev; char sigev_r_[PADR_(const struct sigevent *)];
 };
 struct kmq_unlink_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 };
 struct abort2_args {
 	char why_l_[PADL_(const char *)]; const char * why; char why_r_[PADR_(const char *)];
 	char nargs_l_[PADL_(int)]; int nargs; char nargs_r_[PADR_(int)];
 	char args_l_[PADL_(void **)]; void ** args; char args_r_[PADR_(void **)];
 };
 struct thr_set_name_args {
 	char id_l_[PADL_(long)]; long id; char id_r_[PADR_(long)];
 	char name_l_[PADL_(const char *)]; const char * name; char name_r_[PADR_(const char *)];
 };
 struct aio_fsync_args {
 	char op_l_[PADL_(int)]; int op; char op_r_[PADR_(int)];
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct rtprio_thread_args {
 	char function_l_[PADL_(int)]; int function; char function_r_[PADR_(int)];
 	char lwpid_l_[PADL_(lwpid_t)]; lwpid_t lwpid; char lwpid_r_[PADR_(lwpid_t)];
 	char rtp_l_[PADL_(struct rtprio *)]; struct rtprio * rtp; char rtp_r_[PADR_(struct rtprio *)];
 };
 struct sctp_peeloff_args {
 	char sd_l_[PADL_(int)]; int sd; char sd_r_[PADR_(int)];
 	char name_l_[PADL_(uint32_t)]; uint32_t name; char name_r_[PADR_(uint32_t)];
 };
 struct sctp_generic_sendmsg_args {
 	char sd_l_[PADL_(int)]; int sd; char sd_r_[PADR_(int)];
 	char msg_l_[PADL_(caddr_t)]; caddr_t msg; char msg_r_[PADR_(caddr_t)];
 	char mlen_l_[PADL_(int)]; int mlen; char mlen_r_[PADR_(int)];
 	char to_l_[PADL_(caddr_t)]; caddr_t to; char to_r_[PADR_(caddr_t)];
 	char tolen_l_[PADL_(__socklen_t)]; __socklen_t tolen; char tolen_r_[PADR_(__socklen_t)];
 	char sinfo_l_[PADL_(struct sctp_sndrcvinfo *)]; struct sctp_sndrcvinfo * sinfo; char sinfo_r_[PADR_(struct sctp_sndrcvinfo *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct sctp_generic_sendmsg_iov_args {
 	char sd_l_[PADL_(int)]; int sd; char sd_r_[PADR_(int)];
 	char iov_l_[PADL_(struct iovec *)]; struct iovec * iov; char iov_r_[PADR_(struct iovec *)];
 	char iovlen_l_[PADL_(int)]; int iovlen; char iovlen_r_[PADR_(int)];
 	char to_l_[PADL_(caddr_t)]; caddr_t to; char to_r_[PADR_(caddr_t)];
 	char tolen_l_[PADL_(__socklen_t)]; __socklen_t tolen; char tolen_r_[PADR_(__socklen_t)];
 	char sinfo_l_[PADL_(struct sctp_sndrcvinfo *)]; struct sctp_sndrcvinfo * sinfo; char sinfo_r_[PADR_(struct sctp_sndrcvinfo *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct sctp_generic_recvmsg_args {
 	char sd_l_[PADL_(int)]; int sd; char sd_r_[PADR_(int)];
 	char iov_l_[PADL_(struct iovec *)]; struct iovec * iov; char iov_r_[PADR_(struct iovec *)];
 	char iovlen_l_[PADL_(int)]; int iovlen; char iovlen_r_[PADR_(int)];
 	char from_l_[PADL_(struct sockaddr *)]; struct sockaddr * from; char from_r_[PADR_(struct sockaddr *)];
 	char fromlenaddr_l_[PADL_(__socklen_t *)]; __socklen_t * fromlenaddr; char fromlenaddr_r_[PADR_(__socklen_t *)];
 	char sinfo_l_[PADL_(struct sctp_sndrcvinfo *)]; struct sctp_sndrcvinfo * sinfo; char sinfo_r_[PADR_(struct sctp_sndrcvinfo *)];
 	char msg_flags_l_[PADL_(int *)]; int * msg_flags; char msg_flags_r_[PADR_(int *)];
 };
 struct pread_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(void *)]; void * buf; char buf_r_[PADR_(void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct pwrite_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(const void *)]; const void * buf; char buf_r_[PADR_(const void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct mmap_args {
 	char addr_l_[PADL_(caddr_t)]; caddr_t addr; char addr_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char prot_l_[PADL_(int)]; int prot; char prot_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pos_l_[PADL_(off_t)]; off_t pos; char pos_r_[PADR_(off_t)];
 };
 struct lseek_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)];
 };
 struct truncate_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char length_l_[PADL_(off_t)]; off_t length; char length_r_[PADR_(off_t)];
 };
 struct ftruncate_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char length_l_[PADL_(off_t)]; off_t length; char length_r_[PADR_(off_t)];
 };
 struct thr_kill2_args {
 	char pid_l_[PADL_(pid_t)]; pid_t pid; char pid_r_[PADR_(pid_t)];
 	char id_l_[PADL_(long)]; long id; char id_r_[PADR_(long)];
 	char sig_l_[PADL_(int)]; int sig; char sig_r_[PADR_(int)];
 };
 struct shm_open_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 };
 struct shm_unlink_args {
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 };
 struct cpuset_args {
 	char setid_l_[PADL_(cpusetid_t *)]; cpusetid_t * setid; char setid_r_[PADR_(cpusetid_t *)];
 };
 struct cpuset_setid_args {
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char setid_l_[PADL_(cpusetid_t)]; cpusetid_t setid; char setid_r_[PADR_(cpusetid_t)];
 };
 struct cpuset_getid_args {
 	char level_l_[PADL_(cpulevel_t)]; cpulevel_t level; char level_r_[PADR_(cpulevel_t)];
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char setid_l_[PADL_(cpusetid_t *)]; cpusetid_t * setid; char setid_r_[PADR_(cpusetid_t *)];
 };
 struct cpuset_getaffinity_args {
 	char level_l_[PADL_(cpulevel_t)]; cpulevel_t level; char level_r_[PADR_(cpulevel_t)];
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char cpusetsize_l_[PADL_(size_t)]; size_t cpusetsize; char cpusetsize_r_[PADR_(size_t)];
 	char mask_l_[PADL_(cpuset_t *)]; cpuset_t * mask; char mask_r_[PADR_(cpuset_t *)];
 };
 struct cpuset_setaffinity_args {
 	char level_l_[PADL_(cpulevel_t)]; cpulevel_t level; char level_r_[PADR_(cpulevel_t)];
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char cpusetsize_l_[PADL_(size_t)]; size_t cpusetsize; char cpusetsize_r_[PADR_(size_t)];
 	char mask_l_[PADL_(const cpuset_t *)]; const cpuset_t * mask; char mask_r_[PADR_(const cpuset_t *)];
 };
 struct faccessat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char amode_l_[PADL_(int)]; int amode; char amode_r_[PADR_(int)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct fchmodat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct fchownat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char uid_l_[PADL_(uid_t)]; uid_t uid; char uid_r_[PADR_(uid_t)];
 	char gid_l_[PADL_(gid_t)]; gid_t gid; char gid_r_[PADR_(gid_t)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct fexecve_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char argv_l_[PADL_(char **)]; char ** argv; char argv_r_[PADR_(char **)];
 	char envv_l_[PADL_(char **)]; char ** envv; char envv_r_[PADR_(char **)];
 };
 struct fstatat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char buf_l_[PADL_(struct stat *)]; struct stat * buf; char buf_r_[PADR_(struct stat *)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct futimesat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char times_l_[PADL_(struct timeval *)]; struct timeval * times; char times_r_[PADR_(struct timeval *)];
 };
 struct linkat_args {
 	char fd1_l_[PADL_(int)]; int fd1; char fd1_r_[PADR_(int)];
 	char path1_l_[PADL_(char *)]; char * path1; char path1_r_[PADR_(char *)];
 	char fd2_l_[PADL_(int)]; int fd2; char fd2_r_[PADR_(int)];
 	char path2_l_[PADL_(char *)]; char * path2; char path2_r_[PADR_(char *)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct mkdirat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 };
 struct mkfifoat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 };
 struct mknodat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 	char dev_l_[PADL_(dev_t)]; dev_t dev; char dev_r_[PADR_(dev_t)];
 };
 struct openat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 	char mode_l_[PADL_(mode_t)]; mode_t mode; char mode_r_[PADR_(mode_t)];
 };
 struct readlinkat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char bufsize_l_[PADL_(size_t)]; size_t bufsize; char bufsize_r_[PADR_(size_t)];
 };
 struct renameat_args {
 	char oldfd_l_[PADL_(int)]; int oldfd; char oldfd_r_[PADR_(int)];
 	char old_l_[PADL_(char *)]; char * old; char old_r_[PADR_(char *)];
 	char newfd_l_[PADL_(int)]; int newfd; char newfd_r_[PADR_(int)];
 	char new_l_[PADL_(char *)]; char * new; char new_r_[PADR_(char *)];
 };
 struct symlinkat_args {
 	char path1_l_[PADL_(char *)]; char * path1; char path1_r_[PADR_(char *)];
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path2_l_[PADL_(char *)]; char * path2; char path2_r_[PADR_(char *)];
 };
 struct unlinkat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct posix_openpt_args {
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct gssd_syscall_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 };
 struct jail_get_args {
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(unsigned int)]; unsigned int iovcnt; char iovcnt_r_[PADR_(unsigned int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct jail_set_args {
 	char iovp_l_[PADL_(struct iovec *)]; struct iovec * iovp; char iovp_r_[PADR_(struct iovec *)];
 	char iovcnt_l_[PADL_(unsigned int)]; unsigned int iovcnt; char iovcnt_r_[PADR_(unsigned int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct jail_remove_args {
 	char jid_l_[PADL_(int)]; int jid; char jid_r_[PADR_(int)];
 };
 struct closefrom_args {
 	char lowfd_l_[PADL_(int)]; int lowfd; char lowfd_r_[PADR_(int)];
 };
 struct __semctl_args {
 	char semid_l_[PADL_(int)]; int semid; char semid_r_[PADR_(int)];
 	char semnum_l_[PADL_(int)]; int semnum; char semnum_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char arg_l_[PADL_(union semun *)]; union semun * arg; char arg_r_[PADR_(union semun *)];
 };
 struct msgctl_args {
 	char msqid_l_[PADL_(int)]; int msqid; char msqid_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct msqid_ds *)]; struct msqid_ds * buf; char buf_r_[PADR_(struct msqid_ds *)];
 };
 struct shmctl_args {
 	char shmid_l_[PADL_(int)]; int shmid; char shmid_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct shmid_ds *)]; struct shmid_ds * buf; char buf_r_[PADR_(struct shmid_ds *)];
 };
 struct lpathconf_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char name_l_[PADL_(int)]; int name; char name_r_[PADR_(int)];
 };
 struct __cap_rights_get_args {
 	char version_l_[PADL_(int)]; int version; char version_r_[PADR_(int)];
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char rightsp_l_[PADL_(cap_rights_t *)]; cap_rights_t * rightsp; char rightsp_r_[PADR_(cap_rights_t *)];
 };
 struct cap_enter_args {
 	register_t dummy;
 };
 struct cap_getmode_args {
 	char modep_l_[PADL_(u_int *)]; u_int * modep; char modep_r_[PADR_(u_int *)];
 };
 struct pdfork_args {
 	char fdp_l_[PADL_(int *)]; int * fdp; char fdp_r_[PADR_(int *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct pdkill_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 };
 struct pdgetpid_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pidp_l_[PADL_(pid_t *)]; pid_t * pidp; char pidp_r_[PADR_(pid_t *)];
 };
 struct pselect_args {
 	char nd_l_[PADL_(int)]; int nd; char nd_r_[PADR_(int)];
 	char in_l_[PADL_(fd_set *)]; fd_set * in; char in_r_[PADR_(fd_set *)];
 	char ou_l_[PADL_(fd_set *)]; fd_set * ou; char ou_r_[PADR_(fd_set *)];
 	char ex_l_[PADL_(fd_set *)]; fd_set * ex; char ex_r_[PADR_(fd_set *)];
 	char ts_l_[PADL_(const struct timespec *)]; const struct timespec * ts; char ts_r_[PADR_(const struct timespec *)];
 	char sm_l_[PADL_(const sigset_t *)]; const sigset_t * sm; char sm_r_[PADR_(const sigset_t *)];
 };
 struct getloginclass_args {
 	char namebuf_l_[PADL_(char *)]; char * namebuf; char namebuf_r_[PADR_(char *)];
 	char namelen_l_[PADL_(size_t)]; size_t namelen; char namelen_r_[PADR_(size_t)];
 };
 struct setloginclass_args {
 	char namebuf_l_[PADL_(const char *)]; const char * namebuf; char namebuf_r_[PADR_(const char *)];
 };
 struct rctl_get_racct_args {
 	char inbufp_l_[PADL_(const void *)]; const void * inbufp; char inbufp_r_[PADR_(const void *)];
 	char inbuflen_l_[PADL_(size_t)]; size_t inbuflen; char inbuflen_r_[PADR_(size_t)];
 	char outbufp_l_[PADL_(void *)]; void * outbufp; char outbufp_r_[PADR_(void *)];
 	char outbuflen_l_[PADL_(size_t)]; size_t outbuflen; char outbuflen_r_[PADR_(size_t)];
 };
 struct rctl_get_rules_args {
 	char inbufp_l_[PADL_(const void *)]; const void * inbufp; char inbufp_r_[PADR_(const void *)];
 	char inbuflen_l_[PADL_(size_t)]; size_t inbuflen; char inbuflen_r_[PADR_(size_t)];
 	char outbufp_l_[PADL_(void *)]; void * outbufp; char outbufp_r_[PADR_(void *)];
 	char outbuflen_l_[PADL_(size_t)]; size_t outbuflen; char outbuflen_r_[PADR_(size_t)];
 };
 struct rctl_get_limits_args {
 	char inbufp_l_[PADL_(const void *)]; const void * inbufp; char inbufp_r_[PADR_(const void *)];
 	char inbuflen_l_[PADL_(size_t)]; size_t inbuflen; char inbuflen_r_[PADR_(size_t)];
 	char outbufp_l_[PADL_(void *)]; void * outbufp; char outbufp_r_[PADR_(void *)];
 	char outbuflen_l_[PADL_(size_t)]; size_t outbuflen; char outbuflen_r_[PADR_(size_t)];
 };
 struct rctl_add_rule_args {
 	char inbufp_l_[PADL_(const void *)]; const void * inbufp; char inbufp_r_[PADR_(const void *)];
 	char inbuflen_l_[PADL_(size_t)]; size_t inbuflen; char inbuflen_r_[PADR_(size_t)];
 	char outbufp_l_[PADL_(void *)]; void * outbufp; char outbufp_r_[PADR_(void *)];
 	char outbuflen_l_[PADL_(size_t)]; size_t outbuflen; char outbuflen_r_[PADR_(size_t)];
 };
 struct rctl_remove_rule_args {
 	char inbufp_l_[PADL_(const void *)]; const void * inbufp; char inbufp_r_[PADR_(const void *)];
 	char inbuflen_l_[PADL_(size_t)]; size_t inbuflen; char inbuflen_r_[PADR_(size_t)];
 	char outbufp_l_[PADL_(void *)]; void * outbufp; char outbufp_r_[PADR_(void *)];
 	char outbuflen_l_[PADL_(size_t)]; size_t outbuflen; char outbuflen_r_[PADR_(size_t)];
 };
 struct posix_fallocate_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char len_l_[PADL_(off_t)]; off_t len; char len_r_[PADR_(off_t)];
 };
 struct posix_fadvise_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char len_l_[PADL_(off_t)]; off_t len; char len_r_[PADR_(off_t)];
 	char advice_l_[PADL_(int)]; int advice; char advice_r_[PADR_(int)];
 };
 struct wait6_args {
 	char idtype_l_[PADL_(idtype_t)]; idtype_t idtype; char idtype_r_[PADR_(idtype_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char status_l_[PADL_(int *)]; int * status; char status_r_[PADR_(int *)];
 	char options_l_[PADL_(int)]; int options; char options_r_[PADR_(int)];
 	char wrusage_l_[PADL_(struct __wrusage *)]; struct __wrusage * wrusage; char wrusage_r_[PADR_(struct __wrusage *)];
 	char info_l_[PADL_(siginfo_t *)]; siginfo_t * info; char info_r_[PADR_(siginfo_t *)];
 };
 struct cap_rights_limit_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char rightsp_l_[PADL_(cap_rights_t *)]; cap_rights_t * rightsp; char rightsp_r_[PADR_(cap_rights_t *)];
 };
 struct cap_ioctls_limit_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char cmds_l_[PADL_(const u_long *)]; const u_long * cmds; char cmds_r_[PADR_(const u_long *)];
 	char ncmds_l_[PADL_(size_t)]; size_t ncmds; char ncmds_r_[PADR_(size_t)];
 };
 struct cap_ioctls_get_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char cmds_l_[PADL_(u_long *)]; u_long * cmds; char cmds_r_[PADR_(u_long *)];
 	char maxcmds_l_[PADL_(size_t)]; size_t maxcmds; char maxcmds_r_[PADR_(size_t)];
 };
 struct cap_fcntls_limit_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char fcntlrights_l_[PADL_(uint32_t)]; uint32_t fcntlrights; char fcntlrights_r_[PADR_(uint32_t)];
 };
 struct cap_fcntls_get_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char fcntlrightsp_l_[PADL_(uint32_t *)]; uint32_t * fcntlrightsp; char fcntlrightsp_r_[PADR_(uint32_t *)];
 };
 struct bindat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(caddr_t)]; caddr_t name; char name_r_[PADR_(caddr_t)];
 	char namelen_l_[PADL_(int)]; int namelen; char namelen_r_[PADR_(int)];
 };
 struct connectat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(caddr_t)]; caddr_t name; char name_r_[PADR_(caddr_t)];
 	char namelen_l_[PADL_(int)]; int namelen; char namelen_r_[PADR_(int)];
 };
 struct chflagsat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(const char *)]; const char * path; char path_r_[PADR_(const char *)];
 	char flags_l_[PADL_(u_long)]; u_long flags; char flags_r_[PADR_(u_long)];
 	char atflag_l_[PADL_(int)]; int atflag; char atflag_r_[PADR_(int)];
 };
 struct accept4_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char name_l_[PADL_(struct sockaddr *__restrict)]; struct sockaddr *__restrict name; char name_r_[PADR_(struct sockaddr *__restrict)];
 	char anamelen_l_[PADL_(__socklen_t *__restrict)]; __socklen_t *__restrict anamelen; char anamelen_r_[PADR_(__socklen_t *__restrict)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct pipe2_args {
 	char fildes_l_[PADL_(int *)]; int * fildes; char fildes_r_[PADR_(int *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct aio_mlock_args {
 	char aiocbp_l_[PADL_(struct aiocb *)]; struct aiocb * aiocbp; char aiocbp_r_[PADR_(struct aiocb *)];
 };
 struct procctl_args {
 	char idtype_l_[PADL_(idtype_t)]; idtype_t idtype; char idtype_r_[PADR_(idtype_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char com_l_[PADL_(int)]; int com; char com_r_[PADR_(int)];
 	char data_l_[PADL_(void *)]; void * data; char data_r_[PADR_(void *)];
 };
 struct ppoll_args {
 	char fds_l_[PADL_(struct pollfd *)]; struct pollfd * fds; char fds_r_[PADR_(struct pollfd *)];
 	char nfds_l_[PADL_(u_int)]; u_int nfds; char nfds_r_[PADR_(u_int)];
 	char ts_l_[PADL_(const struct timespec *)]; const struct timespec * ts; char ts_r_[PADR_(const struct timespec *)];
 	char set_l_[PADL_(const sigset_t *)]; const sigset_t * set; char set_r_[PADR_(const sigset_t *)];
 };
 struct futimens_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char times_l_[PADL_(struct timespec *)]; struct timespec * times; char times_r_[PADR_(struct timespec *)];
 };
 struct utimensat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char times_l_[PADL_(struct timespec *)]; struct timespec * times; char times_r_[PADR_(struct timespec *)];
 	char flag_l_[PADL_(int)]; int flag; char flag_r_[PADR_(int)];
 };
 struct numa_getaffinity_args {
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char policy_l_[PADL_(struct vm_domain_policy_entry *)]; struct vm_domain_policy_entry * policy; char policy_r_[PADR_(struct vm_domain_policy_entry *)];
 };
 struct numa_setaffinity_args {
 	char which_l_[PADL_(cpuwhich_t)]; cpuwhich_t which; char which_r_[PADR_(cpuwhich_t)];
 	char id_l_[PADL_(id_t)]; id_t id; char id_r_[PADR_(id_t)];
 	char policy_l_[PADL_(const struct vm_domain_policy_entry *)]; const struct vm_domain_policy_entry * policy; char policy_r_[PADR_(const struct vm_domain_policy_entry *)];
 };
 int	nosys(struct thread *, struct nosys_args *);
 void	sys_sys_exit(struct thread *, struct sys_exit_args *);
 int	sys_fork(struct thread *, struct fork_args *);
 int	sys_read(struct thread *, struct read_args *);
 int	sys_write(struct thread *, struct write_args *);
 int	sys_open(struct thread *, struct open_args *);
 int	sys_close(struct thread *, struct close_args *);
 int	sys_wait4(struct thread *, struct wait4_args *);
 int	sys_link(struct thread *, struct link_args *);
 int	sys_unlink(struct thread *, struct unlink_args *);
 int	sys_chdir(struct thread *, struct chdir_args *);
 int	sys_fchdir(struct thread *, struct fchdir_args *);
 int	sys_mknod(struct thread *, struct mknod_args *);
 int	sys_chmod(struct thread *, struct chmod_args *);
 int	sys_chown(struct thread *, struct chown_args *);
 int	sys_obreak(struct thread *, struct obreak_args *);
 int	sys_getpid(struct thread *, struct getpid_args *);
 int	sys_mount(struct thread *, struct mount_args *);
 int	sys_unmount(struct thread *, struct unmount_args *);
 int	sys_setuid(struct thread *, struct setuid_args *);
 int	sys_getuid(struct thread *, struct getuid_args *);
 int	sys_geteuid(struct thread *, struct geteuid_args *);
 int	sys_ptrace(struct thread *, struct ptrace_args *);
 int	sys_recvmsg(struct thread *, struct recvmsg_args *);
 int	sys_sendmsg(struct thread *, struct sendmsg_args *);
 int	sys_recvfrom(struct thread *, struct recvfrom_args *);
 int	sys_accept(struct thread *, struct accept_args *);
 int	sys_getpeername(struct thread *, struct getpeername_args *);
 int	sys_getsockname(struct thread *, struct getsockname_args *);
 int	sys_access(struct thread *, struct access_args *);
 int	sys_chflags(struct thread *, struct chflags_args *);
 int	sys_fchflags(struct thread *, struct fchflags_args *);
 int	sys_sync(struct thread *, struct sync_args *);
 int	sys_kill(struct thread *, struct kill_args *);
 int	sys_getppid(struct thread *, struct getppid_args *);
 int	sys_dup(struct thread *, struct dup_args *);
 int	sys_getegid(struct thread *, struct getegid_args *);
 int	sys_profil(struct thread *, struct profil_args *);
 int	sys_ktrace(struct thread *, struct ktrace_args *);
 int	sys_getgid(struct thread *, struct getgid_args *);
 int	sys_getlogin(struct thread *, struct getlogin_args *);
 int	sys_setlogin(struct thread *, struct setlogin_args *);
 int	sys_acct(struct thread *, struct acct_args *);
 int	sys_sigaltstack(struct thread *, struct sigaltstack_args *);
 int	sys_ioctl(struct thread *, struct ioctl_args *);
 int	sys_reboot(struct thread *, struct reboot_args *);
 int	sys_revoke(struct thread *, struct revoke_args *);
 int	sys_symlink(struct thread *, struct symlink_args *);
 int	sys_readlink(struct thread *, struct readlink_args *);
 int	sys_execve(struct thread *, struct execve_args *);
 int	sys_umask(struct thread *, struct umask_args *);
 int	sys_chroot(struct thread *, struct chroot_args *);
 int	sys_msync(struct thread *, struct msync_args *);
 int	sys_vfork(struct thread *, struct vfork_args *);
 int	sys_sbrk(struct thread *, struct sbrk_args *);
 int	sys_sstk(struct thread *, struct sstk_args *);
 int	sys_ovadvise(struct thread *, struct ovadvise_args *);
 int	sys_munmap(struct thread *, struct munmap_args *);
 int	sys_mprotect(struct thread *, struct mprotect_args *);
 int	sys_madvise(struct thread *, struct madvise_args *);
 int	sys_mincore(struct thread *, struct mincore_args *);
 int	sys_getgroups(struct thread *, struct getgroups_args *);
 int	sys_setgroups(struct thread *, struct setgroups_args *);
 int	sys_getpgrp(struct thread *, struct getpgrp_args *);
 int	sys_setpgid(struct thread *, struct setpgid_args *);
 int	sys_setitimer(struct thread *, struct setitimer_args *);
 int	sys_swapon(struct thread *, struct swapon_args *);
 int	sys_getitimer(struct thread *, struct getitimer_args *);
 int	sys_getdtablesize(struct thread *, struct getdtablesize_args *);
 int	sys_dup2(struct thread *, struct dup2_args *);
 int	sys_fcntl(struct thread *, struct fcntl_args *);
 int	sys_select(struct thread *, struct select_args *);
 int	sys_fsync(struct thread *, struct fsync_args *);
 int	sys_setpriority(struct thread *, struct setpriority_args *);
 int	sys_socket(struct thread *, struct socket_args *);
 int	sys_connect(struct thread *, struct connect_args *);
 int	sys_getpriority(struct thread *, struct getpriority_args *);
 int	sys_bind(struct thread *, struct bind_args *);
 int	sys_setsockopt(struct thread *, struct setsockopt_args *);
 int	sys_listen(struct thread *, struct listen_args *);
 int	sys_gettimeofday(struct thread *, struct gettimeofday_args *);
 int	sys_getrusage(struct thread *, struct getrusage_args *);
 int	sys_getsockopt(struct thread *, struct getsockopt_args *);
 int	sys_readv(struct thread *, struct readv_args *);
 int	sys_writev(struct thread *, struct writev_args *);
 int	sys_settimeofday(struct thread *, struct settimeofday_args *);
 int	sys_fchown(struct thread *, struct fchown_args *);
 int	sys_fchmod(struct thread *, struct fchmod_args *);
 int	sys_setreuid(struct thread *, struct setreuid_args *);
 int	sys_setregid(struct thread *, struct setregid_args *);
 int	sys_rename(struct thread *, struct rename_args *);
 int	sys_flock(struct thread *, struct flock_args *);
 int	sys_mkfifo(struct thread *, struct mkfifo_args *);
 int	sys_sendto(struct thread *, struct sendto_args *);
 int	sys_shutdown(struct thread *, struct shutdown_args *);
 int	sys_socketpair(struct thread *, struct socketpair_args *);
 int	sys_mkdir(struct thread *, struct mkdir_args *);
 int	sys_rmdir(struct thread *, struct rmdir_args *);
 int	sys_utimes(struct thread *, struct utimes_args *);
 int	sys_adjtime(struct thread *, struct adjtime_args *);
 int	sys_setsid(struct thread *, struct setsid_args *);
 int	sys_quotactl(struct thread *, struct quotactl_args *);
 int	sys_nlm_syscall(struct thread *, struct nlm_syscall_args *);
 int	sys_nfssvc(struct thread *, struct nfssvc_args *);
 int	sys_lgetfh(struct thread *, struct lgetfh_args *);
 int	sys_getfh(struct thread *, struct getfh_args *);
 int	sysarch(struct thread *, struct sysarch_args *);
 int	sys_rtprio(struct thread *, struct rtprio_args *);
 int	sys_semsys(struct thread *, struct semsys_args *);
 int	sys_msgsys(struct thread *, struct msgsys_args *);
 int	sys_shmsys(struct thread *, struct shmsys_args *);
 int	sys_setfib(struct thread *, struct setfib_args *);
 int	sys_ntp_adjtime(struct thread *, struct ntp_adjtime_args *);
 int	sys_setgid(struct thread *, struct setgid_args *);
 int	sys_setegid(struct thread *, struct setegid_args *);
 int	sys_seteuid(struct thread *, struct seteuid_args *);
 int	sys_stat(struct thread *, struct stat_args *);
 int	sys_fstat(struct thread *, struct fstat_args *);
 int	sys_lstat(struct thread *, struct lstat_args *);
 int	sys_pathconf(struct thread *, struct pathconf_args *);
 int	sys_fpathconf(struct thread *, struct fpathconf_args *);
 int	sys_getrlimit(struct thread *, struct __getrlimit_args *);
 int	sys_setrlimit(struct thread *, struct __setrlimit_args *);
 int	sys_getdirentries(struct thread *, struct getdirentries_args *);
 int	sys___sysctl(struct thread *, struct sysctl_args *);
 int	sys_mlock(struct thread *, struct mlock_args *);
 int	sys_munlock(struct thread *, struct munlock_args *);
 int	sys_undelete(struct thread *, struct undelete_args *);
 int	sys_futimes(struct thread *, struct futimes_args *);
 int	sys_getpgid(struct thread *, struct getpgid_args *);
 int	sys_poll(struct thread *, struct poll_args *);
 int	sys_semget(struct thread *, struct semget_args *);
 int	sys_semop(struct thread *, struct semop_args *);
 int	sys_msgget(struct thread *, struct msgget_args *);
 int	sys_msgsnd(struct thread *, struct msgsnd_args *);
 int	sys_msgrcv(struct thread *, struct msgrcv_args *);
 int	sys_shmat(struct thread *, struct shmat_args *);
 int	sys_shmdt(struct thread *, struct shmdt_args *);
 int	sys_shmget(struct thread *, struct shmget_args *);
 int	sys_clock_gettime(struct thread *, struct clock_gettime_args *);
 int	sys_clock_settime(struct thread *, struct clock_settime_args *);
 int	sys_clock_getres(struct thread *, struct clock_getres_args *);
 int	sys_ktimer_create(struct thread *, struct ktimer_create_args *);
 int	sys_ktimer_delete(struct thread *, struct ktimer_delete_args *);
 int	sys_ktimer_settime(struct thread *, struct ktimer_settime_args *);
 int	sys_ktimer_gettime(struct thread *, struct ktimer_gettime_args *);
 int	sys_ktimer_getoverrun(struct thread *, struct ktimer_getoverrun_args *);
 int	sys_nanosleep(struct thread *, struct nanosleep_args *);
 int	sys_ffclock_getcounter(struct thread *, struct ffclock_getcounter_args *);
 int	sys_ffclock_setestimate(struct thread *, struct ffclock_setestimate_args *);
 int	sys_ffclock_getestimate(struct thread *, struct ffclock_getestimate_args *);
 int	sys_clock_getcpuclockid2(struct thread *, struct clock_getcpuclockid2_args *);
 int	sys_ntp_gettime(struct thread *, struct ntp_gettime_args *);
 int	sys_minherit(struct thread *, struct minherit_args *);
 int	sys_rfork(struct thread *, struct rfork_args *);
 int	sys_openbsd_poll(struct thread *, struct openbsd_poll_args *);
 int	sys_issetugid(struct thread *, struct issetugid_args *);
 int	sys_lchown(struct thread *, struct lchown_args *);
 int	sys_aio_read(struct thread *, struct aio_read_args *);
 int	sys_aio_write(struct thread *, struct aio_write_args *);
 int	sys_lio_listio(struct thread *, struct lio_listio_args *);
 int	sys_getdents(struct thread *, struct getdents_args *);
 int	sys_lchmod(struct thread *, struct lchmod_args *);
 int	sys_lutimes(struct thread *, struct lutimes_args *);
 int	sys_nstat(struct thread *, struct nstat_args *);
 int	sys_nfstat(struct thread *, struct nfstat_args *);
 int	sys_nlstat(struct thread *, struct nlstat_args *);
 int	sys_preadv(struct thread *, struct preadv_args *);
 int	sys_pwritev(struct thread *, struct pwritev_args *);
 int	sys_fhopen(struct thread *, struct fhopen_args *);
 int	sys_fhstat(struct thread *, struct fhstat_args *);
 int	sys_modnext(struct thread *, struct modnext_args *);
 int	sys_modstat(struct thread *, struct modstat_args *);
 int	sys_modfnext(struct thread *, struct modfnext_args *);
 int	sys_modfind(struct thread *, struct modfind_args *);
 int	sys_kldload(struct thread *, struct kldload_args *);
 int	sys_kldunload(struct thread *, struct kldunload_args *);
 int	sys_kldfind(struct thread *, struct kldfind_args *);
 int	sys_kldnext(struct thread *, struct kldnext_args *);
 int	sys_kldstat(struct thread *, struct kldstat_args *);
 int	sys_kldfirstmod(struct thread *, struct kldfirstmod_args *);
 int	sys_getsid(struct thread *, struct getsid_args *);
 int	sys_setresuid(struct thread *, struct setresuid_args *);
 int	sys_setresgid(struct thread *, struct setresgid_args *);
 int	sys_aio_return(struct thread *, struct aio_return_args *);
 int	sys_aio_suspend(struct thread *, struct aio_suspend_args *);
 int	sys_aio_cancel(struct thread *, struct aio_cancel_args *);
 int	sys_aio_error(struct thread *, struct aio_error_args *);
 int	sys_yield(struct thread *, struct yield_args *);
 int	sys_mlockall(struct thread *, struct mlockall_args *);
 int	sys_munlockall(struct thread *, struct munlockall_args *);
 int	sys___getcwd(struct thread *, struct __getcwd_args *);
 int	sys_sched_setparam(struct thread *, struct sched_setparam_args *);
 int	sys_sched_getparam(struct thread *, struct sched_getparam_args *);
 int	sys_sched_setscheduler(struct thread *, struct sched_setscheduler_args *);
 int	sys_sched_getscheduler(struct thread *, struct sched_getscheduler_args *);
 int	sys_sched_yield(struct thread *, struct sched_yield_args *);
 int	sys_sched_get_priority_max(struct thread *, struct sched_get_priority_max_args *);
 int	sys_sched_get_priority_min(struct thread *, struct sched_get_priority_min_args *);
 int	sys_sched_rr_get_interval(struct thread *, struct sched_rr_get_interval_args *);
 int	sys_utrace(struct thread *, struct utrace_args *);
 int	sys_kldsym(struct thread *, struct kldsym_args *);
 int	sys_jail(struct thread *, struct jail_args *);
 int	sys_nnpfs_syscall(struct thread *, struct nnpfs_syscall_args *);
 int	sys_sigprocmask(struct thread *, struct sigprocmask_args *);
 int	sys_sigsuspend(struct thread *, struct sigsuspend_args *);
 int	sys_sigpending(struct thread *, struct sigpending_args *);
 int	sys_sigtimedwait(struct thread *, struct sigtimedwait_args *);
 int	sys_sigwaitinfo(struct thread *, struct sigwaitinfo_args *);
 int	sys___acl_get_file(struct thread *, struct __acl_get_file_args *);
 int	sys___acl_set_file(struct thread *, struct __acl_set_file_args *);
 int	sys___acl_get_fd(struct thread *, struct __acl_get_fd_args *);
 int	sys___acl_set_fd(struct thread *, struct __acl_set_fd_args *);
 int	sys___acl_delete_file(struct thread *, struct __acl_delete_file_args *);
 int	sys___acl_delete_fd(struct thread *, struct __acl_delete_fd_args *);
 int	sys___acl_aclcheck_file(struct thread *, struct __acl_aclcheck_file_args *);
 int	sys___acl_aclcheck_fd(struct thread *, struct __acl_aclcheck_fd_args *);
 int	sys_extattrctl(struct thread *, struct extattrctl_args *);
 int	sys_extattr_set_file(struct thread *, struct extattr_set_file_args *);
 int	sys_extattr_get_file(struct thread *, struct extattr_get_file_args *);
 int	sys_extattr_delete_file(struct thread *, struct extattr_delete_file_args *);
 int	sys_aio_waitcomplete(struct thread *, struct aio_waitcomplete_args *);
 int	sys_getresuid(struct thread *, struct getresuid_args *);
 int	sys_getresgid(struct thread *, struct getresgid_args *);
 int	sys_kqueue(struct thread *, struct kqueue_args *);
 int	sys_kevent(struct thread *, struct kevent_args *);
 int	sys_extattr_set_fd(struct thread *, struct extattr_set_fd_args *);
 int	sys_extattr_get_fd(struct thread *, struct extattr_get_fd_args *);
 int	sys_extattr_delete_fd(struct thread *, struct extattr_delete_fd_args *);
 int	sys___setugid(struct thread *, struct __setugid_args *);
 int	sys_eaccess(struct thread *, struct eaccess_args *);
 int	sys_afs3_syscall(struct thread *, struct afs3_syscall_args *);
 int	sys_nmount(struct thread *, struct nmount_args *);
 int	sys___mac_get_proc(struct thread *, struct __mac_get_proc_args *);
 int	sys___mac_set_proc(struct thread *, struct __mac_set_proc_args *);
 int	sys___mac_get_fd(struct thread *, struct __mac_get_fd_args *);
 int	sys___mac_get_file(struct thread *, struct __mac_get_file_args *);
 int	sys___mac_set_fd(struct thread *, struct __mac_set_fd_args *);
 int	sys___mac_set_file(struct thread *, struct __mac_set_file_args *);
 int	sys_kenv(struct thread *, struct kenv_args *);
 int	sys_lchflags(struct thread *, struct lchflags_args *);
 int	sys_uuidgen(struct thread *, struct uuidgen_args *);
 int	sys_sendfile(struct thread *, struct sendfile_args *);
 int	sys_mac_syscall(struct thread *, struct mac_syscall_args *);
 int	sys_getfsstat(struct thread *, struct getfsstat_args *);
 int	sys_statfs(struct thread *, struct statfs_args *);
 int	sys_fstatfs(struct thread *, struct fstatfs_args *);
 int	sys_fhstatfs(struct thread *, struct fhstatfs_args *);
 int	sys_ksem_close(struct thread *, struct ksem_close_args *);
 int	sys_ksem_post(struct thread *, struct ksem_post_args *);
 int	sys_ksem_wait(struct thread *, struct ksem_wait_args *);
 int	sys_ksem_trywait(struct thread *, struct ksem_trywait_args *);
 int	sys_ksem_init(struct thread *, struct ksem_init_args *);
 int	sys_ksem_open(struct thread *, struct ksem_open_args *);
 int	sys_ksem_unlink(struct thread *, struct ksem_unlink_args *);
 int	sys_ksem_getvalue(struct thread *, struct ksem_getvalue_args *);
 int	sys_ksem_destroy(struct thread *, struct ksem_destroy_args *);
 int	sys___mac_get_pid(struct thread *, struct __mac_get_pid_args *);
 int	sys___mac_get_link(struct thread *, struct __mac_get_link_args *);
 int	sys___mac_set_link(struct thread *, struct __mac_set_link_args *);
 int	sys_extattr_set_link(struct thread *, struct extattr_set_link_args *);
 int	sys_extattr_get_link(struct thread *, struct extattr_get_link_args *);
 int	sys_extattr_delete_link(struct thread *, struct extattr_delete_link_args *);
 int	sys___mac_execve(struct thread *, struct __mac_execve_args *);
 int	sys_sigaction(struct thread *, struct sigaction_args *);
 int	sys_sigreturn(struct thread *, struct sigreturn_args *);
 int	sys_getcontext(struct thread *, struct getcontext_args *);
 int	sys_setcontext(struct thread *, struct setcontext_args *);
 int	sys_swapcontext(struct thread *, struct swapcontext_args *);
 int	sys_swapoff(struct thread *, struct swapoff_args *);
 int	sys___acl_get_link(struct thread *, struct __acl_get_link_args *);
 int	sys___acl_set_link(struct thread *, struct __acl_set_link_args *);
 int	sys___acl_delete_link(struct thread *, struct __acl_delete_link_args *);
 int	sys___acl_aclcheck_link(struct thread *, struct __acl_aclcheck_link_args *);
 int	sys_sigwait(struct thread *, struct sigwait_args *);
 int	sys_thr_create(struct thread *, struct thr_create_args *);
 int	sys_thr_exit(struct thread *, struct thr_exit_args *);
 int	sys_thr_self(struct thread *, struct thr_self_args *);
 int	sys_thr_kill(struct thread *, struct thr_kill_args *);
 int	sys_jail_attach(struct thread *, struct jail_attach_args *);
 int	sys_extattr_list_fd(struct thread *, struct extattr_list_fd_args *);
 int	sys_extattr_list_file(struct thread *, struct extattr_list_file_args *);
 int	sys_extattr_list_link(struct thread *, struct extattr_list_link_args *);
 int	sys_ksem_timedwait(struct thread *, struct ksem_timedwait_args *);
 int	sys_thr_suspend(struct thread *, struct thr_suspend_args *);
 int	sys_thr_wake(struct thread *, struct thr_wake_args *);
 int	sys_kldunloadf(struct thread *, struct kldunloadf_args *);
 int	sys_audit(struct thread *, struct audit_args *);
 int	sys_auditon(struct thread *, struct auditon_args *);
 int	sys_getauid(struct thread *, struct getauid_args *);
 int	sys_setauid(struct thread *, struct setauid_args *);
 int	sys_getaudit(struct thread *, struct getaudit_args *);
 int	sys_setaudit(struct thread *, struct setaudit_args *);
 int	sys_getaudit_addr(struct thread *, struct getaudit_addr_args *);
 int	sys_setaudit_addr(struct thread *, struct setaudit_addr_args *);
 int	sys_auditctl(struct thread *, struct auditctl_args *);
 int	sys__umtx_op(struct thread *, struct _umtx_op_args *);
 int	sys_thr_new(struct thread *, struct thr_new_args *);
 int	sys_sigqueue(struct thread *, struct sigqueue_args *);
 int	sys_kmq_open(struct thread *, struct kmq_open_args *);
 int	sys_kmq_setattr(struct thread *, struct kmq_setattr_args *);
 int	sys_kmq_timedreceive(struct thread *, struct kmq_timedreceive_args *);
 int	sys_kmq_timedsend(struct thread *, struct kmq_timedsend_args *);
 int	sys_kmq_notify(struct thread *, struct kmq_notify_args *);
 int	sys_kmq_unlink(struct thread *, struct kmq_unlink_args *);
 int	sys_abort2(struct thread *, struct abort2_args *);
 int	sys_thr_set_name(struct thread *, struct thr_set_name_args *);
 int	sys_aio_fsync(struct thread *, struct aio_fsync_args *);
 int	sys_rtprio_thread(struct thread *, struct rtprio_thread_args *);
 int	sys_sctp_peeloff(struct thread *, struct sctp_peeloff_args *);
 int	sys_sctp_generic_sendmsg(struct thread *, struct sctp_generic_sendmsg_args *);
 int	sys_sctp_generic_sendmsg_iov(struct thread *, struct sctp_generic_sendmsg_iov_args *);
 int	sys_sctp_generic_recvmsg(struct thread *, struct sctp_generic_recvmsg_args *);
 int	sys_pread(struct thread *, struct pread_args *);
 int	sys_pwrite(struct thread *, struct pwrite_args *);
 int	sys_mmap(struct thread *, struct mmap_args *);
 int	sys_lseek(struct thread *, struct lseek_args *);
 int	sys_truncate(struct thread *, struct truncate_args *);
 int	sys_ftruncate(struct thread *, struct ftruncate_args *);
 int	sys_thr_kill2(struct thread *, struct thr_kill2_args *);
 int	sys_shm_open(struct thread *, struct shm_open_args *);
 int	sys_shm_unlink(struct thread *, struct shm_unlink_args *);
 int	sys_cpuset(struct thread *, struct cpuset_args *);
 int	sys_cpuset_setid(struct thread *, struct cpuset_setid_args *);
 int	sys_cpuset_getid(struct thread *, struct cpuset_getid_args *);
 int	sys_cpuset_getaffinity(struct thread *, struct cpuset_getaffinity_args *);
 int	sys_cpuset_setaffinity(struct thread *, struct cpuset_setaffinity_args *);
 int	sys_faccessat(struct thread *, struct faccessat_args *);
 int	sys_fchmodat(struct thread *, struct fchmodat_args *);
 int	sys_fchownat(struct thread *, struct fchownat_args *);
 int	sys_fexecve(struct thread *, struct fexecve_args *);
 int	sys_fstatat(struct thread *, struct fstatat_args *);
 int	sys_futimesat(struct thread *, struct futimesat_args *);
 int	sys_linkat(struct thread *, struct linkat_args *);
 int	sys_mkdirat(struct thread *, struct mkdirat_args *);
 int	sys_mkfifoat(struct thread *, struct mkfifoat_args *);
 int	sys_mknodat(struct thread *, struct mknodat_args *);
 int	sys_openat(struct thread *, struct openat_args *);
 int	sys_readlinkat(struct thread *, struct readlinkat_args *);
 int	sys_renameat(struct thread *, struct renameat_args *);
 int	sys_symlinkat(struct thread *, struct symlinkat_args *);
 int	sys_unlinkat(struct thread *, struct unlinkat_args *);
 int	sys_posix_openpt(struct thread *, struct posix_openpt_args *);
 int	sys_gssd_syscall(struct thread *, struct gssd_syscall_args *);
 int	sys_jail_get(struct thread *, struct jail_get_args *);
 int	sys_jail_set(struct thread *, struct jail_set_args *);
 int	sys_jail_remove(struct thread *, struct jail_remove_args *);
 int	sys_closefrom(struct thread *, struct closefrom_args *);
 int	sys___semctl(struct thread *, struct __semctl_args *);
 int	sys_msgctl(struct thread *, struct msgctl_args *);
 int	sys_shmctl(struct thread *, struct shmctl_args *);
 int	sys_lpathconf(struct thread *, struct lpathconf_args *);
 int	sys___cap_rights_get(struct thread *, struct __cap_rights_get_args *);
 int	sys_cap_enter(struct thread *, struct cap_enter_args *);
 int	sys_cap_getmode(struct thread *, struct cap_getmode_args *);
 int	sys_pdfork(struct thread *, struct pdfork_args *);
 int	sys_pdkill(struct thread *, struct pdkill_args *);
 int	sys_pdgetpid(struct thread *, struct pdgetpid_args *);
 int	sys_pselect(struct thread *, struct pselect_args *);
 int	sys_getloginclass(struct thread *, struct getloginclass_args *);
 int	sys_setloginclass(struct thread *, struct setloginclass_args *);
 int	sys_rctl_get_racct(struct thread *, struct rctl_get_racct_args *);
 int	sys_rctl_get_rules(struct thread *, struct rctl_get_rules_args *);
 int	sys_rctl_get_limits(struct thread *, struct rctl_get_limits_args *);
 int	sys_rctl_add_rule(struct thread *, struct rctl_add_rule_args *);
 int	sys_rctl_remove_rule(struct thread *, struct rctl_remove_rule_args *);
 int	sys_posix_fallocate(struct thread *, struct posix_fallocate_args *);
 int	sys_posix_fadvise(struct thread *, struct posix_fadvise_args *);
 int	sys_wait6(struct thread *, struct wait6_args *);
 int	sys_cap_rights_limit(struct thread *, struct cap_rights_limit_args *);
 int	sys_cap_ioctls_limit(struct thread *, struct cap_ioctls_limit_args *);
 int	sys_cap_ioctls_get(struct thread *, struct cap_ioctls_get_args *);
 int	sys_cap_fcntls_limit(struct thread *, struct cap_fcntls_limit_args *);
 int	sys_cap_fcntls_get(struct thread *, struct cap_fcntls_get_args *);
 int	sys_bindat(struct thread *, struct bindat_args *);
 int	sys_connectat(struct thread *, struct connectat_args *);
 int	sys_chflagsat(struct thread *, struct chflagsat_args *);
 int	sys_accept4(struct thread *, struct accept4_args *);
 int	sys_pipe2(struct thread *, struct pipe2_args *);
 int	sys_aio_mlock(struct thread *, struct aio_mlock_args *);
 int	sys_procctl(struct thread *, struct procctl_args *);
 int	sys_ppoll(struct thread *, struct ppoll_args *);
 int	sys_futimens(struct thread *, struct futimens_args *);
 int	sys_utimensat(struct thread *, struct utimensat_args *);
 int	sys_numa_getaffinity(struct thread *, struct numa_getaffinity_args *);
 int	sys_numa_setaffinity(struct thread *, struct numa_setaffinity_args *);
 
 #ifdef COMPAT_43
 
 struct ocreat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 };
 struct olseek_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char offset_l_[PADL_(long)]; long offset; char offset_r_[PADR_(long)];
 	char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)];
 };
 struct ostat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct ostat *)]; struct ostat * ub; char ub_r_[PADR_(struct ostat *)];
 };
 struct olstat_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char ub_l_[PADL_(struct ostat *)]; struct ostat * ub; char ub_r_[PADR_(struct ostat *)];
 };
 struct osigaction_args {
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 	char nsa_l_[PADL_(struct osigaction *)]; struct osigaction * nsa; char nsa_r_[PADR_(struct osigaction *)];
 	char osa_l_[PADL_(struct osigaction *)]; struct osigaction * osa; char osa_r_[PADR_(struct osigaction *)];
 };
 struct osigprocmask_args {
 	char how_l_[PADL_(int)]; int how; char how_r_[PADR_(int)];
 	char mask_l_[PADL_(osigset_t)]; osigset_t mask; char mask_r_[PADR_(osigset_t)];
 };
 struct ofstat_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char sb_l_[PADL_(struct ostat *)]; struct ostat * sb; char sb_r_[PADR_(struct ostat *)];
 };
 struct getkerninfo_args {
 	char op_l_[PADL_(int)]; int op; char op_r_[PADR_(int)];
 	char where_l_[PADL_(char *)]; char * where; char where_r_[PADR_(char *)];
 	char size_l_[PADL_(size_t *)]; size_t * size; char size_r_[PADR_(size_t *)];
 	char arg_l_[PADL_(int)]; int arg; char arg_r_[PADR_(int)];
 };
 struct ommap_args {
 	char addr_l_[PADL_(void *)]; void * addr; char addr_r_[PADR_(void *)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 	char prot_l_[PADL_(int)]; int prot; char prot_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pos_l_[PADL_(long)]; long pos; char pos_r_[PADR_(long)];
 };
 struct gethostname_args {
 	char hostname_l_[PADL_(char *)]; char * hostname; char hostname_r_[PADR_(char *)];
 	char len_l_[PADL_(u_int)]; u_int len; char len_r_[PADR_(u_int)];
 };
 struct sethostname_args {
 	char hostname_l_[PADL_(char *)]; char * hostname; char hostname_r_[PADR_(char *)];
 	char len_l_[PADL_(u_int)]; u_int len; char len_r_[PADR_(u_int)];
 };
 struct osend_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char buf_l_[PADL_(caddr_t)]; caddr_t buf; char buf_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct orecv_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char buf_l_[PADL_(caddr_t)]; caddr_t buf; char buf_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct osigreturn_args {
 	char sigcntxp_l_[PADL_(struct osigcontext *)]; struct osigcontext * sigcntxp; char sigcntxp_r_[PADR_(struct osigcontext *)];
 };
 struct osigvec_args {
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 	char nsv_l_[PADL_(struct sigvec *)]; struct sigvec * nsv; char nsv_r_[PADR_(struct sigvec *)];
 	char osv_l_[PADL_(struct sigvec *)]; struct sigvec * osv; char osv_r_[PADR_(struct sigvec *)];
 };
 struct osigblock_args {
 	char mask_l_[PADL_(int)]; int mask; char mask_r_[PADR_(int)];
 };
 struct osigsetmask_args {
 	char mask_l_[PADL_(int)]; int mask; char mask_r_[PADR_(int)];
 };
 struct osigsuspend_args {
 	char mask_l_[PADL_(osigset_t)]; osigset_t mask; char mask_r_[PADR_(osigset_t)];
 };
 struct osigstack_args {
 	char nss_l_[PADL_(struct sigstack *)]; struct sigstack * nss; char nss_r_[PADR_(struct sigstack *)];
 	char oss_l_[PADL_(struct sigstack *)]; struct sigstack * oss; char oss_r_[PADR_(struct sigstack *)];
 };
 struct orecvmsg_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char msg_l_[PADL_(struct omsghdr *)]; struct omsghdr * msg; char msg_r_[PADR_(struct omsghdr *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct osendmsg_args {
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char msg_l_[PADL_(caddr_t)]; caddr_t msg; char msg_r_[PADR_(caddr_t)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct otruncate_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char length_l_[PADL_(long)]; long length; char length_r_[PADR_(long)];
 };
 struct oftruncate_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char length_l_[PADL_(long)]; long length; char length_r_[PADR_(long)];
 };
 struct ogetpeername_args {
 	char fdes_l_[PADL_(int)]; int fdes; char fdes_r_[PADR_(int)];
 	char asa_l_[PADL_(caddr_t)]; caddr_t asa; char asa_r_[PADR_(caddr_t)];
 	char alen_l_[PADL_(int *)]; int * alen; char alen_r_[PADR_(int *)];
 };
 struct osethostid_args {
 	char hostid_l_[PADL_(long)]; long hostid; char hostid_r_[PADR_(long)];
 };
 struct ogetrlimit_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char rlp_l_[PADL_(struct orlimit *)]; struct orlimit * rlp; char rlp_r_[PADR_(struct orlimit *)];
 };
 struct osetrlimit_args {
 	char which_l_[PADL_(u_int)]; u_int which; char which_r_[PADR_(u_int)];
 	char rlp_l_[PADL_(struct orlimit *)]; struct orlimit * rlp; char rlp_r_[PADR_(struct orlimit *)];
 };
 struct okillpg_args {
 	char pgid_l_[PADL_(int)]; int pgid; char pgid_r_[PADR_(int)];
 	char signum_l_[PADL_(int)]; int signum; char signum_r_[PADR_(int)];
 };
 struct ogetdirentries_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(char *)]; char * buf; char buf_r_[PADR_(char *)];
 	char count_l_[PADL_(u_int)]; u_int count; char count_r_[PADR_(u_int)];
 	char basep_l_[PADL_(long *)]; long * basep; char basep_r_[PADR_(long *)];
 };
 int	ocreat(struct thread *, struct ocreat_args *);
 int	olseek(struct thread *, struct olseek_args *);
 int	ostat(struct thread *, struct ostat_args *);
 int	olstat(struct thread *, struct olstat_args *);
 int	osigaction(struct thread *, struct osigaction_args *);
 int	osigprocmask(struct thread *, struct osigprocmask_args *);
 int	osigpending(struct thread *, struct osigpending_args *);
 int	ofstat(struct thread *, struct ofstat_args *);
 int	ogetkerninfo(struct thread *, struct getkerninfo_args *);
 int	ogetpagesize(struct thread *, struct getpagesize_args *);
 int	ommap(struct thread *, struct ommap_args *);
 int	owait(struct thread *, struct owait_args *);
 int	ogethostname(struct thread *, struct gethostname_args *);
 int	osethostname(struct thread *, struct sethostname_args *);
 int	oaccept(struct thread *, struct accept_args *);
 int	osend(struct thread *, struct osend_args *);
 int	orecv(struct thread *, struct orecv_args *);
 int	osigreturn(struct thread *, struct osigreturn_args *);
 int	osigvec(struct thread *, struct osigvec_args *);
 int	osigblock(struct thread *, struct osigblock_args *);
 int	osigsetmask(struct thread *, struct osigsetmask_args *);
 int	osigsuspend(struct thread *, struct osigsuspend_args *);
 int	osigstack(struct thread *, struct osigstack_args *);
 int	orecvmsg(struct thread *, struct orecvmsg_args *);
 int	osendmsg(struct thread *, struct osendmsg_args *);
 int	orecvfrom(struct thread *, struct recvfrom_args *);
 int	otruncate(struct thread *, struct otruncate_args *);
 int	oftruncate(struct thread *, struct oftruncate_args *);
 int	ogetpeername(struct thread *, struct ogetpeername_args *);
 int	ogethostid(struct thread *, struct ogethostid_args *);
 int	osethostid(struct thread *, struct osethostid_args *);
 int	ogetrlimit(struct thread *, struct ogetrlimit_args *);
 int	osetrlimit(struct thread *, struct osetrlimit_args *);
 int	okillpg(struct thread *, struct okillpg_args *);
 int	oquota(struct thread *, struct oquota_args *);
 int	ogetsockname(struct thread *, struct getsockname_args *);
 int	ogetdirentries(struct thread *, struct ogetdirentries_args *);
 
 #endif /* COMPAT_43 */
 
 
 #ifdef COMPAT_FREEBSD4
 
 struct freebsd4_getfsstat_args {
 	char buf_l_[PADL_(struct ostatfs *)]; struct ostatfs * buf; char buf_r_[PADR_(struct ostatfs *)];
 	char bufsize_l_[PADL_(long)]; long bufsize; char bufsize_r_[PADR_(long)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct freebsd4_statfs_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char buf_l_[PADL_(struct ostatfs *)]; struct ostatfs * buf; char buf_r_[PADR_(struct ostatfs *)];
 };
 struct freebsd4_fstatfs_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct ostatfs *)]; struct ostatfs * buf; char buf_r_[PADR_(struct ostatfs *)];
 };
 struct freebsd4_getdomainname_args {
 	char domainname_l_[PADL_(char *)]; char * domainname; char domainname_r_[PADR_(char *)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 };
 struct freebsd4_setdomainname_args {
 	char domainname_l_[PADL_(char *)]; char * domainname; char domainname_r_[PADR_(char *)];
 	char len_l_[PADL_(int)]; int len; char len_r_[PADR_(int)];
 };
 struct freebsd4_uname_args {
 	char name_l_[PADL_(struct utsname *)]; struct utsname * name; char name_r_[PADR_(struct utsname *)];
 };
 struct freebsd4_fhstatfs_args {
 	char u_fhp_l_[PADL_(const struct fhandle *)]; const struct fhandle * u_fhp; char u_fhp_r_[PADR_(const struct fhandle *)];
 	char buf_l_[PADL_(struct ostatfs *)]; struct ostatfs * buf; char buf_r_[PADR_(struct ostatfs *)];
 };
 struct freebsd4_sendfile_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char s_l_[PADL_(int)]; int s; char s_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char nbytes_l_[PADL_(size_t)]; size_t nbytes; char nbytes_r_[PADR_(size_t)];
 	char hdtr_l_[PADL_(struct sf_hdtr *)]; struct sf_hdtr * hdtr; char hdtr_r_[PADR_(struct sf_hdtr *)];
 	char sbytes_l_[PADL_(off_t *)]; off_t * sbytes; char sbytes_r_[PADR_(off_t *)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 };
 struct freebsd4_sigaction_args {
 	char sig_l_[PADL_(int)]; int sig; char sig_r_[PADR_(int)];
 	char act_l_[PADL_(const struct sigaction *)]; const struct sigaction * act; char act_r_[PADR_(const struct sigaction *)];
 	char oact_l_[PADL_(struct sigaction *)]; struct sigaction * oact; char oact_r_[PADR_(struct sigaction *)];
 };
 struct freebsd4_sigreturn_args {
 	char sigcntxp_l_[PADL_(const struct ucontext4 *)]; const struct ucontext4 * sigcntxp; char sigcntxp_r_[PADR_(const struct ucontext4 *)];
 };
 int	freebsd4_getfsstat(struct thread *, struct freebsd4_getfsstat_args *);
 int	freebsd4_statfs(struct thread *, struct freebsd4_statfs_args *);
 int	freebsd4_fstatfs(struct thread *, struct freebsd4_fstatfs_args *);
 int	freebsd4_getdomainname(struct thread *, struct freebsd4_getdomainname_args *);
 int	freebsd4_setdomainname(struct thread *, struct freebsd4_setdomainname_args *);
 int	freebsd4_uname(struct thread *, struct freebsd4_uname_args *);
 int	freebsd4_fhstatfs(struct thread *, struct freebsd4_fhstatfs_args *);
 int	freebsd4_sendfile(struct thread *, struct freebsd4_sendfile_args *);
 int	freebsd4_sigaction(struct thread *, struct freebsd4_sigaction_args *);
 int	freebsd4_sigreturn(struct thread *, struct freebsd4_sigreturn_args *);
 
 #endif /* COMPAT_FREEBSD4 */
 
 
 #ifdef COMPAT_FREEBSD6
 
 struct freebsd6_pread_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(void *)]; void * buf; char buf_r_[PADR_(void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct freebsd6_pwrite_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char buf_l_[PADL_(const void *)]; const void * buf; char buf_r_[PADR_(const void *)];
 	char nbyte_l_[PADL_(size_t)]; size_t nbyte; char nbyte_r_[PADR_(size_t)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 };
 struct freebsd6_mmap_args {
 	char addr_l_[PADL_(caddr_t)]; caddr_t addr; char addr_r_[PADR_(caddr_t)];
 	char len_l_[PADL_(size_t)]; size_t len; char len_r_[PADR_(size_t)];
 	char prot_l_[PADL_(int)]; int prot; char prot_r_[PADR_(int)];
 	char flags_l_[PADL_(int)]; int flags; char flags_r_[PADR_(int)];
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char pos_l_[PADL_(off_t)]; off_t pos; char pos_r_[PADR_(off_t)];
 };
 struct freebsd6_lseek_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char offset_l_[PADL_(off_t)]; off_t offset; char offset_r_[PADR_(off_t)];
 	char whence_l_[PADL_(int)]; int whence; char whence_r_[PADR_(int)];
 };
 struct freebsd6_truncate_args {
 	char path_l_[PADL_(char *)]; char * path; char path_r_[PADR_(char *)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char length_l_[PADL_(off_t)]; off_t length; char length_r_[PADR_(off_t)];
 };
 struct freebsd6_ftruncate_args {
 	char fd_l_[PADL_(int)]; int fd; char fd_r_[PADR_(int)];
 	char pad_l_[PADL_(int)]; int pad; char pad_r_[PADR_(int)];
 	char length_l_[PADL_(off_t)]; off_t length; char length_r_[PADR_(off_t)];
 };
 struct freebsd6_aio_read_args {
 	char aiocbp_l_[PADL_(struct oaiocb *)]; struct oaiocb * aiocbp; char aiocbp_r_[PADR_(struct oaiocb *)];
 };
 struct freebsd6_aio_write_args {
 	char aiocbp_l_[PADL_(struct oaiocb *)]; struct oaiocb * aiocbp; char aiocbp_r_[PADR_(struct oaiocb *)];
 };
 struct freebsd6_lio_listio_args {
 	char mode_l_[PADL_(int)]; int mode; char mode_r_[PADR_(int)];
 	char acb_list_l_[PADL_(struct oaiocb *const *)]; struct oaiocb *const * acb_list; char acb_list_r_[PADR_(struct oaiocb *const *)];
 	char nent_l_[PADL_(int)]; int nent; char nent_r_[PADR_(int)];
 	char sig_l_[PADL_(struct osigevent *)]; struct osigevent * sig; char sig_r_[PADR_(struct osigevent *)];
 };
 int	freebsd6_pread(struct thread *, struct freebsd6_pread_args *);
 int	freebsd6_pwrite(struct thread *, struct freebsd6_pwrite_args *);
 int	freebsd6_mmap(struct thread *, struct freebsd6_mmap_args *);
 int	freebsd6_lseek(struct thread *, struct freebsd6_lseek_args *);
 int	freebsd6_truncate(struct thread *, struct freebsd6_truncate_args *);
 int	freebsd6_ftruncate(struct thread *, struct freebsd6_ftruncate_args *);
 int	freebsd6_aio_read(struct thread *, struct freebsd6_aio_read_args *);
 int	freebsd6_aio_write(struct thread *, struct freebsd6_aio_write_args *);
 int	freebsd6_lio_listio(struct thread *, struct freebsd6_lio_listio_args *);
 
 #endif /* COMPAT_FREEBSD6 */
 
 
 #ifdef COMPAT_FREEBSD7
 
 struct freebsd7___semctl_args {
 	char semid_l_[PADL_(int)]; int semid; char semid_r_[PADR_(int)];
 	char semnum_l_[PADL_(int)]; int semnum; char semnum_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char arg_l_[PADL_(union semun_old *)]; union semun_old * arg; char arg_r_[PADR_(union semun_old *)];
 };
 struct freebsd7_msgctl_args {
 	char msqid_l_[PADL_(int)]; int msqid; char msqid_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct msqid_ds_old *)]; struct msqid_ds_old * buf; char buf_r_[PADR_(struct msqid_ds_old *)];
 };
 struct freebsd7_shmctl_args {
 	char shmid_l_[PADL_(int)]; int shmid; char shmid_r_[PADR_(int)];
 	char cmd_l_[PADL_(int)]; int cmd; char cmd_r_[PADR_(int)];
 	char buf_l_[PADL_(struct shmid_ds_old *)]; struct shmid_ds_old * buf; char buf_r_[PADR_(struct shmid_ds_old *)];
 };
 int	freebsd7___semctl(struct thread *, struct freebsd7___semctl_args *);
 int	freebsd7_msgctl(struct thread *, struct freebsd7_msgctl_args *);
 int	freebsd7_shmctl(struct thread *, struct freebsd7_shmctl_args *);
 
 #endif /* COMPAT_FREEBSD7 */
 
 
 #ifdef COMPAT_FREEBSD10
 
 int	freebsd10_pipe(struct thread *, struct freebsd10_pipe_args *);
 
 #endif /* COMPAT_FREEBSD10 */
 
 #define	SYS_AUE_syscall	AUE_NULL
 #define	SYS_AUE_exit	AUE_EXIT
 #define	SYS_AUE_fork	AUE_FORK
 #define	SYS_AUE_read	AUE_READ
 #define	SYS_AUE_write	AUE_WRITE
 #define	SYS_AUE_open	AUE_OPEN_RWTC
 #define	SYS_AUE_close	AUE_CLOSE
 #define	SYS_AUE_wait4	AUE_WAIT4
 #define	SYS_AUE_ocreat	AUE_CREAT
 #define	SYS_AUE_link	AUE_LINK
 #define	SYS_AUE_unlink	AUE_UNLINK
 #define	SYS_AUE_chdir	AUE_CHDIR
 #define	SYS_AUE_fchdir	AUE_FCHDIR
 #define	SYS_AUE_mknod	AUE_MKNOD
 #define	SYS_AUE_chmod	AUE_CHMOD
 #define	SYS_AUE_chown	AUE_CHOWN
 #define	SYS_AUE_break	AUE_NULL
 #define	SYS_AUE_freebsd4_getfsstat	AUE_GETFSSTAT
 #define	SYS_AUE_olseek	AUE_LSEEK
 #define	SYS_AUE_getpid	AUE_GETPID
 #define	SYS_AUE_mount	AUE_MOUNT
 #define	SYS_AUE_unmount	AUE_UMOUNT
 #define	SYS_AUE_setuid	AUE_SETUID
 #define	SYS_AUE_getuid	AUE_GETUID
 #define	SYS_AUE_geteuid	AUE_GETEUID
 #define	SYS_AUE_ptrace	AUE_PTRACE
 #define	SYS_AUE_recvmsg	AUE_RECVMSG
 #define	SYS_AUE_sendmsg	AUE_SENDMSG
 #define	SYS_AUE_recvfrom	AUE_RECVFROM
 #define	SYS_AUE_accept	AUE_ACCEPT
 #define	SYS_AUE_getpeername	AUE_GETPEERNAME
 #define	SYS_AUE_getsockname	AUE_GETSOCKNAME
 #define	SYS_AUE_access	AUE_ACCESS
 #define	SYS_AUE_chflags	AUE_CHFLAGS
 #define	SYS_AUE_fchflags	AUE_FCHFLAGS
 #define	SYS_AUE_sync	AUE_SYNC
 #define	SYS_AUE_kill	AUE_KILL
 #define	SYS_AUE_ostat	AUE_STAT
 #define	SYS_AUE_getppid	AUE_GETPPID
 #define	SYS_AUE_olstat	AUE_LSTAT
 #define	SYS_AUE_dup	AUE_DUP
 #define	SYS_AUE_freebsd10_pipe	AUE_PIPE
 #define	SYS_AUE_getegid	AUE_GETEGID
 #define	SYS_AUE_profil	AUE_PROFILE
 #define	SYS_AUE_ktrace	AUE_KTRACE
 #define	SYS_AUE_osigaction	AUE_SIGACTION
 #define	SYS_AUE_getgid	AUE_GETGID
 #define	SYS_AUE_osigprocmask	AUE_SIGPROCMASK
 #define	SYS_AUE_getlogin	AUE_GETLOGIN
 #define	SYS_AUE_setlogin	AUE_SETLOGIN
 #define	SYS_AUE_acct	AUE_ACCT
 #define	SYS_AUE_osigpending	AUE_SIGPENDING
 #define	SYS_AUE_sigaltstack	AUE_SIGALTSTACK
 #define	SYS_AUE_ioctl	AUE_IOCTL
 #define	SYS_AUE_reboot	AUE_REBOOT
 #define	SYS_AUE_revoke	AUE_REVOKE
 #define	SYS_AUE_symlink	AUE_SYMLINK
 #define	SYS_AUE_readlink	AUE_READLINK
 #define	SYS_AUE_execve	AUE_EXECVE
 #define	SYS_AUE_umask	AUE_UMASK
 #define	SYS_AUE_chroot	AUE_CHROOT
 #define	SYS_AUE_ofstat	AUE_FSTAT
 #define	SYS_AUE_ogetkerninfo	AUE_NULL
 #define	SYS_AUE_ogetpagesize	AUE_NULL
 #define	SYS_AUE_msync	AUE_MSYNC
 #define	SYS_AUE_vfork	AUE_VFORK
 #define	SYS_AUE_sbrk	AUE_SBRK
 #define	SYS_AUE_sstk	AUE_SSTK
 #define	SYS_AUE_ommap	AUE_MMAP
 #define	SYS_AUE_vadvise	AUE_O_VADVISE
 #define	SYS_AUE_munmap	AUE_MUNMAP
 #define	SYS_AUE_mprotect	AUE_MPROTECT
 #define	SYS_AUE_madvise	AUE_MADVISE
 #define	SYS_AUE_mincore	AUE_MINCORE
 #define	SYS_AUE_getgroups	AUE_GETGROUPS
 #define	SYS_AUE_setgroups	AUE_SETGROUPS
 #define	SYS_AUE_getpgrp	AUE_GETPGRP
 #define	SYS_AUE_setpgid	AUE_SETPGRP
 #define	SYS_AUE_setitimer	AUE_SETITIMER
 #define	SYS_AUE_owait	AUE_WAIT4
 #define	SYS_AUE_swapon	AUE_SWAPON
 #define	SYS_AUE_getitimer	AUE_GETITIMER
 #define	SYS_AUE_ogethostname	AUE_SYSCTL
 #define	SYS_AUE_osethostname	AUE_SYSCTL
 #define	SYS_AUE_getdtablesize	AUE_GETDTABLESIZE
 #define	SYS_AUE_dup2	AUE_DUP2
 #define	SYS_AUE_fcntl	AUE_FCNTL
 #define	SYS_AUE_select	AUE_SELECT
 #define	SYS_AUE_fsync	AUE_FSYNC
 #define	SYS_AUE_setpriority	AUE_SETPRIORITY
 #define	SYS_AUE_socket	AUE_SOCKET
 #define	SYS_AUE_connect	AUE_CONNECT
 #define	SYS_AUE_oaccept	AUE_ACCEPT
 #define	SYS_AUE_getpriority	AUE_GETPRIORITY
 #define	SYS_AUE_osend	AUE_SEND
 #define	SYS_AUE_orecv	AUE_RECV
 #define	SYS_AUE_osigreturn	AUE_SIGRETURN
 #define	SYS_AUE_bind	AUE_BIND
 #define	SYS_AUE_setsockopt	AUE_SETSOCKOPT
 #define	SYS_AUE_listen	AUE_LISTEN
 #define	SYS_AUE_osigvec	AUE_NULL
 #define	SYS_AUE_osigblock	AUE_NULL
 #define	SYS_AUE_osigsetmask	AUE_NULL
 #define	SYS_AUE_osigsuspend	AUE_NULL
 #define	SYS_AUE_osigstack	AUE_NULL
 #define	SYS_AUE_orecvmsg	AUE_RECVMSG
 #define	SYS_AUE_osendmsg	AUE_SENDMSG
 #define	SYS_AUE_gettimeofday	AUE_GETTIMEOFDAY
 #define	SYS_AUE_getrusage	AUE_GETRUSAGE
 #define	SYS_AUE_getsockopt	AUE_GETSOCKOPT
 #define	SYS_AUE_readv	AUE_READV
 #define	SYS_AUE_writev	AUE_WRITEV
 #define	SYS_AUE_settimeofday	AUE_SETTIMEOFDAY
 #define	SYS_AUE_fchown	AUE_FCHOWN
 #define	SYS_AUE_fchmod	AUE_FCHMOD
 #define	SYS_AUE_orecvfrom	AUE_RECVFROM
 #define	SYS_AUE_setreuid	AUE_SETREUID
 #define	SYS_AUE_setregid	AUE_SETREGID
 #define	SYS_AUE_rename	AUE_RENAME
 #define	SYS_AUE_otruncate	AUE_TRUNCATE
 #define	SYS_AUE_oftruncate	AUE_FTRUNCATE
 #define	SYS_AUE_flock	AUE_FLOCK
 #define	SYS_AUE_mkfifo	AUE_MKFIFO
 #define	SYS_AUE_sendto	AUE_SENDTO
 #define	SYS_AUE_shutdown	AUE_SHUTDOWN
 #define	SYS_AUE_socketpair	AUE_SOCKETPAIR
 #define	SYS_AUE_mkdir	AUE_MKDIR
 #define	SYS_AUE_rmdir	AUE_RMDIR
 #define	SYS_AUE_utimes	AUE_UTIMES
 #define	SYS_AUE_adjtime	AUE_ADJTIME
 #define	SYS_AUE_ogetpeername	AUE_GETPEERNAME
 #define	SYS_AUE_ogethostid	AUE_SYSCTL
 #define	SYS_AUE_osethostid	AUE_SYSCTL
 #define	SYS_AUE_ogetrlimit	AUE_GETRLIMIT
 #define	SYS_AUE_osetrlimit	AUE_SETRLIMIT
 #define	SYS_AUE_okillpg	AUE_KILLPG
 #define	SYS_AUE_setsid	AUE_SETSID
 #define	SYS_AUE_quotactl	AUE_QUOTACTL
 #define	SYS_AUE_oquota	AUE_O_QUOTA
 #define	SYS_AUE_ogetsockname	AUE_GETSOCKNAME
 #define	SYS_AUE_nlm_syscall	AUE_NULL
 #define	SYS_AUE_nfssvc	AUE_NFS_SVC
 #define	SYS_AUE_ogetdirentries	AUE_GETDIRENTRIES
 #define	SYS_AUE_freebsd4_statfs	AUE_STATFS
 #define	SYS_AUE_freebsd4_fstatfs	AUE_FSTATFS
 #define	SYS_AUE_lgetfh	AUE_LGETFH
 #define	SYS_AUE_getfh	AUE_NFS_GETFH
 #define	SYS_AUE_freebsd4_getdomainname	AUE_SYSCTL
 #define	SYS_AUE_freebsd4_setdomainname	AUE_SYSCTL
 #define	SYS_AUE_freebsd4_uname	AUE_NULL
 #define	SYS_AUE_sysarch	AUE_SYSARCH
 #define	SYS_AUE_rtprio	AUE_RTPRIO
 #define	SYS_AUE_semsys	AUE_SEMSYS
 #define	SYS_AUE_msgsys	AUE_MSGSYS
 #define	SYS_AUE_shmsys	AUE_SHMSYS
 #define	SYS_AUE_freebsd6_pread	AUE_PREAD
 #define	SYS_AUE_freebsd6_pwrite	AUE_PWRITE
 #define	SYS_AUE_setfib	AUE_NULL
 #define	SYS_AUE_ntp_adjtime	AUE_NTP_ADJTIME
 #define	SYS_AUE_setgid	AUE_SETGID
 #define	SYS_AUE_setegid	AUE_SETEGID
 #define	SYS_AUE_seteuid	AUE_SETEUID
 #define	SYS_AUE_stat	AUE_STAT
 #define	SYS_AUE_fstat	AUE_FSTAT
 #define	SYS_AUE_lstat	AUE_LSTAT
 #define	SYS_AUE_pathconf	AUE_PATHCONF
 #define	SYS_AUE_fpathconf	AUE_FPATHCONF
 #define	SYS_AUE_getrlimit	AUE_GETRLIMIT
 #define	SYS_AUE_setrlimit	AUE_SETRLIMIT
 #define	SYS_AUE_getdirentries	AUE_GETDIRENTRIES
 #define	SYS_AUE_freebsd6_mmap	AUE_MMAP
 #define	SYS_AUE_freebsd6_lseek	AUE_LSEEK
 #define	SYS_AUE_freebsd6_truncate	AUE_TRUNCATE
 #define	SYS_AUE_freebsd6_ftruncate	AUE_FTRUNCATE
 #define	SYS_AUE___sysctl	AUE_SYSCTL
 #define	SYS_AUE_mlock	AUE_MLOCK
 #define	SYS_AUE_munlock	AUE_MUNLOCK
 #define	SYS_AUE_undelete	AUE_UNDELETE
 #define	SYS_AUE_futimes	AUE_FUTIMES
 #define	SYS_AUE_getpgid	AUE_GETPGID
 #define	SYS_AUE_poll	AUE_POLL
 #define	SYS_AUE_freebsd7___semctl	AUE_SEMCTL
 #define	SYS_AUE_semget	AUE_SEMGET
 #define	SYS_AUE_semop	AUE_SEMOP
 #define	SYS_AUE_freebsd7_msgctl	AUE_MSGCTL
 #define	SYS_AUE_msgget	AUE_MSGGET
 #define	SYS_AUE_msgsnd	AUE_MSGSND
 #define	SYS_AUE_msgrcv	AUE_MSGRCV
 #define	SYS_AUE_shmat	AUE_SHMAT
 #define	SYS_AUE_freebsd7_shmctl	AUE_SHMCTL
 #define	SYS_AUE_shmdt	AUE_SHMDT
 #define	SYS_AUE_shmget	AUE_SHMGET
 #define	SYS_AUE_clock_gettime	AUE_NULL
 #define	SYS_AUE_clock_settime	AUE_CLOCK_SETTIME
 #define	SYS_AUE_clock_getres	AUE_NULL
 #define	SYS_AUE_ktimer_create	AUE_NULL
 #define	SYS_AUE_ktimer_delete	AUE_NULL
 #define	SYS_AUE_ktimer_settime	AUE_NULL
 #define	SYS_AUE_ktimer_gettime	AUE_NULL
 #define	SYS_AUE_ktimer_getoverrun	AUE_NULL
 #define	SYS_AUE_nanosleep	AUE_NULL
 #define	SYS_AUE_ffclock_getcounter	AUE_NULL
 #define	SYS_AUE_ffclock_setestimate	AUE_NULL
 #define	SYS_AUE_ffclock_getestimate	AUE_NULL
 #define	SYS_AUE_clock_getcpuclockid2	AUE_NULL
 #define	SYS_AUE_ntp_gettime	AUE_NULL
 #define	SYS_AUE_minherit	AUE_MINHERIT
 #define	SYS_AUE_rfork	AUE_RFORK
 #define	SYS_AUE_openbsd_poll	AUE_POLL
 #define	SYS_AUE_issetugid	AUE_ISSETUGID
 #define	SYS_AUE_lchown	AUE_LCHOWN
 #define	SYS_AUE_aio_read	AUE_NULL
 #define	SYS_AUE_aio_write	AUE_NULL
 #define	SYS_AUE_lio_listio	AUE_NULL
 #define	SYS_AUE_getdents	AUE_O_GETDENTS
 #define	SYS_AUE_lchmod	AUE_LCHMOD
 #define	SYS_AUE_lutimes	AUE_LUTIMES
 #define	SYS_AUE_nstat	AUE_STAT
 #define	SYS_AUE_nfstat	AUE_FSTAT
 #define	SYS_AUE_nlstat	AUE_LSTAT
 #define	SYS_AUE_preadv	AUE_PREADV
 #define	SYS_AUE_pwritev	AUE_PWRITEV
 #define	SYS_AUE_freebsd4_fhstatfs	AUE_FHSTATFS
 #define	SYS_AUE_fhopen	AUE_FHOPEN
 #define	SYS_AUE_fhstat	AUE_FHSTAT
 #define	SYS_AUE_modnext	AUE_NULL
 #define	SYS_AUE_modstat	AUE_NULL
 #define	SYS_AUE_modfnext	AUE_NULL
 #define	SYS_AUE_modfind	AUE_NULL
 #define	SYS_AUE_kldload	AUE_MODLOAD
 #define	SYS_AUE_kldunload	AUE_MODUNLOAD
 #define	SYS_AUE_kldfind	AUE_NULL
 #define	SYS_AUE_kldnext	AUE_NULL
 #define	SYS_AUE_kldstat	AUE_NULL
 #define	SYS_AUE_kldfirstmod	AUE_NULL
 #define	SYS_AUE_getsid	AUE_GETSID
 #define	SYS_AUE_setresuid	AUE_SETRESUID
 #define	SYS_AUE_setresgid	AUE_SETRESGID
 #define	SYS_AUE_aio_return	AUE_NULL
 #define	SYS_AUE_aio_suspend	AUE_NULL
 #define	SYS_AUE_aio_cancel	AUE_NULL
 #define	SYS_AUE_aio_error	AUE_NULL
 #define	SYS_AUE_freebsd6_aio_read	AUE_NULL
 #define	SYS_AUE_freebsd6_aio_write	AUE_NULL
 #define	SYS_AUE_freebsd6_lio_listio	AUE_NULL
 #define	SYS_AUE_yield	AUE_NULL
 #define	SYS_AUE_mlockall	AUE_MLOCKALL
 #define	SYS_AUE_munlockall	AUE_MUNLOCKALL
 #define	SYS_AUE___getcwd	AUE_GETCWD
 #define	SYS_AUE_sched_setparam	AUE_NULL
 #define	SYS_AUE_sched_getparam	AUE_NULL
 #define	SYS_AUE_sched_setscheduler	AUE_NULL
 #define	SYS_AUE_sched_getscheduler	AUE_NULL
 #define	SYS_AUE_sched_yield	AUE_NULL
 #define	SYS_AUE_sched_get_priority_max	AUE_NULL
 #define	SYS_AUE_sched_get_priority_min	AUE_NULL
 #define	SYS_AUE_sched_rr_get_interval	AUE_NULL
 #define	SYS_AUE_utrace	AUE_NULL
 #define	SYS_AUE_freebsd4_sendfile	AUE_SENDFILE
 #define	SYS_AUE_kldsym	AUE_NULL
 #define	SYS_AUE_jail	AUE_JAIL
 #define	SYS_AUE_nnpfs_syscall	AUE_NULL
 #define	SYS_AUE_sigprocmask	AUE_SIGPROCMASK
 #define	SYS_AUE_sigsuspend	AUE_SIGSUSPEND
 #define	SYS_AUE_freebsd4_sigaction	AUE_SIGACTION
 #define	SYS_AUE_sigpending	AUE_SIGPENDING
 #define	SYS_AUE_freebsd4_sigreturn	AUE_SIGRETURN
 #define	SYS_AUE_sigtimedwait	AUE_SIGWAIT
 #define	SYS_AUE_sigwaitinfo	AUE_NULL
 #define	SYS_AUE___acl_get_file	AUE_NULL
 #define	SYS_AUE___acl_set_file	AUE_NULL
 #define	SYS_AUE___acl_get_fd	AUE_NULL
 #define	SYS_AUE___acl_set_fd	AUE_NULL
 #define	SYS_AUE___acl_delete_file	AUE_NULL
 #define	SYS_AUE___acl_delete_fd	AUE_NULL
 #define	SYS_AUE___acl_aclcheck_file	AUE_NULL
 #define	SYS_AUE___acl_aclcheck_fd	AUE_NULL
 #define	SYS_AUE_extattrctl	AUE_EXTATTRCTL
 #define	SYS_AUE_extattr_set_file	AUE_EXTATTR_SET_FILE
 #define	SYS_AUE_extattr_get_file	AUE_EXTATTR_GET_FILE
 #define	SYS_AUE_extattr_delete_file	AUE_EXTATTR_DELETE_FILE
 #define	SYS_AUE_aio_waitcomplete	AUE_NULL
 #define	SYS_AUE_getresuid	AUE_GETRESUID
 #define	SYS_AUE_getresgid	AUE_GETRESGID
 #define	SYS_AUE_kqueue	AUE_KQUEUE
 #define	SYS_AUE_kevent	AUE_NULL
 #define	SYS_AUE_extattr_set_fd	AUE_EXTATTR_SET_FD
 #define	SYS_AUE_extattr_get_fd	AUE_EXTATTR_GET_FD
 #define	SYS_AUE_extattr_delete_fd	AUE_EXTATTR_DELETE_FD
 #define	SYS_AUE___setugid	AUE_NULL
 #define	SYS_AUE_eaccess	AUE_EACCESS
 #define	SYS_AUE_afs3_syscall	AUE_NULL
 #define	SYS_AUE_nmount	AUE_NMOUNT
 #define	SYS_AUE___mac_get_proc	AUE_NULL
 #define	SYS_AUE___mac_set_proc	AUE_NULL
 #define	SYS_AUE___mac_get_fd	AUE_NULL
 #define	SYS_AUE___mac_get_file	AUE_NULL
 #define	SYS_AUE___mac_set_fd	AUE_NULL
 #define	SYS_AUE___mac_set_file	AUE_NULL
 #define	SYS_AUE_kenv	AUE_NULL
 #define	SYS_AUE_lchflags	AUE_LCHFLAGS
 #define	SYS_AUE_uuidgen	AUE_NULL
 #define	SYS_AUE_sendfile	AUE_SENDFILE
 #define	SYS_AUE_mac_syscall	AUE_NULL
 #define	SYS_AUE_getfsstat	AUE_GETFSSTAT
 #define	SYS_AUE_statfs	AUE_STATFS
 #define	SYS_AUE_fstatfs	AUE_FSTATFS
 #define	SYS_AUE_fhstatfs	AUE_FHSTATFS
 #define	SYS_AUE_ksem_close	AUE_NULL
 #define	SYS_AUE_ksem_post	AUE_NULL
 #define	SYS_AUE_ksem_wait	AUE_NULL
 #define	SYS_AUE_ksem_trywait	AUE_NULL
 #define	SYS_AUE_ksem_init	AUE_NULL
 #define	SYS_AUE_ksem_open	AUE_NULL
 #define	SYS_AUE_ksem_unlink	AUE_NULL
 #define	SYS_AUE_ksem_getvalue	AUE_NULL
 #define	SYS_AUE_ksem_destroy	AUE_NULL
 #define	SYS_AUE___mac_get_pid	AUE_NULL
 #define	SYS_AUE___mac_get_link	AUE_NULL
 #define	SYS_AUE___mac_set_link	AUE_NULL
 #define	SYS_AUE_extattr_set_link	AUE_EXTATTR_SET_LINK
 #define	SYS_AUE_extattr_get_link	AUE_EXTATTR_GET_LINK
 #define	SYS_AUE_extattr_delete_link	AUE_EXTATTR_DELETE_LINK
 #define	SYS_AUE___mac_execve	AUE_NULL
 #define	SYS_AUE_sigaction	AUE_SIGACTION
 #define	SYS_AUE_sigreturn	AUE_SIGRETURN
 #define	SYS_AUE_getcontext	AUE_NULL
 #define	SYS_AUE_setcontext	AUE_NULL
 #define	SYS_AUE_swapcontext	AUE_NULL
 #define	SYS_AUE_swapoff	AUE_SWAPOFF
 #define	SYS_AUE___acl_get_link	AUE_NULL
 #define	SYS_AUE___acl_set_link	AUE_NULL
 #define	SYS_AUE___acl_delete_link	AUE_NULL
 #define	SYS_AUE___acl_aclcheck_link	AUE_NULL
 #define	SYS_AUE_sigwait	AUE_SIGWAIT
 #define	SYS_AUE_thr_create	AUE_NULL
 #define	SYS_AUE_thr_exit	AUE_NULL
 #define	SYS_AUE_thr_self	AUE_NULL
 #define	SYS_AUE_thr_kill	AUE_NULL
 #define	SYS_AUE_jail_attach	AUE_NULL
 #define	SYS_AUE_extattr_list_fd	AUE_EXTATTR_LIST_FD
 #define	SYS_AUE_extattr_list_file	AUE_EXTATTR_LIST_FILE
 #define	SYS_AUE_extattr_list_link	AUE_EXTATTR_LIST_LINK
 #define	SYS_AUE_ksem_timedwait	AUE_NULL
 #define	SYS_AUE_thr_suspend	AUE_NULL
 #define	SYS_AUE_thr_wake	AUE_NULL
 #define	SYS_AUE_kldunloadf	AUE_MODUNLOAD
 #define	SYS_AUE_audit	AUE_AUDIT
 #define	SYS_AUE_auditon	AUE_AUDITON
 #define	SYS_AUE_getauid	AUE_GETAUID
 #define	SYS_AUE_setauid	AUE_SETAUID
 #define	SYS_AUE_getaudit	AUE_GETAUDIT
 #define	SYS_AUE_setaudit	AUE_SETAUDIT
 #define	SYS_AUE_getaudit_addr	AUE_GETAUDIT_ADDR
 #define	SYS_AUE_setaudit_addr	AUE_SETAUDIT_ADDR
 #define	SYS_AUE_auditctl	AUE_AUDITCTL
 #define	SYS_AUE__umtx_op	AUE_NULL
 #define	SYS_AUE_thr_new	AUE_NULL
 #define	SYS_AUE_sigqueue	AUE_NULL
 #define	SYS_AUE_kmq_open	AUE_NULL
 #define	SYS_AUE_kmq_setattr	AUE_NULL
 #define	SYS_AUE_kmq_timedreceive	AUE_NULL
 #define	SYS_AUE_kmq_timedsend	AUE_NULL
 #define	SYS_AUE_kmq_notify	AUE_NULL
 #define	SYS_AUE_kmq_unlink	AUE_NULL
 #define	SYS_AUE_abort2	AUE_NULL
 #define	SYS_AUE_thr_set_name	AUE_NULL
 #define	SYS_AUE_aio_fsync	AUE_NULL
 #define	SYS_AUE_rtprio_thread	AUE_RTPRIO
 #define	SYS_AUE_sctp_peeloff	AUE_NULL
 #define	SYS_AUE_sctp_generic_sendmsg	AUE_NULL
 #define	SYS_AUE_sctp_generic_sendmsg_iov	AUE_NULL
 #define	SYS_AUE_sctp_generic_recvmsg	AUE_NULL
 #define	SYS_AUE_pread	AUE_PREAD
 #define	SYS_AUE_pwrite	AUE_PWRITE
 #define	SYS_AUE_mmap	AUE_MMAP
 #define	SYS_AUE_lseek	AUE_LSEEK
 #define	SYS_AUE_truncate	AUE_TRUNCATE
 #define	SYS_AUE_ftruncate	AUE_FTRUNCATE
 #define	SYS_AUE_thr_kill2	AUE_KILL
 #define	SYS_AUE_shm_open	AUE_SHMOPEN
 #define	SYS_AUE_shm_unlink	AUE_SHMUNLINK
 #define	SYS_AUE_cpuset	AUE_NULL
 #define	SYS_AUE_cpuset_setid	AUE_NULL
 #define	SYS_AUE_cpuset_getid	AUE_NULL
 #define	SYS_AUE_cpuset_getaffinity	AUE_NULL
 #define	SYS_AUE_cpuset_setaffinity	AUE_NULL
 #define	SYS_AUE_faccessat	AUE_FACCESSAT
 #define	SYS_AUE_fchmodat	AUE_FCHMODAT
 #define	SYS_AUE_fchownat	AUE_FCHOWNAT
 #define	SYS_AUE_fexecve	AUE_FEXECVE
 #define	SYS_AUE_fstatat	AUE_FSTATAT
 #define	SYS_AUE_futimesat	AUE_FUTIMESAT
 #define	SYS_AUE_linkat	AUE_LINKAT
 #define	SYS_AUE_mkdirat	AUE_MKDIRAT
 #define	SYS_AUE_mkfifoat	AUE_MKFIFOAT
 #define	SYS_AUE_mknodat	AUE_MKNODAT
 #define	SYS_AUE_openat	AUE_OPENAT_RWTC
 #define	SYS_AUE_readlinkat	AUE_READLINKAT
 #define	SYS_AUE_renameat	AUE_RENAMEAT
 #define	SYS_AUE_symlinkat	AUE_SYMLINKAT
 #define	SYS_AUE_unlinkat	AUE_UNLINKAT
 #define	SYS_AUE_posix_openpt	AUE_POSIX_OPENPT
 #define	SYS_AUE_gssd_syscall	AUE_NULL
 #define	SYS_AUE_jail_get	AUE_NULL
 #define	SYS_AUE_jail_set	AUE_NULL
 #define	SYS_AUE_jail_remove	AUE_NULL
 #define	SYS_AUE_closefrom	AUE_CLOSEFROM
 #define	SYS_AUE___semctl	AUE_SEMCTL
 #define	SYS_AUE_msgctl	AUE_MSGCTL
 #define	SYS_AUE_shmctl	AUE_SHMCTL
 #define	SYS_AUE_lpathconf	AUE_LPATHCONF
 #define	SYS_AUE___cap_rights_get	AUE_CAP_RIGHTS_GET
 #define	SYS_AUE_cap_enter	AUE_CAP_ENTER
 #define	SYS_AUE_cap_getmode	AUE_CAP_GETMODE
 #define	SYS_AUE_pdfork	AUE_PDFORK
 #define	SYS_AUE_pdkill	AUE_PDKILL
 #define	SYS_AUE_pdgetpid	AUE_PDGETPID
 #define	SYS_AUE_pselect	AUE_SELECT
 #define	SYS_AUE_getloginclass	AUE_NULL
 #define	SYS_AUE_setloginclass	AUE_NULL
 #define	SYS_AUE_rctl_get_racct	AUE_NULL
 #define	SYS_AUE_rctl_get_rules	AUE_NULL
 #define	SYS_AUE_rctl_get_limits	AUE_NULL
 #define	SYS_AUE_rctl_add_rule	AUE_NULL
 #define	SYS_AUE_rctl_remove_rule	AUE_NULL
 #define	SYS_AUE_posix_fallocate	AUE_NULL
 #define	SYS_AUE_posix_fadvise	AUE_NULL
 #define	SYS_AUE_wait6	AUE_WAIT6
 #define	SYS_AUE_cap_rights_limit	AUE_CAP_RIGHTS_LIMIT
 #define	SYS_AUE_cap_ioctls_limit	AUE_CAP_IOCTLS_LIMIT
 #define	SYS_AUE_cap_ioctls_get	AUE_CAP_IOCTLS_GET
 #define	SYS_AUE_cap_fcntls_limit	AUE_CAP_FCNTLS_LIMIT
 #define	SYS_AUE_cap_fcntls_get	AUE_CAP_FCNTLS_GET
 #define	SYS_AUE_bindat	AUE_BINDAT
 #define	SYS_AUE_connectat	AUE_CONNECTAT
 #define	SYS_AUE_chflagsat	AUE_CHFLAGSAT
 #define	SYS_AUE_accept4	AUE_ACCEPT
 #define	SYS_AUE_pipe2	AUE_PIPE
 #define	SYS_AUE_aio_mlock	AUE_NULL
 #define	SYS_AUE_procctl	AUE_NULL
 #define	SYS_AUE_ppoll	AUE_POLL
 #define	SYS_AUE_futimens	AUE_FUTIMES
 #define	SYS_AUE_utimensat	AUE_FUTIMESAT
 #define	SYS_AUE_numa_getaffinity	AUE_NULL
 #define	SYS_AUE_numa_setaffinity	AUE_NULL
 
 #undef PAD_
 #undef PADL_
 #undef PADR_
 
 #endif /* !_SYS_SYSPROTO_H_ */
Index: user/alc/PQ_LAUNDRY/sys/vm/vm_pageout.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/vm/vm_pageout.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/vm/vm_pageout.c	(revision 303775)
@@ -1,2154 +1,2154 @@
 /*-
  * Copyright (c) 1991 Regents of the University of California.
  * All rights reserved.
  * Copyright (c) 1994 John S. Dyson
  * All rights reserved.
  * Copyright (c) 1994 David Greenman
  * All rights reserved.
  * Copyright (c) 2005 Yahoo! Technologies Norway AS
  * All rights reserved.
  *
  * This code is derived from software contributed to Berkeley by
  * The Mach Operating System project at Carnegie-Mellon University.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  * 3. All advertising materials mentioning features or use of this software
  *    must display the following acknowledgement:
  *	This product includes software developed by the University of
  *	California, Berkeley and its contributors.
  * 4. Neither the name of the University nor the names of its contributors
  *    may be used to endorse or promote products derived from this software
  *    without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  *
  *	from: @(#)vm_pageout.c	7.4 (Berkeley) 5/7/91
  *
  *
  * Copyright (c) 1987, 1990 Carnegie-Mellon University.
  * All rights reserved.
  *
  * Authors: Avadis Tevanian, Jr., Michael Wayne Young
  *
  * Permission to use, copy, modify and distribute this software and
  * its documentation is hereby granted, provided that both the copyright
  * notice and this permission notice appear in all copies of the
  * software, derivative works or modified versions, and any portions
  * thereof, and that both notices appear in supporting documentation.
  *
  * CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
  * CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
  * FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
  *
  * Carnegie Mellon requests users of this software to return to
  *
  *  Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
  *  School of Computer Science
  *  Carnegie Mellon University
  *  Pittsburgh PA 15213-3890
  *
  * any improvements or extensions that they make and grant Carnegie the
  * rights to redistribute these changes.
  */
 
 /*
  *	The proverbial page-out daemon.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_vm.h"
 
 #include <sys/param.h>
 #include <sys/systm.h>
 #include <sys/kernel.h>
 #include <sys/eventhandler.h>
 #include <sys/lock.h>
 #include <sys/mutex.h>
 #include <sys/proc.h>
 #include <sys/kthread.h>
 #include <sys/ktr.h>
 #include <sys/mount.h>
 #include <sys/racct.h>
 #include <sys/resourcevar.h>
 #include <sys/sched.h>
 #include <sys/sdt.h>
 #include <sys/signalvar.h>
 #include <sys/smp.h>
 #include <sys/time.h>
 #include <sys/vnode.h>
 #include <sys/vmmeter.h>
 #include <sys/rwlock.h>
 #include <sys/sx.h>
 #include <sys/sysctl.h>
 
 #include <vm/vm.h>
 #include <vm/vm_param.h>
 #include <vm/vm_object.h>
 #include <vm/vm_page.h>
 #include <vm/vm_map.h>
 #include <vm/vm_pageout.h>
 #include <vm/vm_pager.h>
 #include <vm/vm_phys.h>
 #include <vm/swap_pager.h>
 #include <vm/vm_extern.h>
 #include <vm/uma.h>
 
 /*
  * System initialization
  */
 
 /* the kernel process "vm_pageout"*/
 static void vm_pageout(void);
 static void vm_pageout_init(void);
 static int vm_pageout_clean(vm_page_t m, int *numpagedout);
 static int vm_pageout_cluster(vm_page_t m);
 static void vm_pageout_scan(struct vm_domain *vmd, int pass);
 static void vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
     int starting_page_shortage);
 
 SYSINIT(pagedaemon_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_FIRST, vm_pageout_init,
     NULL);
 
 struct proc *pageproc;
 
 static struct kproc_desc page_kp = {
 	"pagedaemon",
 	vm_pageout,
 	&pageproc
 };
 SYSINIT(pagedaemon, SI_SUB_KTHREAD_PAGE, SI_ORDER_SECOND, kproc_start,
     &page_kp);
 
 SDT_PROVIDER_DEFINE(vm);
 SDT_PROBE_DEFINE(vm, , , vm__lowmem_scan);
 
 #if !defined(NO_SWAPPING)
 /* the kernel process "vm_daemon"*/
 static void vm_daemon(void);
 static struct	proc *vmproc;
 
 static struct kproc_desc vm_kp = {
 	"vmdaemon",
 	vm_daemon,
 	&vmproc
 };
 SYSINIT(vmdaemon, SI_SUB_KTHREAD_VM, SI_ORDER_FIRST, kproc_start, &vm_kp);
 #endif
 
 /* Sleep intervals for pagedaemon threads, in subdivisions of one second. */
 #define	VM_LAUNDER_INTERVAL	10
 #define	VM_INACT_SCAN_INTERVAL	2
 
 #define	VM_LAUNDER_RATE		(VM_LAUNDER_INTERVAL / VM_INACT_SCAN_INTERVAL)
 
 int vm_pageout_deficit;		/* Estimated number of pages deficit */
 u_int vm_pageout_wakeup_thresh;
 static int vm_pageout_oom_seq = 12;
 bool vm_pageout_wanted;		/* Event on which pageout daemon sleeps */
 bool vm_pages_needed;		/* Are threads waiting for free pages? */
 
 #if !defined(NO_SWAPPING)
 static int vm_pageout_req_swapout;	/* XXX */
 static int vm_daemon_needed;
 static struct mtx vm_daemon_mtx;
 /* Allow for use by vm_pageout before vm_daemon is initialized. */
 MTX_SYSINIT(vm_daemon, &vm_daemon_mtx, "vm daemon", MTX_DEF);
 #endif
 static int vm_pageout_update_period;
 static int disable_swap_pageouts;
 static int lowmem_period = 10;
 static time_t lowmem_uptime;
 
 #if defined(NO_SWAPPING)
 static int vm_swap_enabled = 0;
 static int vm_swap_idle_enabled = 0;
 #else
 static int vm_swap_enabled = 1;
 static int vm_swap_idle_enabled = 0;
 #endif
 
 static int vm_panic_on_oom = 0;
 
 SYSCTL_INT(_vm, OID_AUTO, panic_on_oom,
 	CTLFLAG_RWTUN, &vm_panic_on_oom, 0,
 	"panic on out of memory instead of killing the largest process");
 
 SYSCTL_INT(_vm, OID_AUTO, pageout_wakeup_thresh,
 	CTLFLAG_RW, &vm_pageout_wakeup_thresh, 0,
 	"free page threshold for waking up the pageout daemon");
 
 SYSCTL_INT(_vm, OID_AUTO, pageout_update_period,
 	CTLFLAG_RW, &vm_pageout_update_period, 0,
 	"Maximum active LRU update period");
   
 SYSCTL_INT(_vm, OID_AUTO, lowmem_period, CTLFLAG_RW, &lowmem_period, 0,
 	"Low memory callback period");
 
 #if defined(NO_SWAPPING)
 SYSCTL_INT(_vm, VM_SWAPPING_ENABLED, swap_enabled,
 	CTLFLAG_RD, &vm_swap_enabled, 0, "Enable entire process swapout");
 SYSCTL_INT(_vm, OID_AUTO, swap_idle_enabled,
 	CTLFLAG_RD, &vm_swap_idle_enabled, 0, "Allow swapout on idle criteria");
 #else
 SYSCTL_INT(_vm, VM_SWAPPING_ENABLED, swap_enabled,
 	CTLFLAG_RW, &vm_swap_enabled, 0, "Enable entire process swapout");
 SYSCTL_INT(_vm, OID_AUTO, swap_idle_enabled,
 	CTLFLAG_RW, &vm_swap_idle_enabled, 0, "Allow swapout on idle criteria");
 #endif
 
 SYSCTL_INT(_vm, OID_AUTO, disable_swapspace_pageouts,
 	CTLFLAG_RW, &disable_swap_pageouts, 0, "Disallow swapout of dirty pages");
 
 static int pageout_lock_miss;
 SYSCTL_INT(_vm, OID_AUTO, pageout_lock_miss,
 	CTLFLAG_RD, &pageout_lock_miss, 0, "vget() lock misses during pageout");
 
 SYSCTL_INT(_vm, OID_AUTO, pageout_oom_seq,
 	CTLFLAG_RW, &vm_pageout_oom_seq, 0,
 	"back-to-back calls to oom detector to start OOM");
 
 static int act_scan_laundry_weight = 3;
 SYSCTL_INT(_vm, OID_AUTO, act_scan_laundry_weight,
 	CTLFLAG_RW, &act_scan_laundry_weight, 0,
 	"weight given to clean vs. dirty pages in active queue scans");
 
 static u_int bkgrd_launder_ratio = 50;
 SYSCTL_UINT(_vm, OID_AUTO, bkgrd_launder_ratio,
 	CTLFLAG_RW, &bkgrd_launder_ratio, 0,
 	"ratio of clean to dirty inactive pages needed to trigger laundering");
 
 static u_int bkgrd_launder_max = 2048;
 SYSCTL_UINT(_vm, OID_AUTO, bkgrd_launder_max,
 	CTLFLAG_RW, &bkgrd_launder_max, 0,
 	"maximum background laundering rate, in pages per second");
 
 #define VM_PAGEOUT_PAGE_COUNT 16
 int vm_pageout_page_count = VM_PAGEOUT_PAGE_COUNT;
 
 int vm_page_max_wired;		/* XXX max # of wired pages system-wide */
 SYSCTL_INT(_vm, OID_AUTO, max_wired,
 	CTLFLAG_RW, &vm_page_max_wired, 0, "System-wide limit to wired page count");
 
 static boolean_t vm_pageout_fallback_object_lock(vm_page_t, vm_page_t *);
 static int vm_pageout_launder(struct vm_domain *vmd, int launder,
     bool shortfall);
 static void vm_pageout_laundry_worker(void *arg);
 #if !defined(NO_SWAPPING)
 static void vm_pageout_map_deactivate_pages(vm_map_t, long);
 static void vm_pageout_object_deactivate_pages(pmap_t, vm_object_t, long);
 static void vm_req_vmdaemon(int req);
 #endif
 static boolean_t vm_pageout_page_lock(vm_page_t, vm_page_t *);
 
 /*
  * Initialize a dummy page for marking the caller's place in the specified
  * paging queue.  In principle, this function only needs to set the flag
- * PG_MARKER.  Nonetheless, it wirte busies and initializes the hold count
+ * PG_MARKER.  Nonetheless, it write busies and initializes the hold count
  * to one as safety precautions.
  */ 
 static void
 vm_pageout_init_marker(vm_page_t marker, u_short queue)
 {
 
 	bzero(marker, sizeof(*marker));
 	marker->flags = PG_MARKER;
 	marker->busy_lock = VPB_SINGLE_EXCLUSIVER;
 	marker->queue = queue;
 	marker->hold_count = 1;
 }
 
 /*
  * vm_pageout_fallback_object_lock:
  * 
  * Lock vm object currently associated with `m'. VM_OBJECT_TRYWLOCK is
  * known to have failed and page queue must be either PQ_ACTIVE or
  * PQ_INACTIVE.  To avoid lock order violation, unlock the page queue
  * while locking the vm object.  Use marker page to detect page queue
  * changes and maintain notion of next page on page queue.  Return
  * TRUE if no changes were detected, FALSE otherwise.  vm object is
  * locked on return.
  * 
  * This function depends on both the lock portion of struct vm_object
  * and normal struct vm_page being type stable.
  */
 static boolean_t
 vm_pageout_fallback_object_lock(vm_page_t m, vm_page_t *next)
 {
 	struct vm_page marker;
 	struct vm_pagequeue *pq;
 	boolean_t unchanged;
 	u_short queue;
 	vm_object_t object;
 
 	queue = m->queue;
 	vm_pageout_init_marker(&marker, queue);
 	pq = vm_page_pagequeue(m);
 	object = m->object;
 	
 	TAILQ_INSERT_AFTER(&pq->pq_pl, m, &marker, plinks.q);
 	vm_pagequeue_unlock(pq);
 	vm_page_unlock(m);
 	VM_OBJECT_WLOCK(object);
 	vm_page_lock(m);
 	vm_pagequeue_lock(pq);
 
 	/*
 	 * The page's object might have changed, and/or the page might
 	 * have moved from its original position in the queue.  If the
 	 * page's object has changed, then the caller should abandon
 	 * processing the page because the wrong object lock was
 	 * acquired.  Use the marker's plinks.q, not the page's, to
 	 * determine if the page has been moved.  The state of the
 	 * page's plinks.q can be indeterminate; whereas, the marker's
 	 * plinks.q must be valid.
 	 */
 	*next = TAILQ_NEXT(&marker, plinks.q);
 	unchanged = m->object == object &&
 	    m == TAILQ_PREV(&marker, pglist, plinks.q);
 	KASSERT(!unchanged || m->queue == queue,
 	    ("page %p queue %d %d", m, queue, m->queue));
 	TAILQ_REMOVE(&pq->pq_pl, &marker, plinks.q);
 	return (unchanged);
 }
 
 /*
  * Lock the page while holding the page queue lock.  Use marker page
  * to detect page queue changes and maintain notion of next page on
  * page queue.  Return TRUE if no changes were detected, FALSE
  * otherwise.  The page is locked on return. The page queue lock might
  * be dropped and reacquired.
  *
  * This function depends on normal struct vm_page being type stable.
  */
 static boolean_t
 vm_pageout_page_lock(vm_page_t m, vm_page_t *next)
 {
 	struct vm_page marker;
 	struct vm_pagequeue *pq;
 	boolean_t unchanged;
 	u_short queue;
 
 	vm_page_lock_assert(m, MA_NOTOWNED);
 	if (vm_page_trylock(m))
 		return (TRUE);
 
 	queue = m->queue;
 	vm_pageout_init_marker(&marker, queue);
 	pq = vm_page_pagequeue(m);
 
 	TAILQ_INSERT_AFTER(&pq->pq_pl, m, &marker, plinks.q);
 	vm_pagequeue_unlock(pq);
 	vm_page_lock(m);
 	vm_pagequeue_lock(pq);
 
 	/* Page queue might have changed. */
 	*next = TAILQ_NEXT(&marker, plinks.q);
 	unchanged = m == TAILQ_PREV(&marker, pglist, plinks.q);
 	KASSERT(!unchanged || m->queue == queue,
 	    ("page %p queue %d %d", m, queue, m->queue));
 	TAILQ_REMOVE(&pq->pq_pl, &marker, plinks.q);
 	return (unchanged);
 }
 
 /*
  * Scan for pages at adjacent offsets within the given page's object that are
  * eligible for laundering, form a cluster of these pages and the given page,
  * and launder that cluster.
  */
 static int
 vm_pageout_cluster(vm_page_t m)
 {
 	vm_object_t object;
 	vm_page_t mc[2 * vm_pageout_page_count], p, pb, ps;
 	vm_pindex_t pindex;
 	int ib, is, page_base, pageout_count;
 
 	vm_page_assert_locked(m);
 	object = m->object;
 	VM_OBJECT_ASSERT_WLOCKED(object);
 	pindex = m->pindex;
 
 	/*
 	 * We can't clean the page if it is busy or held.
 	 */
 	vm_page_assert_unbusied(m);
 	KASSERT(m->hold_count == 0, ("page %p is held", m));
 	vm_page_dequeue(m);
 	vm_page_unlock(m);
 
 	mc[vm_pageout_page_count] = pb = ps = m;
 	pageout_count = 1;
 	page_base = vm_pageout_page_count;
 	ib = 1;
 	is = 1;
 
 	/*
 	 * We can cluster only if the page is not clean, busy, or held, and
 	 * the page is in the laundry queue.
 	 *
 	 * During heavy mmap/modification loads the pageout
 	 * daemon can really fragment the underlying file
 	 * due to flushing pages out of order and not trying to
 	 * align the clusters (which leaves sporadic out-of-order
 	 * holes).  To solve this problem we do the reverse scan
 	 * first and attempt to align our cluster, then do a 
 	 * forward scan if room remains.
 	 */
 more:
 	while (ib != 0 && pageout_count < vm_pageout_page_count) {
 		if (ib > pindex) {
 			ib = 0;
 			break;
 		}
 		if ((p = vm_page_prev(pb)) == NULL || vm_page_busied(p)) {
 			ib = 0;
 			break;
 		}
 		vm_page_test_dirty(p);
 		if (p->dirty == 0) {
 			ib = 0;
 			break;
 		}
 		vm_page_lock(p);
 		if (!vm_page_in_laundry(p) ||
 		    p->hold_count != 0) {	/* may be undergoing I/O */
 			vm_page_unlock(p);
 			ib = 0;
 			break;
 		}
 		vm_page_dequeue(p);
 		vm_page_unlock(p);
 		mc[--page_base] = pb = p;
 		++pageout_count;
 		++ib;
 
 		/*
 		 * We are at an alignment boundary.  Stop here, and switch
 		 * directions.  Do not clear ib.
 		 */
 		if ((pindex - (ib - 1)) % vm_pageout_page_count == 0)
 			break;
 	}
 	while (pageout_count < vm_pageout_page_count && 
 	    pindex + is < object->size) {
 		if ((p = vm_page_next(ps)) == NULL || vm_page_busied(p))
 			break;
 		vm_page_test_dirty(p);
 		if (p->dirty == 0)
 			break;
 		vm_page_lock(p);
 		if (!vm_page_in_laundry(p) ||
 		    p->hold_count != 0) {	/* may be undergoing I/O */
 			vm_page_unlock(p);
 			break;
 		}
 		vm_page_dequeue(p);
 		vm_page_unlock(p);
 		mc[page_base + pageout_count] = ps = p;
 		++pageout_count;
 		++is;
 	}
 
 	/*
 	 * If we exhausted our forward scan, continue with the reverse scan
 	 * when possible, even past an alignment boundary.  This catches
 	 * boundary conditions.
 	 */
 	if (ib != 0 && pageout_count < vm_pageout_page_count)
 		goto more;
 
 	return (vm_pageout_flush(&mc[page_base], pageout_count, 0, 0, NULL,
 	    NULL));
 }
 
 /*
  * vm_pageout_flush() - launder the given pages
  *
  *	The given pages are laundered.  Note that we setup for the start of
  *	I/O ( i.e. busy the page ), mark it read-only, and bump the object
  *	reference count all in here rather then in the parent.  If we want
  *	the parent to do more sophisticated things we may have to change
  *	the ordering.
  *
  *	Returned runlen is the count of pages between mreq and first
  *	page after mreq with status VM_PAGER_AGAIN.
  *	*eio is set to TRUE if pager returned VM_PAGER_ERROR or VM_PAGER_FAIL
  *	for any page in runlen set.
  */
 int
 vm_pageout_flush(vm_page_t *mc, int count, int flags, int mreq, int *prunlen,
     boolean_t *eio)
 {
 	vm_object_t object = mc[0]->object;
 	int pageout_status[count];
 	int numpagedout = 0;
 	int i, runlen;
 
 	VM_OBJECT_ASSERT_WLOCKED(object);
 
 	/*
 	 * Initiate I/O.  Bump the vm_page_t->busy counter and
 	 * mark the pages read-only.
 	 *
 	 * We do not have to fixup the clean/dirty bits here... we can
 	 * allow the pager to do it after the I/O completes.
 	 *
 	 * NOTE! mc[i]->dirty may be partial or fragmented due to an
 	 * edge case with file fragments.
 	 */
 	for (i = 0; i < count; i++) {
 		KASSERT(mc[i]->valid == VM_PAGE_BITS_ALL,
 		    ("vm_pageout_flush: partially invalid page %p index %d/%d",
 			mc[i], i, count));
 		vm_page_sbusy(mc[i]);
 		pmap_remove_write(mc[i]);
 	}
 	vm_object_pip_add(object, count);
 
 	vm_pager_put_pages(object, mc, count, flags, pageout_status);
 
 	runlen = count - mreq;
 	if (eio != NULL)
 		*eio = FALSE;
 	for (i = 0; i < count; i++) {
 		vm_page_t mt = mc[i];
 
 		KASSERT(pageout_status[i] == VM_PAGER_PEND ||
 		    !pmap_page_is_write_mapped(mt),
 		    ("vm_pageout_flush: page %p is not write protected", mt));
 		switch (pageout_status[i]) {
 		case VM_PAGER_OK:
 		case VM_PAGER_PEND:
 			numpagedout++;
 			break;
 		case VM_PAGER_BAD:
 			/*
 			 * Page outside of range of object. Right now we
 			 * essentially lose the changes by pretending it
 			 * worked.
 			 */
 			vm_page_undirty(mt);
 			vm_page_lock(mt);
 			vm_page_deactivate(mt);
 			vm_page_unlock(mt);
 			break;
 		case VM_PAGER_ERROR:
 		case VM_PAGER_FAIL:
 			/*
 			 * If the page couldn't be paged out, then reactivate
 			 * it so that it doesn't clog the laundry and inactive
 			 * queues.  (We will try paging it out again later).
 			 */
 			vm_page_lock(mt);
 			vm_page_activate(mt);
 			vm_page_unlock(mt);
 			if (eio != NULL && i >= mreq && i - mreq < runlen)
 				*eio = TRUE;
 			break;
 		case VM_PAGER_AGAIN:
 			if (i >= mreq && i - mreq < runlen)
 				runlen = i - mreq;
 			break;
 		}
 
 		/*
 		 * If the operation is still going, leave the page busy to
 		 * block all other accesses. Also, leave the paging in
 		 * progress indicator set so that we don't attempt an object
 		 * collapse.
 		 */
 		if (pageout_status[i] != VM_PAGER_PEND) {
 			vm_object_pip_wakeup(object);
 			vm_page_sunbusy(mt);
 		}
 	}
 	if (prunlen != NULL)
 		*prunlen = runlen;
 	return (numpagedout);
 }
 
 #if !defined(NO_SWAPPING)
 /*
  *	vm_pageout_object_deactivate_pages
  *
  *	Deactivate enough pages to satisfy the inactive target
  *	requirements.
  *
  *	The object and map must be locked.
  */
 static void
 vm_pageout_object_deactivate_pages(pmap_t pmap, vm_object_t first_object,
     long desired)
 {
 	vm_object_t backing_object, object;
 	vm_page_t p;
 	int act_delta, remove_mode;
 
 	VM_OBJECT_ASSERT_LOCKED(first_object);
 	if ((first_object->flags & OBJ_FICTITIOUS) != 0)
 		return;
 	for (object = first_object;; object = backing_object) {
 		if (pmap_resident_count(pmap) <= desired)
 			goto unlock_return;
 		VM_OBJECT_ASSERT_LOCKED(object);
 		if ((object->flags & OBJ_UNMANAGED) != 0 ||
 		    object->paging_in_progress != 0)
 			goto unlock_return;
 
 		remove_mode = 0;
 		if (object->shadow_count > 1)
 			remove_mode = 1;
 		/*
 		 * Scan the object's entire memory queue.
 		 */
 		TAILQ_FOREACH(p, &object->memq, listq) {
 			if (pmap_resident_count(pmap) <= desired)
 				goto unlock_return;
 			if (vm_page_busied(p))
 				continue;
 			PCPU_INC(cnt.v_pdpages);
 			vm_page_lock(p);
 			if (p->wire_count != 0 || p->hold_count != 0 ||
 			    !pmap_page_exists_quick(pmap, p)) {
 				vm_page_unlock(p);
 				continue;
 			}
 			act_delta = pmap_ts_referenced(p);
 			if ((p->aflags & PGA_REFERENCED) != 0) {
 				if (act_delta == 0)
 					act_delta = 1;
 				vm_page_aflag_clear(p, PGA_REFERENCED);
 			}
 			if (!vm_page_active(p) && act_delta != 0) {
 				vm_page_activate(p);
 				p->act_count += act_delta;
 			} else if (vm_page_active(p)) {
 				if (act_delta == 0) {
 					p->act_count -= min(p->act_count,
 					    ACT_DECLINE);
 					if (!remove_mode && p->act_count == 0) {
 						pmap_remove_all(p);
 						vm_page_deactivate(p);
 					} else
 						vm_page_requeue(p);
 				} else {
 					vm_page_activate(p);
 					if (p->act_count < ACT_MAX -
 					    ACT_ADVANCE)
 						p->act_count += ACT_ADVANCE;
 					vm_page_requeue(p);
 				}
 			} else if (vm_page_inactive(p))
 				pmap_remove_all(p);
 			vm_page_unlock(p);
 		}
 		if ((backing_object = object->backing_object) == NULL)
 			goto unlock_return;
 		VM_OBJECT_RLOCK(backing_object);
 		if (object != first_object)
 			VM_OBJECT_RUNLOCK(object);
 	}
 unlock_return:
 	if (object != first_object)
 		VM_OBJECT_RUNLOCK(object);
 }
 
 /*
  * deactivate some number of pages in a map, try to do it fairly, but
  * that is really hard to do.
  */
 static void
 vm_pageout_map_deactivate_pages(map, desired)
 	vm_map_t map;
 	long desired;
 {
 	vm_map_entry_t tmpe;
 	vm_object_t obj, bigobj;
 	int nothingwired;
 
 	if (!vm_map_trylock(map))
 		return;
 
 	bigobj = NULL;
 	nothingwired = TRUE;
 
 	/*
 	 * first, search out the biggest object, and try to free pages from
 	 * that.
 	 */
 	tmpe = map->header.next;
 	while (tmpe != &map->header) {
 		if ((tmpe->eflags & MAP_ENTRY_IS_SUB_MAP) == 0) {
 			obj = tmpe->object.vm_object;
 			if (obj != NULL && VM_OBJECT_TRYRLOCK(obj)) {
 				if (obj->shadow_count <= 1 &&
 				    (bigobj == NULL ||
 				     bigobj->resident_page_count < obj->resident_page_count)) {
 					if (bigobj != NULL)
 						VM_OBJECT_RUNLOCK(bigobj);
 					bigobj = obj;
 				} else
 					VM_OBJECT_RUNLOCK(obj);
 			}
 		}
 		if (tmpe->wired_count > 0)
 			nothingwired = FALSE;
 		tmpe = tmpe->next;
 	}
 
 	if (bigobj != NULL) {
 		vm_pageout_object_deactivate_pages(map->pmap, bigobj, desired);
 		VM_OBJECT_RUNLOCK(bigobj);
 	}
 	/*
 	 * Next, hunt around for other pages to deactivate.  We actually
 	 * do this search sort of wrong -- .text first is not the best idea.
 	 */
 	tmpe = map->header.next;
 	while (tmpe != &map->header) {
 		if (pmap_resident_count(vm_map_pmap(map)) <= desired)
 			break;
 		if ((tmpe->eflags & MAP_ENTRY_IS_SUB_MAP) == 0) {
 			obj = tmpe->object.vm_object;
 			if (obj != NULL) {
 				VM_OBJECT_RLOCK(obj);
 				vm_pageout_object_deactivate_pages(map->pmap, obj, desired);
 				VM_OBJECT_RUNLOCK(obj);
 			}
 		}
 		tmpe = tmpe->next;
 	}
 
 	/*
 	 * Remove all mappings if a process is swapped out, this will free page
 	 * table pages.
 	 */
 	if (desired == 0 && nothingwired) {
 		pmap_remove(vm_map_pmap(map), vm_map_min(map),
 		    vm_map_max(map));
 	}
 
 	vm_map_unlock(map);
 }
 #endif		/* !defined(NO_SWAPPING) */
 
 /*
  * Attempt to acquire all of the necessary locks to launder a page and
  * then call through the clustering layer to PUTPAGES.  Wait a short
  * time for a vnode lock.
  *
  * Requires the page and object lock on entry, releases both before return.
  * Returns 0 on success and an errno otherwise.
  */
 static int
 vm_pageout_clean(vm_page_t m, int *numpagedout)
 {
 	struct vnode *vp;
 	struct mount *mp;
 	vm_object_t object;
 	vm_pindex_t pindex;
 	int error, lockmode;
 
 	vm_page_assert_locked(m);
 	object = m->object;
 	VM_OBJECT_ASSERT_WLOCKED(object);
 	error = 0;
 	vp = NULL;
 	mp = NULL;
 
 	/*
 	 * The object is already known NOT to be dead.   It
 	 * is possible for the vget() to block the whole
 	 * pageout daemon, but the new low-memory handling
 	 * code should prevent it.
 	 *
 	 * We can't wait forever for the vnode lock, we might
 	 * deadlock due to a vn_read() getting stuck in
 	 * vm_wait while holding this vnode.  We skip the 
 	 * vnode if we can't get it in a reasonable amount
 	 * of time.
 	 */
 	if (object->type == OBJT_VNODE) {
 		vm_page_unlock(m);
 		vp = object->handle;
 		if (vp->v_type == VREG &&
 		    vn_start_write(vp, &mp, V_NOWAIT) != 0) {
 			mp = NULL;
 			error = EDEADLK;
 			goto unlock_all;
 		}
 		KASSERT(mp != NULL,
 		    ("vp %p with NULL v_mount", vp));
 		vm_object_reference_locked(object);
 		pindex = m->pindex;
 		VM_OBJECT_WUNLOCK(object);
 		lockmode = MNT_SHARED_WRITES(vp->v_mount) ?
 		    LK_SHARED : LK_EXCLUSIVE;
 		if (vget(vp, lockmode | LK_TIMELOCK, curthread)) {
 			vp = NULL;
 			error = EDEADLK;
 			goto unlock_mp;
 		}
 		VM_OBJECT_WLOCK(object);
 		vm_page_lock(m);
 		/*
 		 * While the object and page were unlocked, the page
 		 * may have been:
 		 * (1) moved to a different queue,
 		 * (2) reallocated to a different object,
 		 * (3) reallocated to a different offset, or
 		 * (4) cleaned.
 		 */
 		if (!vm_page_in_laundry(m) || m->object != object ||
 		    m->pindex != pindex || m->dirty == 0) {
 			vm_page_unlock(m);
 			error = ENXIO;
 			goto unlock_all;
 		}
 
 		/*
 		 * The page may have been busied or held while the object
 		 * and page locks were released.
 		 */
 		if (vm_page_busied(m) || m->hold_count != 0) {
 			vm_page_unlock(m);
 			error = EBUSY;
 			goto unlock_all;
 		}
 	}
 
 	/*
 	 * If a page is dirty, then it is either being washed
 	 * (but not yet cleaned) or it is still in the
 	 * laundry.  If it is still in the laundry, then we
 	 * start the cleaning operation. 
 	 */
 	if ((*numpagedout = vm_pageout_cluster(m)) == 0)
 		error = EIO;
 
 unlock_all:
 	VM_OBJECT_WUNLOCK(object);
 
 unlock_mp:
 	vm_page_lock_assert(m, MA_NOTOWNED);
 	if (mp != NULL) {
 		if (vp != NULL)
 			vput(vp);
 		vm_object_deallocate(object);
 		vn_finished_write(mp);
 	}
 
 	return (error);
 }
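vm_pageout_clean() drops the object and page locks to acquire the vnode, so it must re-check the page once the locks are retaken. The following is a minimal userspace sketch of that revalidation, assuming a hypothetical, flattened `struct fake_page` in place of `struct vm_page`; it is not kernel code and only mirrors the order of the checks (identity and dirtiness first, yielding ENXIO, then busy/held state, yielding EBUSY).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct vm_page. */
struct fake_page {
	bool in_laundry;	/* vm_page_in_laundry(m) */
	const void *object;	/* m->object */
	unsigned long pindex;	/* m->pindex */
	int dirty;		/* m->dirty */
	bool busied;		/* vm_page_busied(m) */
	int hold_count;		/* m->hold_count */
};

/*
 * Re-check a page after the locks were dropped and retaken: it must still
 * be a dirty laundry page with the same (object, pindex) identity, and it
 * must not have been busied or held in the meantime.
 */
static int
revalidate_page(const struct fake_page *m, const void *object,
    unsigned long pindex)
{
	if (!m->in_laundry || m->object != object ||
	    m->pindex != pindex || m->dirty == 0)
		return (ENXIO);
	if (m->busied || m->hold_count != 0)
		return (EBUSY);
	return (0);
}
```

Checking identity before busy state matches the kernel: a page that was reallocated elsewhere is simply skipped, regardless of whether its new owner has busied it.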
 
 /*
  * Attempt to launder the specified number of pages.
  *
  * Returns the number of pages successfully laundered.
  */
 static int
 vm_pageout_launder(struct vm_domain *vmd, int launder, bool shortfall)
 {
 	struct vm_pagequeue *pq;
 	vm_object_t object;
 	vm_page_t m, next;
 	int act_delta, error, maxscan, numpagedout, starting_target;
 	int vnodes_skipped;
 	bool pageout_ok, queue_locked;
 
 	starting_target = launder;
 	vnodes_skipped = 0;
 
 	/*
 	 * Scan the laundry queue for pages eligible to be laundered.  We stop
 	 * once the target number of dirty pages have been laundered, or once
 	 * we've reached the end of the queue.  A single iteration of this loop
 	 * may cause more than one page to be laundered because of clustering.
 	 *
 	 * maxscan ensures that we don't re-examine requeued pages.  Any
 	 * additional pages written as part of a cluster are subtracted from
 	 * maxscan since they must be taken from the laundry queue.
 	 */
 	pq = &vmd->vmd_pagequeues[PQ_LAUNDRY];
 	maxscan = pq->pq_cnt;
 
 	vm_pagequeue_lock(pq);
 	queue_locked = true;
 	for (m = TAILQ_FIRST(&pq->pq_pl);
 	    m != NULL && maxscan-- > 0 && launder > 0;
 	    m = next) {
 		vm_pagequeue_assert_locked(pq);
 		KASSERT(queue_locked, ("unlocked laundry queue"));
 		KASSERT(vm_page_in_laundry(m),
 		    ("page %p has an inconsistent queue", m));
 		next = TAILQ_NEXT(m, plinks.q);
 		if ((m->flags & PG_MARKER) != 0)
 			continue;
 		KASSERT((m->flags & PG_FICTITIOUS) == 0,
 		    ("PG_FICTITIOUS page %p cannot be in laundry queue", m));
 		KASSERT((m->oflags & VPO_UNMANAGED) == 0,
 		    ("VPO_UNMANAGED page %p cannot be in laundry queue", m));
 		if (!vm_pageout_page_lock(m, &next) || m->hold_count != 0) {
 			vm_page_unlock(m);
 			continue;
 		}
 		object = m->object;
 		if ((!VM_OBJECT_TRYWLOCK(object) &&
 		    (!vm_pageout_fallback_object_lock(m, &next) ||
 		    m->hold_count != 0)) || vm_page_busied(m)) {
 			VM_OBJECT_WUNLOCK(object);
 			vm_page_unlock(m);
 			continue;
 		}
 
 		/*
 		 * Unlock the laundry queue, invalidating the 'next' pointer.
 		 * Use a marker to remember our place in the laundry queue.
 		 */
 		TAILQ_INSERT_AFTER(&pq->pq_pl, m, &vmd->vmd_laundry_marker,
 		    plinks.q);
 		vm_pagequeue_unlock(pq);
 		queue_locked = false;
 
 		/*
 		 * Invalid pages can be easily freed.  They cannot be
 		 * mapped; vm_page_free() asserts this.
 		 */
 		if (m->valid == 0)
 			goto free_page;
 
 		/*
 		 * If the page has been referenced and the object is not dead,
 		 * reactivate or requeue the page depending on whether the
 		 * object is mapped.
 		 */
 		if ((m->aflags & PGA_REFERENCED) != 0) {
 			vm_page_aflag_clear(m, PGA_REFERENCED);
 			act_delta = 1;
 		} else
 			act_delta = 0;
 		if (object->ref_count != 0)
 			act_delta += pmap_ts_referenced(m);
 		else {
 			KASSERT(!pmap_page_is_mapped(m),
 			    ("page %p is mapped", m));
 		}
 		if (act_delta != 0) {
 			if (object->ref_count != 0) {
 				vm_page_activate(m);
 
 				/*
 				 * Increase the activation count if the page
 				 * was referenced while in the laundry queue.
 				 * This makes it less likely that the page will
 				 * be returned prematurely to the inactive
 				 * queue.
 				 */
 				m->act_count += act_delta + ACT_ADVANCE;
 
 				/*
 				 * If this was a background laundering, count
 				 * activated pages towards our target.  The
 				 * purpose of background laundering is to ensure
 				 * that pages are eventually cycled through the
 				 * laundry queue, and an activation is a valid
 				 * way out.
 				 */
 				if (!shortfall)
 					launder--;
 				goto drop_page;
 			} else if ((object->flags & OBJ_DEAD) == 0)
 				goto requeue_page;
 		}
 
 		/*
 		 * If the page appears to be clean at the machine-independent
 		 * layer, then remove all of its mappings from the pmap in
 		 * anticipation of freeing it.  If, however, any of the page's
 		 * mappings allow write access, then the page may still be
 		 * modified until the last of those mappings are removed.
 		 */
 		if (object->ref_count != 0) {
 			vm_page_test_dirty(m);
 			if (m->dirty == 0)
 				pmap_remove_all(m);
 		}
 
 		/*
 		 * Clean pages are freed, and dirty pages are paged out unless
 		 * they belong to a dead object.  Requeueing dirty pages from
 		 * dead objects is pointless, as they are being paged out and
 		 * freed by the thread that destroyed the object.
 		 */
 		if (m->dirty == 0) {
 free_page:
 			vm_page_free(m);
 			PCPU_INC(cnt.v_dfree);
 		} else if ((object->flags & OBJ_DEAD) == 0) {
 			if (object->type != OBJT_SWAP &&
 			    object->type != OBJT_DEFAULT)
 				pageout_ok = true;
 			else if (disable_swap_pageouts)
 				pageout_ok = false;
 			else
 				pageout_ok = true;
 			if (!pageout_ok) {
 requeue_page:
 				vm_pagequeue_lock(pq);
 				queue_locked = true;
 				vm_page_requeue_locked(m);
 				goto drop_page;
 			}
 			error = vm_pageout_clean(m, &numpagedout);
 			if (error == 0) {
 				launder -= numpagedout;
 				maxscan -= numpagedout - 1;
 			} else if (error == EDEADLK) {
 				pageout_lock_miss++;
 				vnodes_skipped++;
 			}
 			goto relock_queue;
 		}
 drop_page:
 		vm_page_unlock(m);
 		VM_OBJECT_WUNLOCK(object);
 relock_queue:
 		if (!queue_locked) {
 			vm_pagequeue_lock(pq);
 			queue_locked = true;
 		}
 		next = TAILQ_NEXT(&vmd->vmd_laundry_marker, plinks.q);
 		TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_laundry_marker, plinks.q);
 	}
 	vm_pagequeue_unlock(pq);
 
 	/*
 	 * Wakeup the sync daemon if we skipped a vnode in a writeable object
 	 * and we didn't launder enough pages.
 	 */
 	if (vnodes_skipped > 0 && launder > 0)
 		(void)speedup_syncer();
 
 	return (starting_target - launder);
 }
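The budget accounting in vm_pageout_launder() is easy to misread because one successful vm_pageout_clean() may write a whole cluster. The sketch below is a hypothetical userspace model of just that bookkeeping: `cluster[i]` is the number of pages written when the i-th examined page is cleaned (0 meaning the pageout failed), and `qlen` plays the role of `pq->pq_cnt`. It shows that `maxscan` shrinks by the extra clustered pages and that the return value can overshoot the requested target.

```c
#include <assert.h>

/*
 * Hypothetical model of the loop bounds in vm_pageout_launder().
 * Returns the number of pages laundered (starting_target - launder),
 * which may exceed the target because clusters are written whole.
 */
static int
model_launder(const int *cluster, int n, int qlen, int launder)
{
	int i, maxscan, starting_target;

	starting_target = launder;
	maxscan = qlen;
	for (i = 0; i < n && maxscan-- > 0 && launder > 0; i++) {
		if (cluster[i] == 0)
			continue;	/* Pageout failed; nothing written. */
		launder -= cluster[i];
		/* The extra cluster pages also left the laundry queue. */
		maxscan -= cluster[i] - 1;
	}
	return (starting_target - launder);
}
```

With clusters of 4, 4 and 2 pages in a 10-page queue, a target of 5 yields 8 laundered pages; this is why the laundry worker clamps the result with `min(..., target)`.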
 
 /*
  * Perform the work of the laundry thread: periodically wake up and determine
  * whether any pages need to be laundered.  If so, determine the number of pages
  * that need to be laundered, and launder them.
  */
 static void
 vm_pageout_laundry_worker(void *arg)
 {
 	struct vm_domain *domain;
 	uint64_t ninact, nlaundry;
 	u_int wakeups, gen;
 	int cycle, domidx, launder, prev_shortfall, shortfall, target;
 
 	domidx = (uintptr_t)arg;
 	domain = &vm_dom[domidx];
 	KASSERT(domain->vmd_segs != 0, ("domain without segments"));
 	vm_pageout_init_marker(&domain->vmd_laundry_marker, PQ_LAUNDRY);
 
 	cycle = 0;
 	gen = 0;
 	shortfall = prev_shortfall = 0;
 	target = 0;
 
 	/*
 	 * The pageout laundry worker is never done, so loop forever.
 	 */
 	for (;;) {
 		KASSERT(cycle >= 0, ("negative cycle %d", cycle));
 		KASSERT(target >= 0, ("negative target %d", target));
 		launder = 0;
 
 		/*
 		 * First determine whether we need to launder pages to meet a
 		 * shortage of free pages.
 		 */
 		if (vm_laundering_needed()) {
 			shortfall = vm_laundry_target() + vm_pageout_deficit;
 
 			/*
 			 * If we're in shortfall and we haven't yet started a
 			 * laundering cycle to get us out of it, begin a run.
 			 * If we're still in shortfall despite a previous
 			 * laundering run, start a new one.
 			 */
 			if (prev_shortfall == 0 || cycle == 0) {
 				target = shortfall;
 				cycle = VM_LAUNDER_RATE;
 			}
 			prev_shortfall = shortfall;
 		}
 		if (prev_shortfall > 0) {
 			/*
 			 * We entered shortfall at some point in the recent
 			 * past.  If we have reached our target, or the
 			 * laundering run is finished and we're not currently in
 			 * shortfall, we have no immediate need to launder
 			 * pages.  Otherwise keep laundering.
 			 */
 			if (vm_laundry_target() <= 0 || cycle == 0) {
 				shortfall = prev_shortfall = target = 0;
 			} else {
 				launder = target / cycle--;
 				goto dolaundry;
 			}
 		}
 
 		/*
 		 * There's no immediate need to launder any pages; see if we
 		 * meet the conditions to perform background laundering:
 		 *
 		 * 1. The ratio of dirty to clean inactive pages exceeds the
 		 *    background laundering threshold and the pagedaemon has
 		 *    recently been woken up, or
 		 * 2. we haven't yet reached the target of the current
 		 *    background laundering run.
 		 */
 		ninact = vm_cnt.v_inactive_count;
 		nlaundry = vm_cnt.v_laundry_count;
 		wakeups = VM_METER_PCPU_CNT(v_pdwakeups);
 		if (target == 0 && ninact > 0 && wakeups != gen &&
 		    nlaundry * bkgrd_launder_ratio >= ninact) {
 			gen = wakeups;
 
 			/*
 			 * The pagedaemon has woken up at least once since the
 			 * last background laundering run and we're above the
 			 * dirty page threshold.  Launder some pages to balance
 			 * the inactive and laundry queues.  We attempt to
 			 * finish within one second.
 			 */
 			cycle = VM_LAUNDER_INTERVAL;
 
 			/*
 			 * Set our target to that of the pagedaemon, scaled by
 			 * the relative lengths of the inactive and laundry
 			 * queues.  Divide by a fudge factor as well: we don't
 			 * want to reclaim dirty pages at the same rate as clean
 			 * pages.
 			 */
 			target = vm_cnt.v_free_target -
 			    vm_pageout_wakeup_thresh;
 			target = nlaundry * (u_int)target / ninact / 10;
 			if (target == 0)
 				target = 1;
 
 			/*
 			 * Make sure we don't exceed the background laundering
 			 * threshold.
 			 */
 			target = min(target, bkgrd_launder_max);
 		}
 		if (target > 0 && cycle != 0)
 			launder = target / cycle--;
 
 dolaundry:
 		if (launder > 0)
 			target -= min(vm_pageout_launder(domain, launder,
 			    shortfall > 0), target);
 
 		tsleep(&vm_cnt.v_laundry_count, PVM, "laundr",
 		    hz / VM_LAUNDER_INTERVAL);
 	}
 }
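The laundry worker does two pieces of arithmetic: it computes a background target from the queue lengths, and it spreads the remaining target over the remaining cycles with `launder = target / cycle--`. Below is a minimal userspace restatement of both, assuming hypothetical helper names (`bkgrd_target`, `pace`); `max_target` stands in for `bkgrd_launder_max`, and `ninact` must be nonzero, as the kernel code guarantees before dividing.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Background laundering target: the pagedaemon's target scaled by the
 * ratio of laundry to inactive pages, divided by a fudge factor of 10,
 * then clamped to [1, max_target].  ninact must be nonzero.
 */
static int
bkgrd_target(uint64_t nlaundry, uint64_t ninact, unsigned free_target,
    unsigned wakeup_thresh, int max_target)
{
	uint64_t t;

	t = nlaundry * (free_target - wakeup_thresh) / ninact / 10;
	if (t == 0)
		t = 1;
	return (t < (uint64_t)max_target ? (int)t : max_target);
}

/* Per-wakeup quota: split what is left of the target evenly. */
static int
pace(int target, int *cycle)
{
	return (target / (*cycle)--);
}
```

For example, with 1000 laundry pages, 4000 inactive pages and a gap of 18000 between the free target and the wakeup threshold, the background target is 450 pages.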
 
 /*
  *	vm_pageout_scan does the dirty work for the pageout daemon.
  *
  *	pass 0 - Update active LRU/deactivate pages
  *	pass 1 - Free inactive pages
  */
 static void
 vm_pageout_scan(struct vm_domain *vmd, int pass)
 {
 	vm_page_t m, next;
 	struct vm_pagequeue *pq;
 	vm_object_t object;
 	long min_scan;
 	int act_delta, addl_page_shortage, deficit, maxscan;
 	int page_shortage, scan_tick, scanned, starting_page_shortage;
 	boolean_t queue_locked;
 
 	/*
 	 * If we need to reclaim memory ask kernel caches to return
 	 * some.  We rate limit to avoid thrashing.
 	 */
 	if (vmd == &vm_dom[0] && pass > 0 &&
 	    (time_uptime - lowmem_uptime) >= lowmem_period) {
 		/*
 		 * Decrease registered cache sizes.
 		 */
 		SDT_PROBE0(vm, , , vm__lowmem_scan);
 		EVENTHANDLER_INVOKE(vm_lowmem, 0);
 		/*
 		 * We do this explicitly after the caches have been
 		 * drained above.
 		 */
 		uma_reclaim();
 		lowmem_uptime = time_uptime;
 	}
 
 	/*
 	 * The addl_page_shortage is the number of temporarily
 	 * stuck pages in the inactive queue.  In other words, the
 	 * number of pages from the inactive count that should be
 	 * discounted in setting the target for the active queue scan.
 	 */
 	addl_page_shortage = 0;
 
 	/*
 	 * Calculate the number of pages that we want to free.
 	 */
 	if (pass > 0) {
 		deficit = atomic_readandclear_int(&vm_pageout_deficit);
 		page_shortage = vm_paging_target() + deficit;
 	} else
 		page_shortage = deficit = 0;
 	starting_page_shortage = page_shortage;
 
 	/*
 	 * Start scanning the inactive queue for pages that we can free.  The
 	 * scan will stop when we reach the target or we have scanned the
 	 * entire queue.  (Note that m->act_count is not used to make
 	 * decisions for the inactive queue, only for the active queue.)
 	 */
 	pq = &vmd->vmd_pagequeues[PQ_INACTIVE];
 	maxscan = pq->pq_cnt;
 	vm_pagequeue_lock(pq);
 	queue_locked = TRUE;
 	for (m = TAILQ_FIRST(&pq->pq_pl);
 	     m != NULL && maxscan-- > 0 && page_shortage > 0;
 	     m = next) {
 		vm_pagequeue_assert_locked(pq);
 		KASSERT(queue_locked, ("unlocked inactive queue"));
 		KASSERT(vm_page_inactive(m), ("Inactive queue %p", m));
 
 		PCPU_INC(cnt.v_pdpages);
 		next = TAILQ_NEXT(m, plinks.q);
 
 		/*
 		 * skip marker pages
 		 */
 		if (m->flags & PG_MARKER)
 			continue;
 
 		KASSERT((m->flags & PG_FICTITIOUS) == 0,
 		    ("Fictitious page %p cannot be in inactive queue", m));
 		KASSERT((m->oflags & VPO_UNMANAGED) == 0,
 		    ("Unmanaged page %p cannot be in inactive queue", m));
 
 		/*
 		 * The page or object lock acquisitions fail if the
 		 * page was removed from the queue or moved to a
 		 * different position within the queue.  In either
 		 * case, addl_page_shortage should not be incremented.
 		 */
 		if (!vm_pageout_page_lock(m, &next))
 			goto unlock_page;
 		else if (m->hold_count != 0) {
 			/*
 			 * Held pages are essentially stuck in the
 			 * queue.  So, they ought to be discounted
 			 * from the inactive count.  See the
 			 * calculation of the page_shortage for the
 			 * loop over the active queue below.
 			 */
 			addl_page_shortage++;
 			goto unlock_page;
 		}
 		object = m->object;
 		if (!VM_OBJECT_TRYWLOCK(object)) {
 			if (!vm_pageout_fallback_object_lock(m, &next))
 				goto unlock_object;
 			else if (m->hold_count != 0) {
 				addl_page_shortage++;
 				goto unlock_object;
 			}
 		}
 		if (vm_page_busied(m)) {
 			/*
 			 * Don't mess with busy pages.  Leave them at
 			 * the front of the queue.  Most likely, they
 			 * are being paged out and will leave the
 			 * queue shortly after the scan finishes.  So,
 			 * they ought to be discounted from the
 			 * inactive count.
 			 */
 			addl_page_shortage++;
 unlock_object:
 			VM_OBJECT_WUNLOCK(object);
 unlock_page:
 			vm_page_unlock(m);
 			continue;
 		}
 		KASSERT(m->hold_count == 0, ("Held page %p", m));
 
 		/*
 		 * Dequeue the inactive page and unlock the inactive page
 		 * queue, invalidating the 'next' pointer.  Dequeueing the
 		 * page here avoids a later reacquisition (and release) of
 		 * the inactive page queue lock when vm_page_activate(),
 		 * vm_page_free(), or vm_page_launder() is called.  Use a
 		 * marker to remember our place in the inactive queue.
 		 */
 		TAILQ_INSERT_AFTER(&pq->pq_pl, m, &vmd->vmd_marker, plinks.q);
 		vm_page_dequeue_locked(m);
 		vm_pagequeue_unlock(pq);
 		queue_locked = FALSE;
 
 		/*
 		 * Invalid pages can be easily freed.  They cannot be
 		 * mapped; vm_page_free() asserts this.
 		 */
 		if (m->valid == 0)
 			goto free_page;
 
 		/*
 		 * If the page has been referenced and the object is not dead,
 		 * reactivate or requeue the page depending on whether the
 		 * object is mapped.
 		 */
 		if ((m->aflags & PGA_REFERENCED) != 0) {
 			vm_page_aflag_clear(m, PGA_REFERENCED);
 			act_delta = 1;
 		} else
 			act_delta = 0;
 		if (object->ref_count != 0) {
 			act_delta += pmap_ts_referenced(m);
 		} else {
 			KASSERT(!pmap_page_is_mapped(m),
 			    ("vm_pageout_scan: page %p is mapped", m));
 		}
 		if (act_delta != 0) {
 			if (object->ref_count != 0) {
 				vm_page_activate(m);
 
 				/*
 				 * Increase the activation count if the page
 				 * was referenced while in the inactive queue.
 				 * This makes it less likely that the page will
 				 * be returned prematurely to the inactive
 				 * queue.
 				 */
 				m->act_count += act_delta + ACT_ADVANCE;
 				goto drop_page;
 			} else if ((object->flags & OBJ_DEAD) == 0) {
 				vm_pagequeue_lock(pq);
 				queue_locked = TRUE;
 				m->queue = PQ_INACTIVE;
 				TAILQ_INSERT_TAIL(&pq->pq_pl, m, plinks.q);
 				vm_pagequeue_cnt_inc(pq);
 				goto drop_page;
 			}
 		}
 
 		/*
 		 * If the page appears to be clean at the machine-independent
 		 * layer, then remove all of its mappings from the pmap in
 		 * anticipation of freeing it.  If, however, any of the page's
 		 * mappings allow write access, then the page may still be
 		 * modified until the last of those mappings are removed.
 		 */
 		if (object->ref_count != 0) {
 			vm_page_test_dirty(m);
 			if (m->dirty == 0)
 				pmap_remove_all(m);
 		}
 
 		/*
 		 * Clean pages can be freed, but dirty pages must be sent back
 		 * to the laundry, unless they belong to a dead object.
 		 * Requeueing dirty pages from dead objects is pointless, as
 		 * they are being paged out and freed by the thread that
 		 * destroyed the object.
 		 */
 		if (m->dirty == 0) {
 free_page:
 			vm_page_free(m);
 			PCPU_INC(cnt.v_dfree);
 			--page_shortage;
 		} else if ((object->flags & OBJ_DEAD) == 0)
 			vm_page_launder(m);
 drop_page:
 		vm_page_unlock(m);
 		VM_OBJECT_WUNLOCK(object);
 		if (!queue_locked) {
 			vm_pagequeue_lock(pq);
 			queue_locked = TRUE;
 		}
 		next = TAILQ_NEXT(&vmd->vmd_marker, plinks.q);
 		TAILQ_REMOVE(&pq->pq_pl, &vmd->vmd_marker, plinks.q);
 	}
 	vm_pagequeue_unlock(pq);
 
 	/*
 	 * Wakeup the laundry thread(s) if we didn't free the targeted number
 	 * of pages.
 	 */
 	if (page_shortage > 0)
 		wakeup(&vm_cnt.v_laundry_count);
 
 #if !defined(NO_SWAPPING)
 	/*
 	 * Wakeup the swapout daemon if we didn't free the targeted number of
 	 * pages.
 	 */
 	if (vm_swap_enabled && page_shortage > 0)
 		vm_req_vmdaemon(VM_SWAP_NORMAL);
 #endif
 
 	/*
 	 * If the inactive queue scan fails repeatedly to meet its
 	 * target, kill the largest process.
 	 */
 	vm_pageout_mightbe_oom(vmd, page_shortage, starting_page_shortage);
 
 	/*
 	 * Compute the number of pages we want to try to move from the
 	 * active queue to either the inactive or laundry queue.
 	 *
 	 * When scanning active pages, we make clean pages count more heavily
 	 * towards the page shortage than dirty pages.  This is because dirty
 	 * pages must be laundered before they can be reused and thus have less
 	 * utility when attempting to quickly alleviate a shortage.  However,
 	 * this weighting also causes the scan to deactivate dirty pages more
 	 * more aggressively, improving the effectiveness of clustering and
 	 * ensuring that they can eventually be reused.
 	 */
 	page_shortage = vm_cnt.v_inactive_target - (vm_cnt.v_inactive_count +
 	    vm_cnt.v_laundry_count / act_scan_laundry_weight) +
 	    vm_paging_target() + deficit + addl_page_shortage;
 	page_shortage *= act_scan_laundry_weight;
 
 	pq = &vmd->vmd_pagequeues[PQ_ACTIVE];
 	vm_pagequeue_lock(pq);
 	maxscan = pq->pq_cnt;
 
 	/*
 	 * If we're just idle polling attempt to visit every
 	 * active page within 'update_period' seconds.
 	 */
 	scan_tick = ticks;
 	if (vm_pageout_update_period != 0) {
 		min_scan = pq->pq_cnt;
 		min_scan *= scan_tick - vmd->vmd_last_active_scan;
 		min_scan /= hz * vm_pageout_update_period;
 	} else
 		min_scan = 0;
 	if (min_scan > 0 || (page_shortage > 0 && maxscan > 0))
 		vmd->vmd_last_active_scan = scan_tick;
 
 	/*
 	 * Scan the active queue for pages that can be deactivated.  Update
 	 * the per-page activity counter and use it to identify deactivation
 	 * candidates.
 	 */
 	for (m = TAILQ_FIRST(&pq->pq_pl), scanned = 0; m != NULL && (scanned <
 	    min_scan || (page_shortage > 0 && scanned < maxscan)); m = next,
 	    scanned++) {
 
 		KASSERT(m->queue == PQ_ACTIVE,
 		    ("vm_pageout_scan: page %p isn't active", m));
 
 		next = TAILQ_NEXT(m, plinks.q);
 		if ((m->flags & PG_MARKER) != 0)
 			continue;
 		KASSERT((m->flags & PG_FICTITIOUS) == 0,
 		    ("Fictitious page %p cannot be in active queue", m));
 		KASSERT((m->oflags & VPO_UNMANAGED) == 0,
 		    ("Unmanaged page %p cannot be in active queue", m));
 		if (!vm_pageout_page_lock(m, &next)) {
 			vm_page_unlock(m);
 			continue;
 		}
 
 		/*
 		 * The count for pagedaemon pages is done after checking the
 		 * page for eligibility...
 		 */
 		PCPU_INC(cnt.v_pdpages);
 
 		/*
 		 * Check to see "how much" the page has been used.
 		 */
 		if ((m->aflags & PGA_REFERENCED) != 0) {
 			vm_page_aflag_clear(m, PGA_REFERENCED);
 			act_delta = 1;
 		} else
 			act_delta = 0;
 
 		/*
 		 * Unlocked object ref count check.  Two races are possible.
 		 * 1) The ref was transitioning to zero and we saw non-zero,
 		 *    the pmap bits will be checked unnecessarily.
 		 * 2) The ref was transitioning to one and we saw zero.
 		 *    The page lock prevents a new reference to this page so
 		 *    we need not check the reference bits.
 		 */
 		if (m->object->ref_count != 0)
 			act_delta += pmap_ts_referenced(m);
 
 		/*
 		 * Advance or decay the act_count based on recent usage.
 		 */
 		if (act_delta != 0) {
 			m->act_count += ACT_ADVANCE + act_delta;
 			if (m->act_count > ACT_MAX)
 				m->act_count = ACT_MAX;
 		} else
 			m->act_count -= min(m->act_count, ACT_DECLINE);
 
 		/*
 		 * Move this page to the tail of the active, inactive or laundry
 		 * queue depending on usage.
 		 */
 		if (m->act_count == 0) {
 			/* Dequeue to avoid later lock recursion. */
 			vm_page_dequeue_locked(m);
 #if 0
 			/*
 			 * This requires the object write lock.  It might be a
 			 * good idea during a page shortage, but might also
 			 * cause contention with a concurrent attempt to launder
 			 * pages from this object.
 			 */
 			if (m->object->ref_count != 0)
 				vm_page_test_dirty(m);
 #endif
 			/*
 			 * When not short for inactive pages, let dirty pages go
 			 * through the inactive queue before moving to the
 			 * laundry queue.  This gives them some extra time to
 			 * be reactivated, potentially avoiding an expensive
 			 * pageout.  During a page shortage, the inactive queue
 			 * is necessarily small, so we may move dirty pages
 			 * directly to the laundry queue.
 			 */
 			if (page_shortage <= 0)
 				vm_page_deactivate(m);
 			else {
 				if (m->dirty == 0) {
 					vm_page_deactivate(m);
 					page_shortage -=
 					    act_scan_laundry_weight;
 				} else {
 					vm_page_launder(m);
 					page_shortage--;
 				}
 			}
 		} else
 			vm_page_requeue_locked(m);
 		vm_page_unlock(m);
 	}
 	vm_pagequeue_unlock(pq);
 #if !defined(NO_SWAPPING)
 	/*
 	 * Idle process swapout -- run once per second when we are reclaiming
 	 * pages.
 	 */
 	if (vm_swap_idle_enabled && pass > 0) {
 		static long lsec;
 		if (time_second != lsec) {
 			vm_req_vmdaemon(VM_SWAP_IDLE);
 			lsec = time_second;
 		}
 	}
 #endif
 }
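The active-queue phase of vm_pageout_scan() works in weighted units: the shortage is multiplied by `act_scan_laundry_weight`, a clean page moved to the inactive queue pays off a full weight, and a dirty page sent to the laundry pays off one unit. The following userspace sketch restates that arithmetic together with the idle-polling `min_scan` bound; the function names are hypothetical and `weight` stands in for `act_scan_laundry_weight`.

```c
#include <assert.h>

/* Weighted shortage computed before the active-queue scan. */
static int
active_shortage(int inactive_target, int inactive, int laundry, int weight,
    int paging_target, int deficit, int addl)
{
	int shortage;

	shortage = inactive_target - (inactive + laundry / weight) +
	    paging_target + deficit + addl;
	return (shortage * weight);
}

/* Clean pages count "weight" units; dirty pages count one unit. */
static int
charge_page(int shortage, int dirty, int weight)
{
	return (dirty ? shortage - 1 : shortage - weight);
}

/*
 * Minimum number of active pages to visit so the whole queue is scanned
 * once every "period" seconds; elapsed_ticks is ticks since the last scan.
 */
static long
min_active_scan(long pq_cnt, int elapsed_ticks, int hz, int period)
{
	if (period == 0)
		return (0);
	return (pq_cnt * elapsed_ticks / ((long)hz * period));
}
```

With a weight of 3, a 360-page deficit becomes 1080 units, so it is cleared by 360 clean deactivations but needs 1080 dirty ones, which is the bias toward clean pages that the comment describes.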
 
 static int vm_pageout_oom_vote;
 
 /*
  * The pagedaemon threads randomly select one to perform the
  * OOM.  Trying to kill processes before all pagedaemons
  * failed to reach free target is premature.
  */
 static void
 vm_pageout_mightbe_oom(struct vm_domain *vmd, int page_shortage,
     int starting_page_shortage)
 {
 	int old_vote;
 
 	if (starting_page_shortage <= 0 || starting_page_shortage !=
 	    page_shortage)
 		vmd->vmd_oom_seq = 0;
 	else
 		vmd->vmd_oom_seq++;
 	if (vmd->vmd_oom_seq < vm_pageout_oom_seq) {
 		if (vmd->vmd_oom) {
 			vmd->vmd_oom = FALSE;
 			atomic_subtract_int(&vm_pageout_oom_vote, 1);
 		}
 		return;
 	}
 
 	/*
 	 * Do not follow the call sequence until OOM condition is
 	 * cleared.
 	 */
 	vmd->vmd_oom_seq = 0;
 
 	if (vmd->vmd_oom)
 		return;
 
 	vmd->vmd_oom = TRUE;
 	old_vote = atomic_fetchadd_int(&vm_pageout_oom_vote, 1);
 	if (old_vote != vm_ndomains - 1)
 		return;
 
 	/*
 	 * The current pagedaemon thread is the last in the quorum to
 	 * start OOM.  Initiate the selection and signaling of the
 	 * victim.
 	 */
 	vm_pageout_oom(VM_OOM_MEM);
 
 	/*
 	 * After one round of OOM terror, recall our vote.  On the
 	 * next pass, current pagedaemon would vote again if the low
 	 * memory condition is still there, due to vmd_oom being
 	 * false.
 	 */
 	vmd->vmd_oom = FALSE;
 	atomic_subtract_int(&vm_pageout_oom_vote, 1);
 }
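The OOM decision is a quorum: each domain votes only after `vm_pageout_oom_seq` consecutive passes without progress, and the domain whose vote completes the quorum runs the killer and then withdraws its vote. Below is a single-threaded userspace model of that protocol using C11 atomics; `struct domain_state`, `oom_vote` and the `seq_limit`/`ndomains` parameters are hypothetical stand-ins for the per-domain fields and tunables, and returning `true` marks where the kernel would call vm_pageout_oom().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-domain state mirroring vmd_oom_seq and vmd_oom. */
struct domain_state {
	int oom_seq;
	bool oom;
};

static atomic_int oom_vote;

static bool
mightbe_oom(struct domain_state *d, int shortage, int starting_shortage,
    int seq_limit, int ndomains)
{
	if (starting_shortage <= 0 || starting_shortage != shortage)
		d->oom_seq = 0;		/* Progress was made. */
	else
		d->oom_seq++;
	if (d->oom_seq < seq_limit) {
		if (d->oom) {		/* Withdraw a stale vote. */
			d->oom = false;
			atomic_fetch_sub(&oom_vote, 1);
		}
		return (false);
	}
	d->oom_seq = 0;
	if (d->oom)
		return (false);		/* Already voted. */
	d->oom = true;
	if (atomic_fetch_add(&oom_vote, 1) != ndomains - 1)
		return (false);		/* Quorum not yet reached. */
	/* Last voter: the kernel kills a process here, then recalls its vote. */
	d->oom = false;
	atomic_fetch_sub(&oom_vote, 1);
	return (true);
}
```

With two domains, neither can trigger a kill alone; the kill happens only when the second domain's vote arrives while the first domain's vote is still standing.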
 
 /*
  * The OOM killer is the page daemon's action of last resort when
  * memory allocation requests have been stalled for a prolonged period
  * of time because it cannot reclaim memory.  This function computes
  * the approximate number of physical pages that could be reclaimed if
  * the specified address space is destroyed.
  *
  * Private, anonymous memory owned by the address space is the
  * principal resource that we expect to recover after an OOM kill.
  * Since the physical pages mapped by the address space's COW entries
  * are typically shared pages, they are unlikely to be released and so
  * they are not counted.
  *
  * To get to the point where the page daemon runs the OOM killer, its
  * efforts to write-back vnode-backed pages may have stalled.  This
  * could be caused by a memory allocation deadlock in the write path
  * that might be resolved by an OOM kill.  Therefore, physical pages
  * belonging to vnode-backed objects are counted, because they might
  * be freed without being written out first if the address space holds
  * the last reference to an unlinked vnode.
  *
  * Similarly, physical pages belonging to OBJT_PHYS objects are
  * counted because the address space might hold the last reference to
  * the object.
  */
 static long
 vm_pageout_oom_pagecount(struct vmspace *vmspace)
 {
 	vm_map_t map;
 	vm_map_entry_t entry;
 	vm_object_t obj;
 	long res;
 
 	map = &vmspace->vm_map;
 	KASSERT(!map->system_map, ("system map"));
 	sx_assert(&map->lock, SA_LOCKED);
 	res = 0;
 	for (entry = map->header.next; entry != &map->header;
 	    entry = entry->next) {
 		if ((entry->eflags & MAP_ENTRY_IS_SUB_MAP) != 0)
 			continue;
 		obj = entry->object.vm_object;
 		if (obj == NULL)
 			continue;
 		if ((entry->eflags & MAP_ENTRY_NEEDS_COPY) != 0 &&
 		    obj->ref_count != 1)
 			continue;
 		switch (obj->type) {
 		case OBJT_DEFAULT:
 		case OBJT_SWAP:
 		case OBJT_PHYS:
 		case OBJT_VNODE:
 			res += obj->resident_page_count;
 			break;
 		}
 	}
 	return (res);
 }
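The counting rules in vm_pageout_oom_pagecount() can be restated over a flat array of map entries. In this userspace sketch, `struct fake_entry` and the `obj_type` enumerators are hypothetical simplifications of `vm_map_entry` and the `OBJT_*` types (`T_DEVICE` represents any type that is not counted). It shows that submaps, entries without an object, and copy-on-write entries whose object is shared are skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum obj_type { T_DEFAULT, T_SWAP, T_PHYS, T_VNODE, T_DEVICE };

/* Hypothetical, flattened view of a map entry and its backing object. */
struct fake_entry {
	bool is_sub_map;	/* MAP_ENTRY_IS_SUB_MAP */
	bool needs_copy;	/* MAP_ENTRY_NEEDS_COPY */
	bool has_object;	/* entry->object.vm_object != NULL */
	enum obj_type type;
	int ref_count;
	long resident;		/* resident_page_count */
};

static long
oom_pagecount(const struct fake_entry *e, size_t n)
{
	long res = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (e[i].is_sub_map || !e[i].has_object)
			continue;
		/* Shared COW pages are unlikely to be freed by a kill. */
		if (e[i].needs_copy && e[i].ref_count != 1)
			continue;
		switch (e[i].type) {
		case T_DEFAULT:
		case T_SWAP:
		case T_PHYS:
		case T_VNODE:
			res += e[i].resident;
			break;
		default:
			break;
		}
	}
	return (res);
}
```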
 
 void
 vm_pageout_oom(int shortage)
 {
 	struct proc *p, *bigproc;
 	vm_offset_t size, bigsize;
 	struct thread *td;
 	struct vmspace *vm;
 
 	/*
 	 * We keep the process bigproc locked once we find it to keep anyone
 	 * from messing with it; however, there is a possibility of
 	 * deadlock if process B is bigproc and one of its child processes, A,
 	 * attempts to propagate a signal to B while we are waiting for A's
 	 * lock while walking this list.  To avoid this, we don't block on
 	 * the process lock but just skip a process if it is already locked.
 	 */
 	bigproc = NULL;
 	bigsize = 0;
 	sx_slock(&allproc_lock);
 	FOREACH_PROC_IN_SYSTEM(p) {
 		int breakout;
 
 		PROC_LOCK(p);
 
 		/*
 		 * If this is a system, protected or killed process, skip it.
 		 */
 		if (p->p_state != PRS_NORMAL || (p->p_flag & (P_INEXEC |
 		    P_PROTECTED | P_SYSTEM | P_WEXIT)) != 0 ||
 		    p->p_pid == 1 || P_KILLED(p) ||
 		    (p->p_pid < 48 && swap_pager_avail != 0)) {
 			PROC_UNLOCK(p);
 			continue;
 		}
 		/*
 		 * If the process is in a non-running type state,
 		 * don't touch it.  Check all the threads individually.
 		 */
 		breakout = 0;
 		FOREACH_THREAD_IN_PROC(p, td) {
 			thread_lock(td);
 			if (!TD_ON_RUNQ(td) &&
 			    !TD_IS_RUNNING(td) &&
 			    !TD_IS_SLEEPING(td) &&
 			    !TD_IS_SUSPENDED(td) &&
 			    !TD_IS_SWAPPED(td)) {
 				thread_unlock(td);
 				breakout = 1;
 				break;
 			}
 			thread_unlock(td);
 		}
 		if (breakout) {
 			PROC_UNLOCK(p);
 			continue;
 		}
 		/*
 		 * get the process size
 		 */
 		vm = vmspace_acquire_ref(p);
 		if (vm == NULL) {
 			PROC_UNLOCK(p);
 			continue;
 		}
 		_PHOLD_LITE(p);
 		PROC_UNLOCK(p);
 		sx_sunlock(&allproc_lock);
 		if (!vm_map_trylock_read(&vm->vm_map)) {
 			vmspace_free(vm);
 			sx_slock(&allproc_lock);
 			PRELE(p);
 			continue;
 		}
 		size = vmspace_swap_count(vm);
 		if (shortage == VM_OOM_MEM)
 			size += vm_pageout_oom_pagecount(vm);
 		vm_map_unlock_read(&vm->vm_map);
 		vmspace_free(vm);
 		sx_slock(&allproc_lock);
 
 		/*
 		 * If this process is bigger than the biggest one,
 		 * remember it.
 		 */
 		if (size > bigsize) {
 			if (bigproc != NULL)
 				PRELE(bigproc);
 			bigproc = p;
 			bigsize = size;
 		} else {
 			PRELE(p);
 		}
 	}
 	sx_sunlock(&allproc_lock);
 	if (bigproc != NULL) {
 		if (vm_panic_on_oom != 0)
 			panic("out of swap space");
 		PROC_LOCK(bigproc);
 		killproc(bigproc, "out of swap space");
 		sched_nice(bigproc, PRIO_MIN);
 		_PRELE(bigproc);
 		PROC_UNLOCK(bigproc);
 		wakeup(&vm_cnt.v_free_count);
 	}
 }
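Victim selection in vm_pageout_oom() is a maximum search over eligible processes. The sketch below restates it in userspace, assuming a hypothetical `struct fake_proc` whose `eligible` flag summarizes the state, flag and per-thread checks. It keeps two details from the kernel: PIDs below 48 are spared only while swap space remains, and the strict `size > bigsize` test means ties keep the earlier process and a zero-size process is never chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of a process as seen by the victim selection. */
struct fake_proc {
	int pid;
	bool eligible;		/* Passes the state, flag and thread checks. */
	long swap_pages;	/* vmspace_swap_count() */
	long resident_pages;	/* vm_pageout_oom_pagecount() */
};

/* Return the index of the process that would be killed, or -1. */
static int
pick_victim(const struct fake_proc *p, size_t n, bool swap_avail,
    bool mem_shortage)
{
	long size, bigsize = 0;
	int big = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!p[i].eligible || p[i].pid == 1 ||
		    (p[i].pid < 48 && swap_avail))
			continue;
		size = p[i].swap_pages;
		if (mem_shortage)	/* VM_OOM_MEM */
			size += p[i].resident_pages;
		if (size > bigsize) {
			bigsize = size;
			big = (int)i;
		}
	}
	return (big);
}
```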
 
 static void
 vm_pageout_worker(void *arg)
 {
 	struct vm_domain *domain;
 	int domidx;
 
 	domidx = (uintptr_t)arg;
 	domain = &vm_dom[domidx];
 
 	/*
 	 * XXXKIB It could be useful to bind pageout daemon threads to
 	 * the cores belonging to the domain, from which vm_page_array
 	 * is allocated.
 	 */
 
 	KASSERT(domain->vmd_segs != 0, ("domain without segments"));
 	domain->vmd_last_active_scan = ticks;
 	vm_pageout_init_marker(&domain->vmd_marker, PQ_INACTIVE);
 	vm_pageout_init_marker(&domain->vmd_inacthead, PQ_INACTIVE);
 	TAILQ_INSERT_HEAD(&domain->vmd_pagequeues[PQ_INACTIVE].pq_pl,
 	    &domain->vmd_inacthead, plinks.q);
 
 	/*
 	 * The pageout daemon worker is never done, so loop forever.
 	 */
 	while (TRUE) {
 		mtx_lock(&vm_page_queue_free_mtx);
 
 		/*
 		 * Generally, after a level >= 1 scan, if there are enough
 		 * free pages to wakeup the waiters, then they are already
 		 * awake.  A call to vm_page_free() during the scan awakened
 		 * them.  However, in the following case, this wakeup serves
 		 * to bound the amount of time that a thread might wait.
 		 * Suppose a thread's call to vm_page_alloc() fails, but
 		 * before that thread calls VM_WAIT, enough pages are freed by
 		 * other threads to alleviate the free page shortage.  The
 		 * thread will, nonetheless, wait until another page is freed
 		 * or this wakeup is performed.
 		 */
 		if (vm_pages_needed && !vm_page_count_min()) {
 			vm_pages_needed = false;
 			wakeup(&vm_cnt.v_free_count);
 		}
 
 		/*
 		 * Do not clear vm_pageout_wanted until we reach our target.
 		 * Otherwise, we may be awakened over and over again, wasting
 		 * CPU time.
 		 */
 		if (vm_pageout_wanted && !vm_paging_needed())
 			vm_pageout_wanted = false;
 
 		/*
 		 * Might the page daemon receive a wakeup call?
 		 */
 		if (vm_pageout_wanted) {
 			/*
 			 * No.  Either vm_pageout_wanted was set by another
 			 * thread during the previous scan, which must have
 			 * been a level 0 scan, or vm_pageout_wanted was
 			 * already set and the scan failed to free enough
 			 * pages.  If we haven't yet performed a level >= 2
 			 * scan (unlimited dirty cleaning), then upgrade the
 			 * level and scan again now.  Otherwise, sleep a bit
 			 * and try again later.
 			 */
 			mtx_unlock(&vm_page_queue_free_mtx);
 			if (domain->vmd_pass > 1)
 				pause("psleep", hz / 2);
 			domain->vmd_pass++;
 		} else {
 			/*
 			 * Yes.  Sleep until pages need to be reclaimed or
 			 * have their reference stats updated.
 			 */
 			if (mtx_sleep(&vm_pageout_wanted,
 			    &vm_page_queue_free_mtx, PDROP | PVM, "psleep",
 			    hz) == 0) {
 				PCPU_INC(cnt.v_pdwakeups);
 				domain->vmd_pass = 1;
 			} else
 				domain->vmd_pass = 0;
 		}
 
 		vm_pageout_scan(domain, domain->vmd_pass);
 	}
 }
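The pass selection in vm_pageout_worker() behaves like a small state machine: if `vm_pageout_wanted` is still set after the previous scan, the pass level is raised (with a half-second pause once a level >= 2 scan has already been tried); otherwise the daemon sleeps and starts at pass 1 on a wakeup or pass 0 on a timeout. This is a hypothetical userspace restatement of that transition only; `next_pass` and its parameters are illustrative names, and `wanted` is assumed to be the flag after the "target reached" check has possibly cleared it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Choose the next scan pass.  "woken" says whether the sleep ended with
 * a wakeup rather than a timeout; *should_pause reports whether the
 * daemon would pause for hz / 2 before rescanning.
 */
static int
next_pass(int pass, bool wanted, bool woken, bool *should_pause)
{
	*should_pause = false;
	if (wanted) {
		if (pass > 1)
			*should_pause = true;
		return (pass + 1);
	}
	return (woken ? 1 : 0);
}
```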
 
 /*
  *	vm_pageout_init initialises basic pageout daemon settings.
  */
 static void
 vm_pageout_init(void)
 {
 	/*
 	 * Initialize some paging parameters.
 	 */
 	vm_cnt.v_interrupt_free_min = 2;
 	if (vm_cnt.v_page_count < 2000)
 		vm_pageout_page_count = 8;
 
 	/*
 	 * v_free_reserved needs to include enough for the largest
 	 * swap pager structures plus enough for any pv_entry structs
 	 * when paging.
 	 */
 	if (vm_cnt.v_page_count > 1024)
 		vm_cnt.v_free_min = 4 + (vm_cnt.v_page_count - 1024) / 200;
 	else
 		vm_cnt.v_free_min = 4;
 	vm_cnt.v_pageout_free_min = (2*MAXBSIZE)/PAGE_SIZE +
 	    vm_cnt.v_interrupt_free_min;
 	vm_cnt.v_free_reserved = vm_pageout_page_count +
 	    vm_cnt.v_pageout_free_min + (vm_cnt.v_page_count / 768);
 	vm_cnt.v_free_severe = vm_cnt.v_free_min / 2;
 	vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + vm_cnt.v_free_reserved;
 	vm_cnt.v_free_min += vm_cnt.v_free_reserved;
 	vm_cnt.v_free_severe += vm_cnt.v_free_reserved;
 	vm_cnt.v_inactive_target = (3 * vm_cnt.v_free_target) / 2;
 	if (vm_cnt.v_inactive_target > vm_cnt.v_free_count / 3)
 		vm_cnt.v_inactive_target = vm_cnt.v_free_count / 3;
 
 	/*
 	 * Set the default wakeup threshold to be 10% above the minimum
 	 * page limit.  This keeps the steady state out of shortfall.
 	 */
 	vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;
 
 	/*
 	 * Set interval in seconds for active scan.  We want to visit each
 	 * page at least once every ten minutes.  This is to prevent worst
 	 * case paging behaviors with stale active LRU.
 	 */
 	if (vm_pageout_update_period == 0)
 		vm_pageout_update_period = 600;
 
 	/* XXX does not really belong here */
 	if (vm_page_max_wired == 0)
 		vm_page_max_wired = vm_cnt.v_free_count / 3;
 }
 
 /*
  *     vm_pageout is the high level pageout daemon.
  */
 static void
 vm_pageout(void)
 {
 	int error;
 #ifdef VM_NUMA_ALLOC
 	int i;
 #endif
 
 	swap_pager_swap_init();
 	error = kthread_add(vm_pageout_laundry_worker, NULL, curproc, NULL,
 	    0, 0, "laundry: dom0");
 	if (error != 0)
 		panic("starting laundry for domain 0, error %d", error);
 #ifdef VM_NUMA_ALLOC
 	for (i = 1; i < vm_ndomains; i++) {
 		error = kthread_add(vm_pageout_worker, (void *)(uintptr_t)i,
 		    curproc, NULL, 0, 0, "dom%d", i);
 		if (error != 0) {
 			panic("starting pageout for domain %d, error %d\n",
 			    i, error);
 		}
 	}
 #endif
 	error = kthread_add(uma_reclaim_worker, NULL, curproc, NULL,
 	    0, 0, "uma");
 	if (error != 0)
 		panic("starting uma_reclaim helper, error %d\n", error);
 	vm_pageout_worker((void *)(uintptr_t)0);
 }
 
 /*
  * Unless the free page queue lock is held by the caller, this function
  * should be regarded as advisory.  Specifically, the caller should
  * not msleep() on &vm_cnt.v_free_count following this function unless
  * the free page queue lock is held until the msleep() is performed.
  */
 void
 pagedaemon_wakeup(void)
 {
 
 	if (!vm_pageout_wanted && curthread->td_proc != pageproc) {
 		vm_pageout_wanted = true;
 		wakeup(&vm_pageout_wanted);
 	}
 }
 
 #if !defined(NO_SWAPPING)
 static void
 vm_req_vmdaemon(int req)
 {
 	static int lastrun = 0;
 
 	mtx_lock(&vm_daemon_mtx);
 	vm_pageout_req_swapout |= req;
 	if ((ticks > (lastrun + hz)) || (ticks < lastrun)) {
 		wakeup(&vm_daemon_needed);
 		lastrun = ticks;
 	}
 	mtx_unlock(&vm_daemon_mtx);
 }
 
 static void
 vm_daemon(void)
 {
 	struct rlimit rsslim;
 	struct proc *p;
 	struct thread *td;
 	struct vmspace *vm;
 	int breakout, swapout_flags, tryagain, attempts;
 #ifdef RACCT
 	uint64_t rsize, ravailable;
 #endif
 
 	while (TRUE) {
 		mtx_lock(&vm_daemon_mtx);
 		msleep(&vm_daemon_needed, &vm_daemon_mtx, PPAUSE, "psleep",
 #ifdef RACCT
 		    racct_enable ? hz : 0
 #else
 		    0
 #endif
 		);
 		swapout_flags = vm_pageout_req_swapout;
 		vm_pageout_req_swapout = 0;
 		mtx_unlock(&vm_daemon_mtx);
 		if (swapout_flags)
 			swapout_procs(swapout_flags);
 
 		/*
 		 * scan the processes for exceeding their rlimits or if
 		 * process is swapped out -- deactivate pages
 		 */
 		tryagain = 0;
 		attempts = 0;
 again:
 		attempts++;
 		sx_slock(&allproc_lock);
 		FOREACH_PROC_IN_SYSTEM(p) {
 			vm_pindex_t limit, size;
 
 			/*
 			 * if this is a system process or if we have already
 			 * looked at this process, skip it.
 			 */
 			PROC_LOCK(p);
 			if (p->p_state != PRS_NORMAL ||
 			    p->p_flag & (P_INEXEC | P_SYSTEM | P_WEXIT)) {
 				PROC_UNLOCK(p);
 				continue;
 			}
 			/*
 			 * if the process is in a non-running type state,
 			 * don't touch it.
 			 */
 			breakout = 0;
 			FOREACH_THREAD_IN_PROC(p, td) {
 				thread_lock(td);
 				if (!TD_ON_RUNQ(td) &&
 				    !TD_IS_RUNNING(td) &&
 				    !TD_IS_SLEEPING(td) &&
 				    !TD_IS_SUSPENDED(td)) {
 					thread_unlock(td);
 					breakout = 1;
 					break;
 				}
 				thread_unlock(td);
 			}
 			if (breakout) {
 				PROC_UNLOCK(p);
 				continue;
 			}
 			/*
 			 * get a limit
 			 */
 			lim_rlimit_proc(p, RLIMIT_RSS, &rsslim);
 			limit = OFF_TO_IDX(
 			    qmin(rsslim.rlim_cur, rsslim.rlim_max));
 
 			/*
 			 * let processes that are swapped out really be
 			 * swapped out set the limit to nothing (will force a
 			 * swap-out.)
 			 */
 			if ((p->p_flag & P_INMEM) == 0)
 				limit = 0;	/* XXX */
 			vm = vmspace_acquire_ref(p);
 			_PHOLD_LITE(p);
 			PROC_UNLOCK(p);
 			if (vm == NULL) {
 				PRELE(p);
 				continue;
 			}
 			sx_sunlock(&allproc_lock);
 
 			size = vmspace_resident_count(vm);
 			if (size >= limit) {
 				vm_pageout_map_deactivate_pages(
 				    &vm->vm_map, limit);
 			}
 #ifdef RACCT
 			if (racct_enable) {
 				rsize = IDX_TO_OFF(size);
 				PROC_LOCK(p);
 				racct_set(p, RACCT_RSS, rsize);
 				ravailable = racct_get_available(p, RACCT_RSS);
 				PROC_UNLOCK(p);
 				if (rsize > ravailable) {
 					/*
 					 * Don't be overly aggressive; this
 					 * might be an innocent process,
 					 * and the limit could've been exceeded
 					 * by some memory hog.  Don't try
 					 * to deactivate more than 1/4th
 					 * of process' resident set size.
 					 */
 					if (attempts <= 8) {
 						if (ravailable < rsize -
 						    (rsize / 4)) {
 							ravailable = rsize -
 							    (rsize / 4);
 						}
 					}
 					vm_pageout_map_deactivate_pages(
 					    &vm->vm_map,
 					    OFF_TO_IDX(ravailable));
 					/* Update RSS usage after paging out. */
 					size = vmspace_resident_count(vm);
 					rsize = IDX_TO_OFF(size);
 					PROC_LOCK(p);
 					racct_set(p, RACCT_RSS, rsize);
 					PROC_UNLOCK(p);
 					if (rsize > ravailable)
 						tryagain = 1;
 				}
 			}
 #endif
 			vmspace_free(vm);
 			sx_slock(&allproc_lock);
 			PRELE(p);
 		}
 		sx_sunlock(&allproc_lock);
 		if (tryagain != 0 && attempts <= 10)
 			goto again;
 	}
 }
 #endif			/* !defined(NO_SWAPPING) */
Index: user/alc/PQ_LAUNDRY/sys/x86/iommu/intel_drv.c
===================================================================
--- user/alc/PQ_LAUNDRY/sys/x86/iommu/intel_drv.c	(revision 303774)
+++ user/alc/PQ_LAUNDRY/sys/x86/iommu/intel_drv.c	(revision 303775)
@@ -1,1288 +1,1288 @@
 /*-
  * Copyright (c) 2013-2015 The FreeBSD Foundation
  * All rights reserved.
  *
  * This software was developed by Konstantin Belousov <kib@FreeBSD.org>
  * under sponsorship from the FreeBSD Foundation.
  *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
  * 1. Redistributions of source code must retain the above copyright
  *    notice, this list of conditions and the following disclaimer.
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
  *
  * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
  * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
  * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
  * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
  * SUCH DAMAGE.
  */
 
 #include <sys/cdefs.h>
 __FBSDID("$FreeBSD$");
 
 #include "opt_acpi.h"
 #if defined(__amd64__)
 #define	DEV_APIC
 #else
 #include "opt_apic.h"
 #endif
 #include "opt_ddb.h"
 
 #include <sys/param.h>
 #include <sys/bus.h>
 #include <sys/kernel.h>
 #include <sys/lock.h>
 #include <sys/malloc.h>
 #include <sys/memdesc.h>
 #include <sys/module.h>
 #include <sys/rman.h>
 #include <sys/rwlock.h>
 #include <sys/smp.h>
 #include <sys/taskqueue.h>
 #include <sys/tree.h>
 #include <sys/vmem.h>
 #include <machine/bus.h>
 #include <contrib/dev/acpica/include/acpi.h>
 #include <contrib/dev/acpica/include/accommon.h>
 #include <dev/acpica/acpivar.h>
 #include <vm/vm.h>
 #include <vm/vm_extern.h>
 #include <vm/vm_kern.h>
 #include <vm/vm_object.h>
 #include <vm/vm_page.h>
 #include <vm/vm_pager.h>
 #include <vm/vm_map.h>
 #include <x86/include/busdma_impl.h>
 #include <x86/iommu/intel_reg.h>
 #include <x86/iommu/busdma_dmar.h>
 #include <x86/iommu/intel_dmar.h>
 #include <dev/pci/pcireg.h>
 #include <dev/pci/pcivar.h>
 
 #ifdef DEV_APIC
 #include "pcib_if.h"
 #endif
 
 #define	DMAR_FAULT_IRQ_RID	0
 #define	DMAR_QI_IRQ_RID		1
 #define	DMAR_REG_RID		2
 
 static devclass_t dmar_devclass;
 static device_t *dmar_devs;
 static int dmar_devcnt;
 
 typedef int (*dmar_iter_t)(ACPI_DMAR_HEADER *, void *);
 
 static void
 dmar_iterate_tbl(dmar_iter_t iter, void *arg)
 {
 	ACPI_TABLE_DMAR *dmartbl;
 	ACPI_DMAR_HEADER *dmarh;
 	char *ptr, *ptrend;
 	ACPI_STATUS status;
 
 	status = AcpiGetTable(ACPI_SIG_DMAR, 1, (ACPI_TABLE_HEADER **)&dmartbl);
 	if (ACPI_FAILURE(status))
 		return;
 	ptr = (char *)dmartbl + sizeof(*dmartbl);
 	ptrend = (char *)dmartbl + dmartbl->Header.Length;
 	for (;;) {
 		if (ptr >= ptrend)
 			break;
 		dmarh = (ACPI_DMAR_HEADER *)ptr;
 		if (dmarh->Length <= 0) {
 			printf("dmar_identify: corrupted DMAR table, l %d\n",
 			    dmarh->Length);
 			break;
 		}
 		ptr += dmarh->Length;
 		if (!iter(dmarh, arg))
 			break;
 	}
 }
 
 struct find_iter_args {
 	int i;
 	ACPI_DMAR_HARDWARE_UNIT *res;
 };
 
 static int
 dmar_find_iter(ACPI_DMAR_HEADER *dmarh, void *arg)
 {
 	struct find_iter_args *fia;
 
 	if (dmarh->Type != ACPI_DMAR_TYPE_HARDWARE_UNIT)
 		return (1);
 
 	fia = arg;
 	if (fia->i == 0) {
 		fia->res = (ACPI_DMAR_HARDWARE_UNIT *)dmarh;
 		return (0);
 	}
 	fia->i--;
 	return (1);
 }
 
 static ACPI_DMAR_HARDWARE_UNIT *
 dmar_find_by_index(int idx)
 {
 	struct find_iter_args fia;
 
 	fia.i = idx;
 	fia.res = NULL;
 	dmar_iterate_tbl(dmar_find_iter, &fia);
 	return (fia.res);
 }
 
 static int
 dmar_count_iter(ACPI_DMAR_HEADER *dmarh, void *arg)
 {
 
 	if (dmarh->Type == ACPI_DMAR_TYPE_HARDWARE_UNIT)
 		dmar_devcnt++;
 	return (1);
 }
 
 static int dmar_enable = 0;
 static void
 dmar_identify(driver_t *driver, device_t parent)
 {
 	ACPI_TABLE_DMAR *dmartbl;
 	ACPI_DMAR_HARDWARE_UNIT *dmarh;
 	ACPI_STATUS status;
 	int i, error;
 
 	if (acpi_disabled("dmar"))
 		return;
 	TUNABLE_INT_FETCH("hw.dmar.enable", &dmar_enable);
 	if (!dmar_enable)
 		return;
 #ifdef INVARIANTS
 	TUNABLE_INT_FETCH("hw.dmar.check_free", &dmar_check_free);
 #endif
 	TUNABLE_INT_FETCH("hw.dmar.match_verbose", &dmar_match_verbose);
 	status = AcpiGetTable(ACPI_SIG_DMAR, 1, (ACPI_TABLE_HEADER **)&dmartbl);
 	if (ACPI_FAILURE(status))
 		return;
 	haw = dmartbl->Width + 1;
 	if ((1ULL << (haw + 1)) > BUS_SPACE_MAXADDR)
 		dmar_high = BUS_SPACE_MAXADDR;
 	else
 		dmar_high = 1ULL << (haw + 1);
 	if (bootverbose) {
 		printf("DMAR HAW=%d flags=<%b>\n", dmartbl->Width,
 		    (unsigned)dmartbl->Flags,
 		    "\020\001INTR_REMAP\002X2APIC_OPT_OUT");
 	}
 
 	dmar_iterate_tbl(dmar_count_iter, NULL);
 	if (dmar_devcnt == 0)
 		return;
 	dmar_devs = malloc(sizeof(device_t) * dmar_devcnt, M_DEVBUF,
 	    M_WAITOK | M_ZERO);
 	for (i = 0; i < dmar_devcnt; i++) {
 		dmarh = dmar_find_by_index(i);
 		if (dmarh == NULL) {
 			printf("dmar_identify: cannot find HWUNIT %d\n", i);
 			continue;
 		}
 		dmar_devs[i] = BUS_ADD_CHILD(parent, 1, "dmar", i);
 		if (dmar_devs[i] == NULL) {
 			printf("dmar_identify: cannot create instance %d\n", i);
 			continue;
 		}
 		error = bus_set_resource(dmar_devs[i], SYS_RES_MEMORY,
 		    DMAR_REG_RID, dmarh->Address, PAGE_SIZE);
 		if (error != 0) {
 			printf(
 	"dmar%d: unable to alloc register window at 0x%08jx: error %d\n",
 			    i, (uintmax_t)dmarh->Address, error);
 			device_delete_child(parent, dmar_devs[i]);
 			dmar_devs[i] = NULL;
 		}
 	}
 }
 
 static int
 dmar_probe(device_t dev)
 {
 
 	if (acpi_get_handle(dev) != NULL)
 		return (ENXIO);
 	device_set_desc(dev, "DMA remap");
 	return (BUS_PROBE_NOWILDCARD);
 }
 
 static void
 dmar_release_intr(device_t dev, struct dmar_unit *unit, int idx)
 {
 	struct dmar_msi_data *dmd;
 
 	dmd = &unit->intrs[idx];
 	if (dmd->irq == -1)
 		return;
 	bus_teardown_intr(dev, dmd->irq_res, dmd->intr_handle);
 	bus_release_resource(dev, SYS_RES_IRQ, dmd->irq_rid, dmd->irq_res);
 	bus_delete_resource(dev, SYS_RES_IRQ, dmd->irq_rid);
 	PCIB_RELEASE_MSIX(device_get_parent(device_get_parent(dev)),
 	    dev, dmd->irq);
 	dmd->irq = -1;
 }
 
 static void
 dmar_release_resources(device_t dev, struct dmar_unit *unit)
 {
 	int i;
 
 	dmar_fini_busdma(unit);
 	dmar_fini_irt(unit);
 	dmar_fini_qi(unit);
 	dmar_fini_fault_log(unit);
 	for (i = 0; i < DMAR_INTR_TOTAL; i++)
 		dmar_release_intr(dev, unit, i);
 	if (unit->regs != NULL) {
 		bus_deactivate_resource(dev, SYS_RES_MEMORY, unit->reg_rid,
 		    unit->regs);
 		bus_release_resource(dev, SYS_RES_MEMORY, unit->reg_rid,
 		    unit->regs);
 		unit->regs = NULL;
 	}
 	if (unit->domids != NULL) {
 		delete_unrhdr(unit->domids);
 		unit->domids = NULL;
 	}
 	if (unit->ctx_obj != NULL) {
 		vm_object_deallocate(unit->ctx_obj);
 		unit->ctx_obj = NULL;
 	}
 }
 
 static int
 dmar_alloc_irq(device_t dev, struct dmar_unit *unit, int idx)
 {
 	device_t pcib;
 	struct dmar_msi_data *dmd;
 	uint64_t msi_addr;
 	uint32_t msi_data;
 	int error;
 
 	dmd = &unit->intrs[idx];
 	pcib = device_get_parent(device_get_parent(dev)); /* Really not pcib */
 	error = PCIB_ALLOC_MSIX(pcib, dev, &dmd->irq);
 	if (error != 0) {
 		device_printf(dev, "cannot allocate %s interrupt, %d\n",
 		    dmd->name, error);
 		goto err1;
 	}
 	error = bus_set_resource(dev, SYS_RES_IRQ, dmd->irq_rid,
 	    dmd->irq, 1);
 	if (error != 0) {
 		device_printf(dev, "cannot set %s interrupt resource, %d\n",
 		    dmd->name, error);
 		goto err2;
 	}
 	dmd->irq_res = bus_alloc_resource_any(dev, SYS_RES_IRQ,
 	    &dmd->irq_rid, RF_ACTIVE);
 	if (dmd->irq_res == NULL) {
 		device_printf(dev,
 		    "cannot allocate resource for %s interrupt\n", dmd->name);
 		error = ENXIO;
 		goto err3;
 	}
 	error = bus_setup_intr(dev, dmd->irq_res, INTR_TYPE_MISC,
 	    dmd->handler, NULL, unit, &dmd->intr_handle);
 	if (error != 0) {
 		device_printf(dev, "cannot setup %s interrupt, %d\n",
 		    dmd->name, error);
 		goto err4;
 	}
-	bus_describe_intr(dev, dmd->irq_res, dmd->intr_handle, dmd->name);
+	bus_describe_intr(dev, dmd->irq_res, dmd->intr_handle, "%s", dmd->name);
 	error = PCIB_MAP_MSI(pcib, dev, dmd->irq, &msi_addr, &msi_data);
 	if (error != 0) {
 		device_printf(dev, "cannot map %s interrupt, %d\n",
 		    dmd->name, error);
 		goto err5;
 	}
 	dmar_write4(unit, dmd->msi_data_reg, msi_data);
 	dmar_write4(unit, dmd->msi_addr_reg, msi_addr);
 	/* Only for xAPIC mode */
 	dmar_write4(unit, dmd->msi_uaddr_reg, msi_addr >> 32);
 	return (0);
 
 err5:
 	bus_teardown_intr(dev, dmd->irq_res, dmd->intr_handle);
 err4:
 	bus_release_resource(dev, SYS_RES_IRQ, dmd->irq_rid, dmd->irq_res);
 err3:
 	bus_delete_resource(dev, SYS_RES_IRQ, dmd->irq_rid);
 err2:
 	PCIB_RELEASE_MSIX(pcib, dev, dmd->irq);
 	dmd->irq = -1;
 err1:
 	return (error);
 }
 
 #ifdef DEV_APIC
 static int
 dmar_remap_intr(device_t dev, device_t child, u_int irq)
 {
 	struct dmar_unit *unit;
 	struct dmar_msi_data *dmd;
 	uint64_t msi_addr;
 	uint32_t msi_data;
 	int i, error;
 
 	unit = device_get_softc(dev);
 	for (i = 0; i < DMAR_INTR_TOTAL; i++) {
 		dmd = &unit->intrs[i];
 		if (irq == dmd->irq) {
 			error = PCIB_MAP_MSI(device_get_parent(
 			    device_get_parent(dev)),
 			    dev, irq, &msi_addr, &msi_data);
 			if (error != 0)
 				return (error);
 			DMAR_LOCK(unit);
 			(dmd->disable_intr)(unit);
 			dmar_write4(unit, dmd->msi_data_reg, msi_data);
 			dmar_write4(unit, dmd->msi_addr_reg, msi_addr);
 			dmar_write4(unit, dmd->msi_uaddr_reg, msi_addr >> 32);
 			(dmd->enable_intr)(unit);
 			DMAR_UNLOCK(unit);
 			return (0);
 		}
 	}
 	return (ENOENT);
 }
 #endif
 
 static void
 dmar_print_caps(device_t dev, struct dmar_unit *unit,
     ACPI_DMAR_HARDWARE_UNIT *dmaru)
 {
 	uint32_t caphi, ecaphi;
 
 	device_printf(dev, "regs@0x%08jx, ver=%d.%d, seg=%d, flags=<%b>\n",
 	    (uintmax_t)dmaru->Address, DMAR_MAJOR_VER(unit->hw_ver),
 	    DMAR_MINOR_VER(unit->hw_ver), dmaru->Segment,
 	    dmaru->Flags, "\020\001INCLUDE_ALL_PCI");
 	caphi = unit->hw_cap >> 32;
 	device_printf(dev, "cap=%b,", (u_int)unit->hw_cap,
 	    "\020\004AFL\005WBF\006PLMR\007PHMR\010CM\027ZLR\030ISOCH");
 	printf("%b, ", caphi, "\020\010PSI\027DWD\030DRD\031FL1GP\034PSI");
 	printf("ndoms=%d, sagaw=%d, mgaw=%d, fro=%d, nfr=%d, superp=%d",
 	    DMAR_CAP_ND(unit->hw_cap), DMAR_CAP_SAGAW(unit->hw_cap),
 	    DMAR_CAP_MGAW(unit->hw_cap), DMAR_CAP_FRO(unit->hw_cap),
 	    DMAR_CAP_NFR(unit->hw_cap), DMAR_CAP_SPS(unit->hw_cap));
 	if ((unit->hw_cap & DMAR_CAP_PSI) != 0)
 		printf(", mamv=%d", DMAR_CAP_MAMV(unit->hw_cap));
 	printf("\n");
 	ecaphi = unit->hw_ecap >> 32;
 	device_printf(dev, "ecap=%b,", (u_int)unit->hw_ecap,
 	    "\020\001C\002QI\003DI\004IR\005EIM\007PT\010SC\031ECS\032MTS"
 	    "\033NEST\034DIS\035PASID\036PRS\037ERS\040SRS");
 	printf("%b, ", ecaphi, "\020\002NWFS\003EAFS");
 	printf("mhmw=%d, iro=%d\n", DMAR_ECAP_MHMV(unit->hw_ecap),
 	    DMAR_ECAP_IRO(unit->hw_ecap));
 }
 
 static int
 dmar_attach(device_t dev)
 {
 	struct dmar_unit *unit;
 	ACPI_DMAR_HARDWARE_UNIT *dmaru;
 	int i, error;
 
 	unit = device_get_softc(dev);
 	unit->dev = dev;
 	unit->unit = device_get_unit(dev);
 	dmaru = dmar_find_by_index(unit->unit);
 	if (dmaru == NULL)
 		return (EINVAL);
 	unit->segment = dmaru->Segment;
 	unit->base = dmaru->Address;
 	unit->reg_rid = DMAR_REG_RID;
 	unit->regs = bus_alloc_resource_any(dev, SYS_RES_MEMORY,
 	    &unit->reg_rid, RF_ACTIVE);
 	if (unit->regs == NULL) {
 		device_printf(dev, "cannot allocate register window\n");
 		return (ENOMEM);
 	}
 	unit->hw_ver = dmar_read4(unit, DMAR_VER_REG);
 	unit->hw_cap = dmar_read8(unit, DMAR_CAP_REG);
 	unit->hw_ecap = dmar_read8(unit, DMAR_ECAP_REG);
 	if (bootverbose)
 		dmar_print_caps(dev, unit, dmaru);
 	dmar_quirks_post_ident(unit);
 
 	for (i = 0; i < DMAR_INTR_TOTAL; i++)
 		unit->intrs[i].irq = -1;
 
 	unit->intrs[DMAR_INTR_FAULT].name = "fault";
 	unit->intrs[DMAR_INTR_FAULT].irq_rid = DMAR_FAULT_IRQ_RID;
 	unit->intrs[DMAR_INTR_FAULT].handler = dmar_fault_intr;
 	unit->intrs[DMAR_INTR_FAULT].msi_data_reg = DMAR_FEDATA_REG;
 	unit->intrs[DMAR_INTR_FAULT].msi_addr_reg = DMAR_FEADDR_REG;
 	unit->intrs[DMAR_INTR_FAULT].msi_uaddr_reg = DMAR_FEUADDR_REG;
 	unit->intrs[DMAR_INTR_FAULT].enable_intr = dmar_enable_fault_intr;
 	unit->intrs[DMAR_INTR_FAULT].disable_intr = dmar_disable_fault_intr;
 	error = dmar_alloc_irq(dev, unit, DMAR_INTR_FAULT);
 	if (error != 0) {
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	if (DMAR_HAS_QI(unit)) {
 		unit->intrs[DMAR_INTR_QI].name = "qi";
 		unit->intrs[DMAR_INTR_QI].irq_rid = DMAR_QI_IRQ_RID;
 		unit->intrs[DMAR_INTR_QI].handler = dmar_qi_intr;
 		unit->intrs[DMAR_INTR_QI].msi_data_reg = DMAR_IEDATA_REG;
 		unit->intrs[DMAR_INTR_QI].msi_addr_reg = DMAR_IEADDR_REG;
 		unit->intrs[DMAR_INTR_QI].msi_uaddr_reg = DMAR_IEUADDR_REG;
 		unit->intrs[DMAR_INTR_QI].enable_intr = dmar_enable_qi_intr;
 		unit->intrs[DMAR_INTR_QI].disable_intr = dmar_disable_qi_intr;
 		error = dmar_alloc_irq(dev, unit, DMAR_INTR_QI);
 		if (error != 0) {
 			dmar_release_resources(dev, unit);
 			return (error);
 		}
 	}
 
 	mtx_init(&unit->lock, "dmarhw", NULL, MTX_DEF);
 	unit->domids = new_unrhdr(0, dmar_nd2mask(DMAR_CAP_ND(unit->hw_cap)),
 	    &unit->lock);
 	LIST_INIT(&unit->domains);
 
 	/*
 	 * 9.2 "Context Entry":
 	 * When Caching Mode (CM) field is reported as Set, the
 	 * domain-id value of zero is architecturally reserved.
 	 * Software must not use domain-id value of zero
 	 * when CM is Set.
 	 */
 	if ((unit->hw_cap & DMAR_CAP_CM) != 0)
 		alloc_unr_specific(unit->domids, 0);
 
 	unit->ctx_obj = vm_pager_allocate(OBJT_PHYS, NULL, IDX_TO_OFF(1 +
 	    DMAR_CTX_CNT), 0, 0, NULL);
 
 	/*
 	 * Allocate and load the root entry table pointer.  Enable the
 	 * address translation after the required invalidations are
 	 * done.
 	 */
 	dmar_pgalloc(unit->ctx_obj, 0, DMAR_PGF_WAITOK | DMAR_PGF_ZERO);
 	DMAR_LOCK(unit);
 	error = dmar_load_root_entry_ptr(unit);
 	if (error != 0) {
 		DMAR_UNLOCK(unit);
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	error = dmar_inv_ctx_glob(unit);
 	if (error != 0) {
 		DMAR_UNLOCK(unit);
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	if ((unit->hw_ecap & DMAR_ECAP_DI) != 0) {
 		error = dmar_inv_iotlb_glob(unit);
 		if (error != 0) {
 			DMAR_UNLOCK(unit);
 			dmar_release_resources(dev, unit);
 			return (error);
 		}
 	}
 
 	DMAR_UNLOCK(unit);
 	error = dmar_init_fault_log(unit);
 	if (error != 0) {
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	error = dmar_init_qi(unit);
 	if (error != 0) {
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	error = dmar_init_irt(unit);
 	if (error != 0) {
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	error = dmar_init_busdma(unit);
 	if (error != 0) {
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 
 #ifdef NOTYET
 	DMAR_LOCK(unit);
 	error = dmar_enable_translation(unit);
 	if (error != 0) {
 		DMAR_UNLOCK(unit);
 		dmar_release_resources(dev, unit);
 		return (error);
 	}
 	DMAR_UNLOCK(unit);
 #endif
 
 	return (0);
 }
 
 static int
 dmar_detach(device_t dev)
 {
 
 	return (EBUSY);
 }
 
 static int
 dmar_suspend(device_t dev)
 {
 
 	return (0);
 }
 
 static int
 dmar_resume(device_t dev)
 {
 
 	/* XXXKIB */
 	return (0);
 }
 
 static device_method_t dmar_methods[] = {
 	DEVMETHOD(device_identify, dmar_identify),
 	DEVMETHOD(device_probe, dmar_probe),
 	DEVMETHOD(device_attach, dmar_attach),
 	DEVMETHOD(device_detach, dmar_detach),
 	DEVMETHOD(device_suspend, dmar_suspend),
 	DEVMETHOD(device_resume, dmar_resume),
 #ifdef DEV_APIC
 	DEVMETHOD(bus_remap_intr, dmar_remap_intr),
 #endif
 	DEVMETHOD_END
 };
 
 static driver_t	dmar_driver = {
 	"dmar",
 	dmar_methods,
 	sizeof(struct dmar_unit),
 };
 
 DRIVER_MODULE(dmar, acpi, dmar_driver, dmar_devclass, 0, 0);
 MODULE_DEPEND(dmar, acpi, 1, 1, 1);
 
 static void
 dmar_print_path(device_t dev, const char *banner, int busno, int depth,
     const ACPI_DMAR_PCI_PATH *path)
 {
 	int i;
 
 	device_printf(dev, "%s [%d, ", banner, busno);
 	for (i = 0; i < depth; i++) {
 		if (i != 0)
 			printf(", ");
 		printf("(%d, %d)", path[i].Device, path[i].Function);
 	}
 	printf("]\n");
 }
 
 static int
 dmar_dev_depth(device_t child)
 {
 	devclass_t pci_class;
 	device_t bus, pcib;
 	int depth;
 
 	pci_class = devclass_find("pci");
 	for (depth = 1; ; depth++) {
 		bus = device_get_parent(child);
 		pcib = device_get_parent(bus);
 		if (device_get_devclass(device_get_parent(pcib)) !=
 		    pci_class)
 			return (depth);
 		child = pcib;
 	}
 }
 
 static void
 dmar_dev_path(device_t child, int *busno, ACPI_DMAR_PCI_PATH *path, int depth)
 {
 	devclass_t pci_class;
 	device_t bus, pcib;
 
 	pci_class = devclass_find("pci");
 	for (depth--; depth != -1; depth--) {
 		path[depth].Device = pci_get_slot(child);
 		path[depth].Function = pci_get_function(child);
 		bus = device_get_parent(child);
 		pcib = device_get_parent(bus);
 		if (device_get_devclass(device_get_parent(pcib)) !=
 		    pci_class) {
 			/* reached a host bridge */
 			*busno = pcib_get_bus(bus);
 			return;
 		}
 		child = pcib;
 	}
 	panic("wrong depth");
 }
 
 static int
 dmar_match_pathes(int busno1, const ACPI_DMAR_PCI_PATH *path1, int depth1,
     int busno2, const ACPI_DMAR_PCI_PATH *path2, int depth2,
     enum AcpiDmarScopeType scope_type)
 {
 	int i, depth;
 
 	if (busno1 != busno2)
 		return (0);
 	if (scope_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT && depth1 != depth2)
 		return (0);
 	depth = depth1;
 	if (depth2 < depth)
 		depth = depth2;
 	for (i = 0; i < depth; i++) {
 		if (path1[i].Device != path2[i].Device ||
 		    path1[i].Function != path2[i].Function)
 			return (0);
 	}
 	return (1);
 }
 
 static int
 dmar_match_devscope(ACPI_DMAR_DEVICE_SCOPE *devscope, device_t dev,
     int dev_busno, const ACPI_DMAR_PCI_PATH *dev_path, int dev_path_len)
 {
 	ACPI_DMAR_PCI_PATH *path;
 	int path_len;
 
 	if (devscope->Length < sizeof(*devscope)) {
 		printf("dmar_find: corrupted DMAR table, dl %d\n",
 		    devscope->Length);
 		return (-1);
 	}
 	if (devscope->EntryType != ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
 	    devscope->EntryType != ACPI_DMAR_SCOPE_TYPE_BRIDGE)
 		return (0);
 	path_len = devscope->Length - sizeof(*devscope);
 	if (path_len % 2 != 0) {
 		printf("dmar_find_bsf: corrupted DMAR table, dl %d\n",
 		    devscope->Length);
 		return (-1);
 	}
 	path_len /= 2;
 	path = (ACPI_DMAR_PCI_PATH *)(devscope + 1);
 	if (path_len == 0) {
 		printf("dmar_find: corrupted DMAR table, dl %d\n",
 		    devscope->Length);
 		return (-1);
 	}
 	if (dmar_match_verbose)
 		dmar_print_path(dev, "DMAR", devscope->Bus, path_len, path);
 
 	return (dmar_match_pathes(devscope->Bus, path, path_len, dev_busno,
 	    dev_path, dev_path_len, devscope->EntryType));
 }
 
 struct dmar_unit *
 dmar_find(device_t dev)
 {
 	device_t dmar_dev;
 	ACPI_DMAR_HARDWARE_UNIT *dmarh;
 	ACPI_DMAR_DEVICE_SCOPE *devscope;
 	char *ptr, *ptrend;
 	int i, match, dev_domain, dev_busno, dev_path_len;
 
 	dmar_dev = NULL;
 	dev_domain = pci_get_domain(dev);
 	dev_path_len = dmar_dev_depth(dev);
 	ACPI_DMAR_PCI_PATH dev_path[dev_path_len];
 	dmar_dev_path(dev, &dev_busno, dev_path, dev_path_len);
 	if (dmar_match_verbose)
 		dmar_print_path(dev, "PCI", dev_busno, dev_path_len, dev_path);
 
 	for (i = 0; i < dmar_devcnt; i++) {
 		if (dmar_devs[i] == NULL)
 			continue;
 		dmarh = dmar_find_by_index(i);
 		if (dmarh == NULL)
 			continue;
 		if (dmarh->Segment != dev_domain)
 			continue;
 		if ((dmarh->Flags & ACPI_DMAR_INCLUDE_ALL) != 0) {
 			dmar_dev = dmar_devs[i];
 			if (dmar_match_verbose) {
 				device_printf(dev,
 				    "pci%d:%d:%d:%d matched dmar%d INCLUDE_ALL\n",
 				    dev_domain, pci_get_bus(dev),
 				    pci_get_slot(dev),
 				    pci_get_function(dev),
 				    ((struct dmar_unit *)device_get_softc(
 				    dmar_dev))->unit);
 			}
 			goto found;
 		}
 		ptr = (char *)dmarh + sizeof(*dmarh);
 		ptrend = (char *)dmarh + dmarh->Header.Length;
 		for (;;) {
 			if (ptr >= ptrend)
 				break;
 			devscope = (ACPI_DMAR_DEVICE_SCOPE *)ptr;
 			ptr += devscope->Length;
 			if (dmar_match_verbose) {
 				device_printf(dev,
 				    "pci%d:%d:%d:%d matching dmar%d\n",
 				    dev_domain, pci_get_bus(dev),
 				    pci_get_slot(dev),
 				    pci_get_function(dev),
 				    ((struct dmar_unit *)device_get_softc(
 				    dmar_devs[i]))->unit);
 			}
 			match = dmar_match_devscope(devscope, dev, dev_busno,
 			    dev_path, dev_path_len);
 			if (dmar_match_verbose) {
 				if (match == -1)
 					printf("table error\n");
 				else if (match == 0)
 					printf("not matched\n");
 				else
 					printf("matched\n");
 			}
 			if (match == -1)
 				return (NULL);
 			else if (match == 1) {
 				dmar_dev = dmar_devs[i];
 				goto found;
 			}
 		}
 	}
 	return (NULL);
 found:
 	return (device_get_softc(dmar_dev));
 }
 
 static struct dmar_unit *
 dmar_find_nonpci(u_int id, u_int entry_type, uint16_t *rid)
 {
 	device_t dmar_dev;
 	struct dmar_unit *unit;
 	ACPI_DMAR_HARDWARE_UNIT *dmarh;
 	ACPI_DMAR_DEVICE_SCOPE *devscope;
 	ACPI_DMAR_PCI_PATH *path;
 	char *ptr, *ptrend;
 	int i;
 
 	for (i = 0; i < dmar_devcnt; i++) {
 		dmar_dev = dmar_devs[i];
 		if (dmar_dev == NULL)
 			continue;
 		unit = (struct dmar_unit *)device_get_softc(dmar_dev);
 		dmarh = dmar_find_by_index(i);
 		if (dmarh == NULL)
 			continue;
 		ptr = (char *)dmarh + sizeof(*dmarh);
 		ptrend = (char *)dmarh + dmarh->Header.Length;
 		for (;;) {
 			if (ptr >= ptrend)
 				break;
 			devscope = (ACPI_DMAR_DEVICE_SCOPE *)ptr;
 			ptr += devscope->Length;
 			if (devscope->EntryType != entry_type)
 				continue;
 			if (devscope->EnumerationId != id)
 				continue;
 			if (devscope->Length - sizeof(ACPI_DMAR_DEVICE_SCOPE)
 			    == 2) {
 				if (rid != NULL) {
 					path = (ACPI_DMAR_PCI_PATH *)
 					    (devscope + 1);
 					*rid = PCI_RID(devscope->Bus,
 					    path->Device, path->Function);
 				}
 				return (unit);
 			} else {
 				/* XXXKIB */
 				printf(
 		       "dmar_find_nonpci: id %d type %d path length != 2\n",
 				    id, entry_type);
 			}
 		}
 	}
 	return (NULL);
 }
 
 
 struct dmar_unit *
 dmar_find_hpet(device_t dev, uint16_t *rid)
 {
 
 	return (dmar_find_nonpci(hpet_get_uid(dev), ACPI_DMAR_SCOPE_TYPE_HPET,
 	    rid));
 }
 
 struct dmar_unit *
 dmar_find_ioapic(u_int apic_id, uint16_t *rid)
 {
 
 	return (dmar_find_nonpci(apic_id, ACPI_DMAR_SCOPE_TYPE_IOAPIC, rid));
 }
 
 struct rmrr_iter_args {
 	struct dmar_domain *domain;
 	device_t dev;
 	int dev_domain;
 	int dev_busno;
 	ACPI_DMAR_PCI_PATH *dev_path;
 	int dev_path_len;
 	struct dmar_map_entries_tailq *rmrr_entries;
 };
 
 static int
 dmar_rmrr_iter(ACPI_DMAR_HEADER *dmarh, void *arg)
 {
 	struct rmrr_iter_args *ria;
 	ACPI_DMAR_RESERVED_MEMORY *resmem;
 	ACPI_DMAR_DEVICE_SCOPE *devscope;
 	struct dmar_map_entry *entry;
 	char *ptr, *ptrend;
 	int match;
 
 	if (dmarh->Type != ACPI_DMAR_TYPE_RESERVED_MEMORY)
 		return (1);
 
 	ria = arg;
 	resmem = (ACPI_DMAR_RESERVED_MEMORY *)dmarh;
 	if (dmar_match_verbose) {
 		printf("RMRR [%jx,%jx] segment %d\n",
 		    (uintmax_t)resmem->BaseAddress,
 		    (uintmax_t)resmem->EndAddress,
 		    resmem->Segment);
 	}
 	if (resmem->Segment != ria->dev_domain)
 		return (1);
 
 	ptr = (char *)resmem + sizeof(*resmem);
 	ptrend = (char *)resmem + resmem->Header.Length;
 	for (;;) {
 		if (ptr >= ptrend)
 			break;
 		devscope = (ACPI_DMAR_DEVICE_SCOPE *)ptr;
 		ptr += devscope->Length;
 		match = dmar_match_devscope(devscope, ria->dev, ria->dev_busno,
 		    ria->dev_path, ria->dev_path_len);
 		if (match == 1) {
 			if (dmar_match_verbose)
 				printf("matched\n");
 			entry = dmar_gas_alloc_entry(ria->domain,
 			    DMAR_PGF_WAITOK);
 			entry->start = resmem->BaseAddress;
 			/* The RMRR entry end address is inclusive. */
 			entry->end = resmem->EndAddress;
 			TAILQ_INSERT_TAIL(ria->rmrr_entries, entry,
 			    unroll_link);
 		} else if (dmar_match_verbose) {
 			printf("not matched, err %d\n", match);
 		}
 	}
 
 	return (1);
 }
 
 void
 dmar_dev_parse_rmrr(struct dmar_domain *domain, device_t dev,
     struct dmar_map_entries_tailq *rmrr_entries)
 {
 	struct rmrr_iter_args ria;
 
 	ria.dev_domain = pci_get_domain(dev);
 	ria.dev_path_len = dmar_dev_depth(dev);
 	ACPI_DMAR_PCI_PATH dev_path[ria.dev_path_len];
 	dmar_dev_path(dev, &ria.dev_busno, dev_path, ria.dev_path_len);
 
 	if (dmar_match_verbose) {
 		device_printf(dev, "parsing RMRR entries for ");
 		dmar_print_path(dev, "PCI", ria.dev_busno, ria.dev_path_len,
 		    dev_path);
 	}
 
 	ria.domain = domain;
 	ria.dev = dev;
 	ria.dev_path = dev_path;
 	ria.rmrr_entries = rmrr_entries;
 	dmar_iterate_tbl(dmar_rmrr_iter, &ria);
 }
 
 struct inst_rmrr_iter_args {
 	struct dmar_unit *dmar;
 };
 
 static device_t
 dmar_path_dev(int segment, int path_len, int busno,
     const ACPI_DMAR_PCI_PATH *path)
 {
 	devclass_t pci_class;
 	device_t bus, pcib, dev;
 	int i;
 
 	pci_class = devclass_find("pci");
 	dev = NULL;
 	for (i = 0; i < path_len; i++, path++) {
 		dev = pci_find_dbsf(segment, busno, path->Device,
 		    path->Function);
 		if (dev == NULL)
 			break;
 		if (i != path_len - 1) {
 			bus = device_get_parent(dev);
 			pcib = device_get_parent(bus);
 			if (device_get_devclass(device_get_parent(pcib)) !=
 			    pci_class)
 				return (NULL);
 		}
 		busno = pcib_get_bus(dev);
 	}
 	return (dev);
 }
 
 static int
 dmar_inst_rmrr_iter(ACPI_DMAR_HEADER *dmarh, void *arg)
 {
 	const ACPI_DMAR_RESERVED_MEMORY *resmem;
 	const ACPI_DMAR_DEVICE_SCOPE *devscope;
 	struct inst_rmrr_iter_args *iria;
 	const char *ptr, *ptrend;
 	struct dmar_unit *dev_dmar;
 	device_t dev;
 
 	if (dmarh->Type != ACPI_DMAR_TYPE_RESERVED_MEMORY)
 		return (1);
 
 	iria = arg;
 	resmem = (ACPI_DMAR_RESERVED_MEMORY *)dmarh;
 	if (resmem->Segment != iria->dmar->segment)
 		return (1);
 	if (dmar_match_verbose) {
 		printf("dmar%d: RMRR [%jx,%jx]\n", iria->dmar->unit,
 		    (uintmax_t)resmem->BaseAddress,
 		    (uintmax_t)resmem->EndAddress);
 	}
 
 	ptr = (const char *)resmem + sizeof(*resmem);
 	ptrend = (const char *)resmem + resmem->Header.Length;
 	for (;;) {
 		if (ptr >= ptrend)
 			break;
 		devscope = (const ACPI_DMAR_DEVICE_SCOPE *)ptr;
 		ptr += devscope->Length;
 		/* XXXKIB bridge */
 		if (devscope->EntryType != ACPI_DMAR_SCOPE_TYPE_ENDPOINT)
 			continue;
 		if (dmar_match_verbose) {
 			dmar_print_path(iria->dmar->dev, "RMRR scope",
 			    devscope->Bus, (devscope->Length -
 			    sizeof(ACPI_DMAR_DEVICE_SCOPE)) / 2,
 			    (const ACPI_DMAR_PCI_PATH *)(devscope + 1));
 		}
 		dev = dmar_path_dev(resmem->Segment, (devscope->Length -
 		    sizeof(ACPI_DMAR_DEVICE_SCOPE)) / 2, devscope->Bus,
 		    (const ACPI_DMAR_PCI_PATH *)(devscope + 1));
 		if (dev == NULL) {
 			if (dmar_match_verbose)
 				printf("null dev\n");
 			continue;
 		}
 		dev_dmar = dmar_find(dev);
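 		if (dev_dmar != iria->dmar) {
 			/* dev_dmar may be NULL if no DMAR unit claims dev. */
 			if (dmar_match_verbose) {
 				if (dev_dmar == NULL)
 					printf("no dmar matched, skipping\n");
 				else
 					printf("dmar%d matched, skipping\n",
 					    dev_dmar->unit);
 			}
 			continue;
 		}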
 		if (dmar_match_verbose)
 			printf("matched, instantiating RMRR context\n");
 		dmar_instantiate_ctx(iria->dmar, dev, true);
 	}
 
 	return (1);
 }
 
 /*
  * Pre-create all contexts for the DMAR which have RMRR entries.
  */
 int
 dmar_instantiate_rmrr_ctxs(struct dmar_unit *dmar)
 {
 	struct inst_rmrr_iter_args iria;
 	int error;
 
 	if (!dmar_barrier_enter(dmar, DMAR_BARRIER_RMRR))
 		return (0);
 
 	error = 0;
 	iria.dmar = dmar;
 	if (dmar_match_verbose)
 		printf("dmar%d: instantiating RMRR contexts\n", dmar->unit);
 	dmar_iterate_tbl(dmar_inst_rmrr_iter, &iria);
 	DMAR_LOCK(dmar);
 	if (!LIST_EMPTY(&dmar->domains)) {
 		KASSERT((dmar->hw_gcmd & DMAR_GCMD_TE) == 0,
 	    ("dmar%d: RMRR not handled but translation is already enabled",
 		    dmar->unit));
 		error = dmar_enable_translation(dmar);
 	}
 	dmar_barrier_exit(dmar, DMAR_BARRIER_RMRR);
 	return (error);
 }
 
 #ifdef DDB
 #include <ddb/ddb.h>
 #include <ddb/db_lex.h>
 
 static void
 dmar_print_domain_entry(const struct dmar_map_entry *entry)
 {
 	struct dmar_map_entry *l, *r;
 
 	db_printf(
 	    "    start %jx end %jx free_after %jx free_down %jx flags %x ",
 	    entry->start, entry->end, entry->free_after, entry->free_down,
 	    entry->flags);
 	db_printf("left ");
 	l = RB_LEFT(entry, rb_entry);
 	if (l == NULL)
 		db_printf("NULL ");
 	else
 		db_printf("%jx ", l->start);
 	db_printf("right ");
 	r = RB_RIGHT(entry, rb_entry);
 	if (r == NULL)
 		db_printf("NULL");
 	else
 		db_printf("%jx", r->start);
 	db_printf("\n");
 }
 
 static void
 dmar_print_ctx(struct dmar_ctx *ctx)
 {
 
 	db_printf(
 	    "    @%p pci%d:%d:%d refs %d flags %x loads %lu unloads %lu\n",
 	    ctx, pci_get_bus(ctx->ctx_tag.owner),
 	    pci_get_slot(ctx->ctx_tag.owner),
 	    pci_get_function(ctx->ctx_tag.owner), ctx->refs, ctx->flags,
 	    ctx->loads, ctx->unloads);
 }
 
 static void
 dmar_print_domain(struct dmar_domain *domain, bool show_mappings)
 {
 	struct dmar_map_entry *entry;
 	struct dmar_ctx *ctx;
 
 	db_printf(
 	    "  @%p dom %d mgaw %d agaw %d pglvl %d end %jx refs %d\n"
 	    "   ctx_cnt %d flags %x pgobj %p map_ents %u\n",
 	    domain, domain->domain, domain->mgaw, domain->agaw, domain->pglvl,
 	    (uintmax_t)domain->end, domain->refs, domain->ctx_cnt,
 	    domain->flags, domain->pgtbl_obj, domain->entries_cnt);
 	if (!LIST_EMPTY(&domain->contexts)) {
 		db_printf("  Contexts:\n");
 		LIST_FOREACH(ctx, &domain->contexts, link)
 			dmar_print_ctx(ctx);
 	}
 	if (!show_mappings)
 		return;
 	db_printf("    mapped:\n");
 	RB_FOREACH(entry, dmar_gas_entries_tree, &domain->rb_root) {
 		dmar_print_domain_entry(entry);
 		if (db_pager_quit)
 			break;
 	}
 	if (db_pager_quit)
 		return;
 	db_printf("    unloading:\n");
 	TAILQ_FOREACH(entry, &domain->unload_entries, dmamap_link) {
 		dmar_print_domain_entry(entry);
 		if (db_pager_quit)
 			break;
 	}
 }
 
 DB_FUNC(dmar_domain, db_dmar_print_domain, db_show_table, CS_OWN, NULL)
 {
 	struct dmar_unit *unit;
 	struct dmar_domain *domain;
 	struct dmar_ctx *ctx;
 	bool show_mappings, valid;
 	int pci_domain, bus, device, function, i, t;
 	db_expr_t radix;
 
 	valid = false;
 	radix = db_radix;
 	db_radix = 10;
 	t = db_read_token();
 	if (t == tSLASH) {
 		t = db_read_token();
 		if (t != tIDENT) {
 			db_printf("Bad modifier\n");
 			db_radix = radix;
 			db_skip_to_eol();
 			return;
 		}
 		show_mappings = strchr(db_tok_string, 'm') != NULL;
 		t = db_read_token();
 	} else {
 		show_mappings = false;
 	}
 	if (t == tNUMBER) {
 		pci_domain = db_tok_number;
 		t = db_read_token();
 		if (t == tNUMBER) {
 			bus = db_tok_number;
 			t = db_read_token();
 			if (t == tNUMBER) {
 				device = db_tok_number;
 				t = db_read_token();
 				if (t == tNUMBER) {
 					function = db_tok_number;
 					valid = true;
 				}
 			}
 		}
 	}
 	db_radix = radix;
 	db_skip_to_eol();
 	if (!valid) {
 		db_printf("usage: show dmar_domain [/m] "
 		    "<domain> <bus> <device> <func>\n");
 		return;
 	}
 	for (i = 0; i < dmar_devcnt; i++) {
 		unit = device_get_softc(dmar_devs[i]);
 		LIST_FOREACH(domain, &unit->domains, link) {
 			LIST_FOREACH(ctx, &domain->contexts, link) {
 				if (pci_domain == unit->segment &&
 				    bus == pci_get_bus(ctx->ctx_tag.owner) &&
 				    device ==
 				    pci_get_slot(ctx->ctx_tag.owner) &&
 				    function ==
 				    pci_get_function(ctx->ctx_tag.owner)) {
 					dmar_print_domain(domain,
 					    show_mappings);
 					goto out;
 				}
 			}
 		}
 	}
 out:;
 }
 
 static void
 dmar_print_one(int idx, bool show_domains, bool show_mappings)
 {
 	struct dmar_unit *unit;
 	struct dmar_domain *domain;
 	int i, frir;
 
 	unit = device_get_softc(dmar_devs[idx]);
 	db_printf("dmar%d at %p, root at 0x%jx, ver 0x%x\n", unit->unit, unit,
 	    dmar_read8(unit, DMAR_RTADDR_REG), dmar_read4(unit, DMAR_VER_REG));
 	db_printf("cap 0x%jx ecap 0x%jx gsts 0x%x fsts 0x%x fectl 0x%x\n",
 	    (uintmax_t)dmar_read8(unit, DMAR_CAP_REG),
 	    (uintmax_t)dmar_read8(unit, DMAR_ECAP_REG),
 	    dmar_read4(unit, DMAR_GSTS_REG),
 	    dmar_read4(unit, DMAR_FSTS_REG),
 	    dmar_read4(unit, DMAR_FECTL_REG));
 	if (unit->ir_enabled) {
 		db_printf("ir is enabled; IRT @%p phys 0x%jx maxcnt %d\n",
 		    unit->irt, (uintmax_t)unit->irt_phys, unit->irte_cnt);
 	}
 	db_printf("fed 0x%x fea 0x%x feua 0x%x\n",
 	    dmar_read4(unit, DMAR_FEDATA_REG),
 	    dmar_read4(unit, DMAR_FEADDR_REG),
 	    dmar_read4(unit, DMAR_FEUADDR_REG));
 	db_printf("primary fault log:\n");
 	for (i = 0; i < DMAR_CAP_NFR(unit->hw_cap); i++) {
 		frir = (DMAR_CAP_FRO(unit->hw_cap) + i) * 16;
 		db_printf("  %d at 0x%x: %jx %jx\n", i, frir,
 		    (uintmax_t)dmar_read8(unit, frir),
 		    (uintmax_t)dmar_read8(unit, frir + 8));
 	}
 	if (DMAR_HAS_QI(unit)) {
 		db_printf("ied 0x%x iea 0x%x ieua 0x%x\n",
 		    dmar_read4(unit, DMAR_IEDATA_REG),
 		    dmar_read4(unit, DMAR_IEADDR_REG),
 		    dmar_read4(unit, DMAR_IEUADDR_REG));
 		if (unit->qi_enabled) {
 			db_printf("qi is enabled: queue @0x%jx (IQA 0x%jx) "
 			    "size 0x%jx\n"
 		    "  head 0x%x tail 0x%x avail 0x%x status 0x%x ctrl 0x%x\n"
 		    "  hw compl 0x%x@%p/phys@%jx next seq 0x%x gen 0x%x\n",
 			    (uintmax_t)unit->inv_queue,
 			    (uintmax_t)dmar_read8(unit, DMAR_IQA_REG),
 			    (uintmax_t)unit->inv_queue_size,
 			    dmar_read4(unit, DMAR_IQH_REG),
 			    dmar_read4(unit, DMAR_IQT_REG),
 			    unit->inv_queue_avail,
 			    dmar_read4(unit, DMAR_ICS_REG),
 			    dmar_read4(unit, DMAR_IECTL_REG),
 			    unit->inv_waitd_seq_hw,
 			    &unit->inv_waitd_seq_hw,
 			    (uintmax_t)unit->inv_waitd_seq_hw_phys,
 			    unit->inv_waitd_seq,
 			    unit->inv_waitd_gen);
 		} else {
 			db_printf("qi is disabled\n");
 		}
 	}
 	if (show_domains) {
 		db_printf("domains:\n");
 		LIST_FOREACH(domain, &unit->domains, link) {
 			dmar_print_domain(domain, show_mappings);
 			if (db_pager_quit)
 				break;
 		}
 	}
 }
 
 DB_SHOW_COMMAND(dmar, db_dmar_print)
 {
 	bool show_domains, show_mappings;
 
 	show_domains = strchr(modif, 'd') != NULL;
 	show_mappings = strchr(modif, 'm') != NULL;
 	if (!have_addr) {
 		db_printf("usage: show dmar [/d] [/m] index\n");
 		return;
 	}
 	dmar_print_one((int)addr, show_domains, show_mappings);
 }
 
 DB_SHOW_ALL_COMMAND(dmars, db_show_all_dmars)
 {
 	int i;
 	bool show_domains, show_mappings;
 
 	show_domains = strchr(modif, 'd') != NULL;
 	show_mappings = strchr(modif, 'm') != NULL;
 
 	for (i = 0; i < dmar_devcnt; i++) {
 		dmar_print_one(i, show_domains, show_mappings);
 		if (db_pager_quit)
 			break;
 	}
 }
 #endif
Index: user/alc/PQ_LAUNDRY
===================================================================
--- user/alc/PQ_LAUNDRY	(revision 303774)
+++ user/alc/PQ_LAUNDRY	(revision 303775)

Property changes on: user/alc/PQ_LAUNDRY
___________________________________________________________________
Modified: svn:mergeinfo
## -0,0 +0,1 ##
   Merged /head:r303748-303774